add some papers

This commit is contained in:
firmianay 2018-03-19 14:32:21 +08:00
parent c3bf2181e6
commit 831859c086
38 changed files with 599 additions and 2 deletions

View File

@ -1,3 +1,4 @@
2018-03-19: Added a new chapter: Academic Papers
2018-02-21: 1,000-page milestone reached
2018-01-15: 200-commit milestone reached
2018-01-13: Started experimenting with writing in LaTeX

View File

@ -71,6 +71,7 @@
- Added Chapter 6, Write-ups: collects write-ups for all kinds of good challenges. They should be as detailed as possible and come with programs for hands-on practice. One challenge per md file; upload all files to the `src/writeup` directory. Challenges should preferably come from [CTFs](https://github.com/ctfs).
- Added Chapter 7, Real-World Exploitation: after CTF we always have to return to the real world and analyze and exploit real vulnerabilities. Keep it just as detailed and provide programs to reproduce the exploits. One vulnerability per md file; upload all files to `src/exploit` (for overly large programs, attach a cloud-drive link instead). See [exploit-db](https://www.exploit-db.com/) for reference.
- Since real-world vulnerability environments can be quite complex, a Docker-based environment would be very nice; this is left as a future plan.
- Added Chapter 8, Academic Papers. I am currently torn between graduate school and a job, but reading papers can never hurt. One paper or one group of papers per md file, in any style (see [How to Read an Engineering Research Paper](http://cseweb.ucsd.edu/%7Ewgg/CSE210/howtoread.html)). The paper PDFs will be uploaded to Baidu Netdisk.
- Being a bit obsessive, I prefer text over screenshots whenever possible :p, though animated GIFs are sometimes worth considering.
- The PDF exported by GitBook is hard to look at, so the plan is to switch to LaTeX (XeLaTeX), i.e. provide both md and tex versions, with the tex version under the `tex/` directory.
- A reader from abroad emailed hoping for an English version. Given my English level, that may not be realistic for now; if anyone is willing to take on this work, please let me know.

View File

@ -144,6 +144,15 @@ GitHub: https://github.com/firmianay/CTF-All-In-One
* Return-Oriented Programming
* [8.1 The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86)](doc/8.1_return-into-libc_without_function_calls.md)
* [8.2 Return-Oriented Programming without Returns](doc/8.2_return-oriented_programming_without_returns.md)
* Reverse Engineering
* [8.3 New Frontiers of Reverse Engineering](doc/8.3_new_frontiers_of_reverse_engineering.md)
* Android Security
* [8.4 EMULATOR vs REAL PHONE: Android Malware Detection Using Machine Learning](doc/8.4_emulator_vs_real_phone.md)
* [8.5 DynaLog: An automated dynamic analysis framework for characterizing Android applications](doc/8.5_dynalog_an_automated_dynamic_analysis_framework.md)
* [8.6 A Static Android Malware Detection Based on Actual Used Permissions Combination and API Calls](doc/8.6_malware_detection_based_on_actual_used_permissions.md)
* [8.7 MaMaDroid: Detecting Android malware by building Markov chains of behavioral models](doc/8.7_detecting_malware_by_building_markov_chains.md)
* [8.8 DroidNative: Semantic-Based Detection of Android Native Code Malware](doc/8.8_droidnative_semantic-based_detection_of_android_native_code_malware.md)
* [8.9 DroidAnalytics: A Signature Based Analytic System to Collect, Extract, Analyze and Associate Android Malware](doc/8.9_droidanalytics_signature_based_analytic_system.md)
* [9. Appendix](doc/9_appendix.md)
* [9.1 More Linux Tools](doc/9.1_Linuxtools.md)
* [9.2 More Windows Tools](doc/9.2_wintools.md)

View File

@ -0,0 +1,101 @@
# 8.3 New Frontiers of Reverse Engineering
## What is your take-away message from this paper?
This paper briefly presents an overview of the field of reverse engineering, reviews the main achievements and areas of application, and highlights key open research issues for the future.
## What are motivations for this work?
#### What is reverse engineering?
The term *reverse engineering* was defined as:
> the process of analyzing a subject system to
>
> (i) identify the system's components and their inter-relationships and
>
> (ii) create representations of the system in another form or at a higher level of abstraction.
So, the core of reverse engineering consists of two parts:
1. deriving information from the available software artifacts
2. translating the information into abstract representations more easily understandable by humans
#### Why do we need reverse engineering?
Reverse engineering is a key supporting technology to deal with systems that have the source code as the only reliable representation.
#### Previous reverse engineering
Reverse engineering has traditionally been viewed as a two-step process: information extraction and abstraction.
![](../pic/8.3_tools_arch.png)
![](../pic/8.3_arch_reengineering.png)
The discussion of the main achievements of reverse engineering in the last 10 years is organized along three main threads:
- program analysis and its applications
- design recovery
- software visualization
#### Program analysis and its applications
Several analysis and transformation toolkits provide facilities for parsing the source code and performing rule-based transformations.
- alternative source code analysis approaches
- extract relevant information from the source code even without a thorough source code parse
- incorporating reverse engineering techniques into development environments or extensible editors
- deal with peculiarities introduced by object-oriented languages
- deal with the presence of clones in software systems
#### Architecture and design recovery
- the diffusion of object-oriented languages and UML introduced the need for reverse engineering UML models from source code
- identifying design patterns in the source code aims at promoting reuse and assessing code quality
- techniques using static analysis, dynamic analysis, and their combination, were proposed
- the need for reverse engineering techniques tied to Web Applications
#### Visualization
Software visualization is a crucial step for reverse engineering.
- straightforward visualization: UML diagrams, state machines, CFGs
- highlight relevant information at the right level of detail
## Future trends of reverse engineering
#### program analysis
- high dynamicity
- many programming languages widely used today allow for high dynamicity, which makes analysis more difficult
- e.g. reflection in Java, which can load classes at run time (see the sketch after this list)
- cross-language applications
- more cross-language applications today
- e.g. Web Applications: HTML, SQL, scripts
- mining software repositories
- a new, important research area
Reverse engineering research has highlighted the dualism between static and dynamic analysis and the need to complement the two techniques, trying to exploit the advantages of both and limit their disadvantages. In recent years a third dimension, historical analysis, has been added.
- static analysis
- when it is performed, within a single system snapshot, on software artifacts without requiring their execution
- must deal with different language variants and non-compilable code
- fast, precise, and cheap
- many peculiarities of programming languages, such as pointers and polymorphism, or dynamic classes loading, make static analysis difficult and sometimes imprecise
- dynamic analysis
- when it is performed by analyzing execution traces obtained from the execution of instrumented versions of a program, or by using an execution environment able to capture facts from program execution
- extracts information from execution traces
- since it depends on program inputs, it can be incomplete
- challenge: ability to mine relevant information from execution traces (execution traces tend to quickly become large and unmanageable, thus a relevant challenge is to filter them and extract information relevant for the particular understanding task being performed)
- historical analysis
- when the aim is to gain information about the evolution of the system under analysis by considering the changes performed by developers to software artifacts, as recorded by versioning systems
#### design recovery
- design paradigms
- a lot of work remains to be done, in particular regarding the extraction of dynamic diagrams and of OCL pre- and post-conditions
- new software architectures that have characteristics of being extremely dynamic, highly distributed, self-configurable and heterogeneous
- e.g. Web 2.0 applications
- incomplete, imprecise and semi-automatic
- the reverse engineering machinery should be able to learn from expert feedback to automatically produce results
- e.g. machine learning, meta-heuristics and artificial intelligence
#### visualization
Effective visualizations should be able to:
1. show the right level of detail a particular user needs, and let the user choose to view an artifact at a deeper level or detail, or to have a coarse-grain, in-the-large, view
2. show the information in a form the user is able to understand. Simpler visualizations should be favored over more complex ones, like 3D or animations, when the latter do not bring additional information that cannot be visualized in a simpler way
## Reverse engineering in emerging software development scenarios
The challenges for reverse engineering:
1. on the one hand, the analysis of systems having high dynamism, distribution and heterogeneity and, on the other hand, supporting their development by providing techniques to help developers enable mechanisms such as automatic discovery and reconfiguration
2. the need for a full integration of reverse engineering with the development process, which will benefit from on-the-fly application of reverse engineering techniques while a developer is writing the code, working on a design model, etc.
## Final
![](../pic/8.3_role.png)

View File

@ -0,0 +1,92 @@
# 8.4 EMULATOR vs REAL PHONE: Android Malware Detection Using Machine Learning
## What is your take-away message from this paper?
The authors present an investigation of machine learning based malware detection using dynamic analysis on real devices.
## What are motivations for this work?
#### malware
The rapid increase in malware numbers targeting Android devices has highlighted the need for efficient detection mechanisms to detect zero-day malware.
#### anti-emulator techniques
Sophisticated Android malware employ detection avoidance techniques in order to hide their malicious activities from analysis tools. These include a wide range of anti-emulator techniques, where the malware programs attempt to hide their malicious activities by detecting the emulator.
## What is the proposed solution?
>Hence, we have designed and implemented a python-based tool to enable dynamic analysis using real phones to automatically extract dynamic features and potentially mitigate anti-emulation detection. Furthermore, in order to validate this approach, we undertake a comparative analysis of emulator vs device based detection by means of several machine learning algorithms. We examine the performance of these algorithms in both environments after investigating the effectiveness of obtaining the run-time features within both environments.
#### phone based dynamic analysis and feature extraction
Since the aim is to compare emulator-based detection with device-based detection, features for supervised learning need to be extracted from both environments. For the emulator-based learning, the authors utilized the [DynaLog](https://arxiv.org/pdf/1607.08166.pdf) dynamic analysis framework.
- emulator based: DynaLog provides the ability to instrument each application with the necessary API calls to be monitored, logged and extracted from the emulator during the run-time analysis.
- device based: extended with a python-based tool (a minimal adb-driven sketch follows the figure below) that can:
- Push a list of contacts to the device SD card and then import them to populate the phone's contact list.
- Discover and uninstall all third-party applications prior to installing the app under analysis.
- Check whether the phone is in airplane mode or not.
- Check the battery level of the phone.
- Dial outgoing calls using the adb shell.
- Send outgoing SMS messages using the adb shell.
- Populate the phone SD card with other assets.
![](../pic/8.4_model.png)
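A minimal sketch of what such a device-preparation helper could look like, driving a phone over adb from Python; the specific shell commands, file names and numbers here are assumptions for illustration, not the authors' actual tool:
```python
#!/usr/bin/env python3
"""Sketch of a phone-preparation helper in the spirit of the paper's python-based tool."""
import subprocess

def adb(*args):
    """Run an adb command and return its stdout."""
    return subprocess.run(["adb", *args], capture_output=True, text=True).stdout

def uninstall_third_party_apps():
    # `pm list packages -3` lists third-party packages as "package:<name>"
    for line in adb("shell", "pm", "list", "packages", "-3").splitlines():
        pkg = line.replace("package:", "").strip()
        if pkg:
            adb("uninstall", pkg)

def device_status():
    airplane = adb("shell", "settings", "get", "global", "airplane_mode_on").strip()
    battery = adb("shell", "dumpsys", "battery")
    return {"airplane_mode": airplane == "1", "battery_dump": battery}

def populate_sdcard(local_contacts="contacts.vcf"):
    # Push a contact file (and any other assets) to the SD card.
    adb("push", local_contacts, "/sdcard/contacts.vcf")

def dial_number(number="0123456789"):
    # Trigger an outgoing call through the standard CALL intent.
    adb("shell", "am", "start", "-a", "android.intent.action.CALL", "-d", f"tel:{number}")

if __name__ == "__main__":
    uninstall_third_party_apps()
    populate_sdcard()
    print(device_status())
```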
#### Features extraction
After using DynaLog, the outputs are pre-processed into a file of feature vectors representing the features extracted from each application. Then the InfoGain feature ranking algorithm in WEKA is used to keep the top 100 ranked features.
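The paper relies on WEKA's InfoGain ranking; the standalone sketch below computes information gain over binary feature vectors and keeps the top-k features, purely to illustrate the idea (the feature matrix and labels are toy data):
```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature_column, labels):
    """IG(class; feature) = H(class) - H(class | feature)."""
    gain = entropy(labels)
    for value in np.unique(feature_column):
        mask = feature_column == value
        gain -= mask.mean() * entropy(labels[mask])
    return gain

def top_k_features(X, y, k=100):
    gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(gains)[::-1][:k]   # indices of the k highest-ranked features

# toy data: 6 apps x 4 binary dynamic features (1 = behaviour observed), label 1 = malware
X = np.array([[1,0,1,0],[1,1,1,0],[0,0,1,1],[0,1,0,1],[1,0,1,0],[0,1,0,1]])
y = np.array([1,1,0,0,1,0])
print(top_k_features(X, y, k=2))
```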
#### Machine learning classifiers
The features were divided into five different sets to compare the performance using machine learning algorithms.
## What is the work's evaluation of the proposed solution?
#### Dataset
>The dataset used for the experiments consists of a total of 2444 Android applications. Of these, 1222 were malware samples obtained from 49 families of the Android malware genome project. The rest were 1222 benign samples obtained from Intel Security (McAfee Labs).
#### Machine learning algorithms
The following algorithms were used in the experiments:
- Support Vector Machine (SVM-linear)
- Naive Bayes (NB)
- Simple Logistic (SL)
- Multilayer Perceptron (MLP)
- Partial Decision Trees (PART)
- Random Forest (RF)
- J48 Decision Tree.
#### Metrics
Five metrics were used for the performance evaluation of the detection approaches (a small sketch of how they are computed follows the list).
- true positive rate (TPR)
- true negative rate (TNR)
- false positive rate (FPR)
- false negative rate (FNR)
- weighted average F-measure
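A small sketch of how these metrics follow from a binary confusion matrix; the counts are hypothetical, and only the positive-class F-measure is shown (WEKA's weighted average F-measure averages over both classes):
```python
def detection_metrics(tp, tn, fp, fn):
    tpr = tp / (tp + fn)                  # true positive rate (recall)
    tnr = tn / (tn + fp)                  # true negative rate
    fpr = fp / (fp + tn)                  # false positive rate
    fnr = fn / (fn + tp)                  # false negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    return {"TPR": tpr, "TNR": tnr, "FPR": fpr, "FNR": fnr, "F": f_measure}

print(detection_metrics(tp=113, tn=110, fp=12, fn=9))   # hypothetical counts
```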
#### Experiment 1: Emulator vs Device analysis and feature extraction
![](../pic/8.4_percentage.png)
#### Experiment 2: Emulator vs Device Machine learning detection comparison
![](../pic/8.4_emulator.png)
![](../pic/8.4_phone.png)
>Our experiments showed that several features were extracted more effectively from the phone than the emulator using the same dataset. Furthermore, 23.8% more apps were fully analyzed on the phone compared to emulator.
This shows that for more efficient analysis the phone is definitely a better environment as far more apps crash when being analysed on the emulator.
>The results of our phone-based analysis obtained up to 0.926 F-measure and 93.1% TPR and 92% FPR with the Random Forest classifier and in general, phone-based results were better than emulator based results.
Thus we conclude that as an incentive to reduce the impact of malware anti-emulation and environmental shortcomings of emulators which affect analysis efficiency, it is important to develop more effective machine learning device based detection solutions.
## What is your analysis of the identified problem, idea and evaluation?
Countermeasures against anti-emulator are becoming increasingly important in Android malware detection.
## What are the contributions?
- Presented an investigation of machine learning based malware detection using dynamic analysis on real Android devices.
- Implemented a tool to automatically extract dynamic features from Android phones.
- Through several experiments we performed a comparative analysis of emulator based vs. device based detection by means of several machine learning algorithms.
## What are future directions for this research?
>Hence future work will aim to investigate more effective, larger scale device based machine learning solutions using larger sample datasets. Future work could also investigate alternative set of dynamic features to those utilized in this study.
## What questions are you left with?
- How can the emulator environment be made closer to a real device environment?
- How can more powerful dynamic analysis tools be built that counter anti-emulation techniques?
- Why did the difference in Android versions have no impact?

View File

@ -0,0 +1,73 @@
# 8.5 DynaLog: An automated dynamic analysis framework for characterizing Android applications
## What is your take-away message from this paper?
The authors present DynaLog, a framework that enables automated mass dynamic analysis of applications in order to characterize them for analysis and potential detection of malicious behaviour.
## What are motivations for this work?
#### Malware
- more than 5 million malware samples
- signature-based AVs can take up to 48 days to detect new malware
- sophisticated detection avoidance techniques such as obfuscation and payload encryption make detection more difficult
#### Current Methods' Limitations
- Static: detection can be avoided through sophisticated obfuscation techniques and run-time loading of malicious payloads.
- Dynamic: existing tools are either closed source or can only be accessed by submitting apps online for analysis, which limits automated mass analysis of apps by analysts.
## What is the proposed solution?
DynaLog has several components:
1. Emulator-based analysis sandbox
2. APK instrumentation module
3. Behaviour/feature logging and extraction
4. Application trigger/exerciser
5. Log parsing and processing scripts (a minimal parsing sketch follows the architecture figure)
![](../pic/8.5_architecture.png)
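A minimal sketch of what the log parsing step could look like: monitored behaviours/API calls are matched against logcat output and turned into a binary feature vector. The tag names and log format below are assumptions, not DynaLog's actual output:
```python
# hypothetical list of monitored API signatures / behaviours
FEATURES = [
    "Ldalvik/system/DexClassLoader;->loadClass",
    "Landroid/telephony/SmsManager;->sendTextMessage",
    "Landroid/telephony/TelephonyManager;->getDeviceId",
]

def parse_logcat(lines, features=FEATURES):
    """Return a 0/1 vector: 1 if the monitored behaviour appears in the log."""
    hits = {feat for line in lines for feat in features if feat in line}
    return [1 if f in hits else 0 for f in features]

sample_log = [
    "03-19 14:32:21 I DynaLog: Landroid/telephony/SmsManager;->sendTextMessage(...)",
    "03-19 14:32:22 I DynaLog: Landroid/telephony/TelephonyManager;->getDeviceId()",
]
print(parse_logcat(sample_log))   # -> [0, 1, 1]
```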
#### Dynamic analysis tool (DroidBox capabilities)
- An open source tool used to extract some high-level behaviours and characteristics by running the app on an Android emulator (AVD).
- Extracts these behaviours from the logs dumped by logcat.
- Uses Androguard to extract static meta-data relating to the app.
- Utilizes Taintdroid for data leakage detection.
- Used as a building block for several dynamic analysis tools.
#### Problems with Sandbox performance
- Lack of complete code coverage.
- Lack of complete traffic communication, server not found.
- Real events need to trigger some malicious behaviour.
#### Extended Sandbox to overcome these issues by:
- Improving the AVD emulator to behave like a realistic device
- New scripts to improve code coverage
## What is the work's evaluation of the proposed solution?
#### Dataset
>We used 1226 real malware samples from 49 families of the Malgenome Project malware dataset. Furthermore, a set of 1000 internally vetted benign APKs from McAfee Labs were utilized.
#### Experiment 1: evaluating high level behaviour features
![](../pic/8.5_experiment.png)
#### Experiment 2: evaluating extended features and sandbox enhancements within DynaLog
![](../pic/8.5_experiment2.png)
![](../pic/8.5_experiment3.png)
#### Results
![](../pic/8.5_result.png)
## What is your analysis of the identified problem, idea and evaluation?
- DynaLog suffers from the same limitations as other dynamic analysis tools.
- Sophisticated Android malware employ detection avoidance techniques in order to hide their malicious activities from analysis tools.
- DynaLog does not log output from native code.
## What are the contributions?
- We present DynaLog, a dynamic analysis framework to enable automated analysis of Android applications.
- We present extensive experimental evaluation of DynaLog using real malware samples and clean applications in order to validate the framework and measure its capability to enable identification of malicious behaviour through the extracted behavioural features.
## What are future directions for this research?
For future work we intend to develop and couple classification engines that can utilize the extensive features of DynaLog for accurate identification of malware samples. Furthermore, we intend to enhance the framework to improve its robustness against anti-analysis techniques employed by some malware whilst also incorporating new feature sets to improve the overall analysis and detection capabilities.
## What questions are you left with?

View File

@ -0,0 +1,60 @@
# 8.6 A Static Android Malware Detection Based on Actual Used Permissions Combination and API Calls
## What is your take-away message from this paper?
The paper puts forward a machine learning detection method based on actually used permission combinations and API calls.
## What are motivations for this work?
#### Android development
The current Android system places no restrictions on the number of permissions an application can request, so developers tend to request more permissions than actually needed to ensure the application runs successfully, which results in permission abuse.
#### Current methods
Some traditional detection methods only consider the requested permissions and ignore whether they are actually used, which leads to incorrect identification of some malware.
## What is the proposed solution?
> We present a machine learning detection method which is based on the actually used permission combinations and API calls.
![](../pic/8.6_framework.png)
The framework mainly contains four parts (a minimal sketch of steps 3 and 4 follows the list):
1. Extracting AndroidManifest.xml and the Smali code with Apktool.
2. First, extracting the permissions declared in AndroidManifest.xml. Second, extracting API calls by scanning the Smali code according to the mapping between permissions and APIs, thus obtaining the actually used permissions. Finally, deriving the actually used permission combinations from the single permissions.
3. Generating feature vectors, with each application represented as an instance.
4. Using five machine learning classification algorithms, including J48, Random Forest, SVM, KNN and AdaboostM1, to classify and evaluate applications.
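A minimal sketch of steps 3 and 4 under stated assumptions: hypothetical permission, permission-combination and API-call features are encoded as binary vectors and fed to a classifier. The paper uses WEKA; scikit-learn's Random Forest stands in here purely for illustration:
```python
from sklearn.ensemble import RandomForestClassifier

FEATURES = [
    "SEND_SMS", "READ_CONTACTS",                 # actually used permissions
    "SEND_SMS+READ_CONTACTS",                    # a used-permission combination
    "SmsManager.sendTextMessage", "ContentResolver.query",  # API calls
]

def to_vector(observed):
    return [1 if f in observed else 0 for f in FEATURES]

apps = [  # (observed features, label) with 1 = malware, 0 = benign; toy samples
    ({"SEND_SMS", "READ_CONTACTS", "SEND_SMS+READ_CONTACTS",
      "SmsManager.sendTextMessage", "ContentResolver.query"}, 1),
    ({"READ_CONTACTS", "ContentResolver.query"}, 0),
    ({"SEND_SMS", "SmsManager.sendTextMessage"}, 1),
    ({"READ_CONTACTS"}, 0),
]
X = [to_vector(feats) for feats, _ in apps]
y = [label for _, label in apps]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([to_vector({"SEND_SMS", "SEND_SMS+READ_CONTACTS",
                              "SmsManager.sendTextMessage"})]))
```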
## What is the work's evaluation of the proposed solution?
#### Data Set
The authors collected a total of 2375 Android applications. The 1170 malware samples are composed of 23 families from the Android Malware Genome Project; the 1205 benign samples are from the official Google market.
#### Results
>We evaluate the classification performance of five different algorithms in terms of feature sets that have been extracted from applications, including API calls, permissions combination, the combination of actually used permissions combination and API calls, requested permissions. In addition, information gain and CFS feature selection algorithms are used to select the useful features to improve the efficiency of classifiers.
Feature extraction shows that there are differences between requested permissions and actually used permissions, which is important for improving detection efficiency:
![](../pic/8.6_different.png)
The experiments show that the feature set of actually used permission combinations and API calls achieves better performance:
![](../pic/8.6_result.png)
## What is your analysis of the identified problem, idea and evaluation?
The main idea of the paper is using actually used permissions instead of declared permissions. However, PScout cannot obtain the complete mapping between permissions and API calls, which can introduce some errors.
## What are the contributions?
1. Presented an Android malware detection method.
2. Various machine learning algorithms, feature selection methods and experimental samples are used to validate the efficiency.
3. The method can improve the performance of classifiers significantly and is more accurate than previous methods.
## What are future directions for this research?
- More useful characteristics could be extracted to achieve better results.
- Integration of multiple classifiers could be used to improve the identification of classifiers.
## What questions are you left with?
Why not also evaluate the performance of classifiers trained on the combination of declared permission combinations and API calls?

View File

@ -0,0 +1,94 @@
# 8.7 MaMaDroid: Detecting Android malware by building Markov chains of behavioral models
## What is your take-away message from this paper?
This paper presented an Android malware detection system based on modeling the sequences of API calls as Markov chains.
## What are motivations for this work?
#### Android & Malware
Now making up 85% of mobile devices, Android smartphones have become profitable targets for cybercriminals, allowing them to bypass two factor authentication or steal sensitive information.
#### Current Defenses
- Smartphones have limited battery, making it infeasible to use traditional approaches.
- Google Play Store is not able to detect all malicious apps.
- Previous malware detection studies focused on models based on permissions or on specific API calls. The former is prone to false positives and the latter needs constant retraining.
#### The Idea
Malicious and benign apps may issue the same API calls during their execution, but in a different order.
## What is the proposed solution?
>We present a novel malware detection system for Android that instead relies on the *sequence* of *abstracted* API calls performed by an app rather than their use or frequency, aiming to capture the behavioral model of the app.
MaMaDroid is built by combining four different phases:
- Call graph extraction: starting from the apk file of an app, we extract the call graph of the analysed sample.
- Sequence extraction: from the call graph, we extract the different potential paths as sequences of API calls and abstract all those calls to higher levels.
- Markov Chain modelling: the sequences of abstracted calls of all samples are modelled as transitions among the states of a Markov chain.
- Classification: Given the probabilities of transition between states of the chains as features set, we apply machine learning to detect malicious apps.
![](../pic/8.7_overview.png)
#### Call Graph Extraction
The apk is statically analysed using the [Soot](https://sable.github.io/soot/) framework to extract call graphs, with [FlowDroid](https://blogs.uni-paderborn.de/sse/tools/flowdroid/) used to ensure contexts and flows are preserved.
#### Sequence Extraction
Taking the call graph as input, this phase extracts the sequences of functions potentially called by the program: it identifies a set of entry nodes, enumerates the paths, and outputs them as sequences of API calls.
>The set of all paths identified during this phase constitute the sequences of API calls which will be used to build a Markov chain behavioural model and to extract features.
The system operates in one of two modes, abstracting each call to either its package or its family (see the sketch below).
- in package mode: abstract an API call to its package name using the list of Android packages (includes 243 packages, 95 from the Google API, plus self-defined and obfuscated packages).
- in family mode: abstract to nine possible families (android, google, java, javax, xml, apache, junit, json, dom) or developer-defined (self-defined) and obfuscated (obfuscated) packages.
>This allows the system to be resilient to API changes and achieve scalability. In fact, our experiments, presented in section III, show that, from a dataset of 44K apps, we extract more than 10 million unique API calls, which would result in a very large number of nodes, with the corresponding graphs (and feature vectors) being quite sparse.
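A minimal sketch of family-mode abstraction; the package-prefix table and the obfuscation heuristic below are simplifying assumptions, not MaMaDroid's exact rules:
```python
FAMILY_PREFIXES = {
    "android.": "android", "com.google.": "google", "java.": "java",
    "javax.": "javax", "org.xml.": "xml", "org.apache.": "apache",
    "junit.": "junit", "org.json.": "json", "org.w3c.dom.": "dom",
}

def abstract_to_family(api_call):
    package = api_call.rsplit("(", 1)[0]          # drop the argument list
    for prefix, family in FAMILY_PREFIXES.items():
        if package.startswith(prefix):
            return family
    segments = package.split(".")[:-2]            # package part, without Class.method
    if segments and all(len(s) <= 2 for s in segments):
        return "obfuscated"                       # crude renaming-obfuscation heuristic
    return "self-defined"

print(abstract_to_family("android.telephony.SmsManager.sendTextMessage()"))  # android
print(abstract_to_family("com.example.app.MainActivity.onCreate()"))         # self-defined
print(abstract_to_family("a.b.c.d()"))                                       # obfuscated
```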
#### Markov Chain Modeling
Now it builds a Markov chain where each package/family is a state and the transitions represent the probability of moving from one state to another.
![](../pic/8.7_sequence.png)
![](../pic/8.7_markov.png)
>Next, we use the probabilities of transitioning from one state (abstracted call) to another in the Markov chain as the feature vector of each app. States that are not present in a chain are represented as 0 in the feature vector. Also note that the vector derived from the Markov chain depends on the operational mode of MAMADROID. With families, there are 11 possible states, thus 121 possible transitions in each chain, while, when abstracting to packages, there are 340 states and 115,600 possible transitions.
The authors also experiment with applying PCA (Principal Component Analysis) to reduce the feature space.
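A minimal sketch of how such a feature vector can be derived in family mode: transitions between consecutive abstracted calls are counted, each row is normalised into probabilities, and the matrix is flattened into the 11 x 11 = 121-dimensional vector (the call paths are hypothetical):
```python
import numpy as np

STATES = ["android", "google", "java", "javax", "xml", "apache",
          "junit", "json", "dom", "self-defined", "obfuscated"]
INDEX = {s: i for i, s in enumerate(STATES)}

def markov_features(abstracted_sequences):
    counts = np.zeros((len(STATES), len(STATES)))
    for seq in abstracted_sequences:                 # each seq is one call path
        for src, dst in zip(seq, seq[1:]):
            counts[INDEX[src], INDEX[dst]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return probs.flatten()                           # 121 transition probabilities

paths = [["self-defined", "android", "java", "android"],
         ["self-defined", "java", "java", "android"]]
vec = markov_features(paths)
print(vec.shape)                                            # (121,)
print(vec[INDEX["java"] * len(STATES) + INDEX["android"]])  # P(java -> android) = 2/3
```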
#### Classification
This phase uses machine learning algorithms: Random Forest, 1-NN, 3-NN and SVM. SVM was discarded as it was slower and less accurate in classification than the others.
## What is the work's evaluation of the proposed solution?
The authors gathered a collection of 43,490 Android apps, 8,447 benign and 35,493 malware apps. This included a mix of apps from October 2010 to May 2016, enabling the robustness of classification over time to be explored.
The authors used the F-measure to evaluate the system through three different kinds of tests: testing on samples from the same databases as the training set, testing on samples newer than those in the training set, and testing on samples older than those in the training set.
![](../pic/8.7_fmeasure.png)
>As Android evolves over the years, so do the characteristics of both benign and malicious apps. Such evolution must be taken into account when evaluating Android malware detection systems, since their accuracy might significantly be affected as newer APIs are released and/or as malicious developers modify their strategies in order to avoid detection. Evaluating this aspect constitutes one of our research questions, and one of the reasons why our datasets span across multiple years (2010-2016).
Testing on samples newer than the training ones (figure below, on the left) helps understand whether the system is resilient to changes over time, or whether it needs constant retraining.
![](../pic/8.7_fmeasure2.png)
The evaluation is also set up to verify whether older malware samples can still be detected, with similar F-measure scores across the years, ranging from 95-97% in package mode.
## What is your analysis of the identified problem, idea and evaluation?
As both Android malware and the operating system itself constantly evolve, it is very challenging to design robust malware mitigation techniques that can operate for long periods of time without the need for modifications or costly re-training.
Abstracting to families or packages makes the system less susceptible to the introduction of new API calls. It is a great idea and is shown to perform well.
However, the system might be evaded by repackaging benign apps, or by crafting a new app that imitates the Markov chains of benign apps.
## What are the contributions?
>First, we introduce a novel approach, implemented in a tool called MAMADROID, to detect Android malware by abstracting API calls to their package and family, and using Markov chains to model the behavior of the apps through the sequences of API calls. Second, we can detect unknown samples on the same year of training with an F-measure of 99%, but also years after training the system, meaning that MAMADROID does not need continuous re-training. Our system is scalable as we model every single app independently from the others and can easily append app features in a new training set. Finally, compared to previous work [2], MAMADROID achieves significantly higher accuracy with reasonably fast running times, while also being more robust to evolution in malware development and changes in the Android API.
## What are future directions for this research?
In the future the authors plan to explore and test in depth MaMaDroid's resilience to the main evasion techniques, to try more fine-grained abstractions, and to seed the analysis with dynamic analysis.
## What questions are you left with?
What remains stable as the Android system is updated, can those stable aspects be used for malware detection, and how can accuracy and stability be maintained over a long period of time?

View File

@ -0,0 +1,87 @@
# 8.8 DroidNative: Semantic-Based Detection of Android Native Code Malware
## What is your take-away message from this paper?
The paper proposes DroidNative for the detection of both bytecode and native code Android malware variants.
## What are motivations for this work?
#### native code
A recent study shows that 86% of the most popular Android applications contain native code.
#### current methods
The more sophisticated detectors that make use of static analysis techniques to detect such variants operate only at the bytecode level, meaning that malware embedded in native code goes undetected.
- No coverage of Android native binary code.
- Do not handle obfuscations at function level. Low level semantics are not covered.
- Heuristics used are very specific to malware programs, and hence are not scalable.
- Slow runtimes, can not be used in a practical system.
## What is the proposed solution?
>This paper introduces DroidNative, a malware detection system for Android that operates at the native code level and is able to detect malware in either bytecode or native code. DroidNative performs static analysis of the native code and focuses on patterns in the control flow that are not significantly impacted by obfuscations. DroidNative is not limited to only analyzing native code, it is also able to analyze bytecode by making use of the Android runtime (ART) to compile bytecode into native code suitable for analysis. The use of control flow with patterns enables DroidNative to detect smaller size malware, which allows DroidNative to reduce the size of a signature for optimizing the detection time without reducing the DR.
#### MAIL
DroidNative uses MAIL (Malware Analysis Intermediate Language) to provide an abstract representation of an assembly program, and that representation is used for malware analysis and detection.
![](../pic/8.8_overview.png)
#### Disassembler
- A challenge is ensuring that all code is found and disassembled.
- To overcome the deficiencies of linear sweep and recursive traversal, we combine these two techniques while disassembling.
- Another challenge is that most binaries used in Android are stripped, meaning they do not include debugging or symbolic information.
- We handle this problem by building control flow patterns and use them for malware detection.
#### Optimizer
The optimizer removes instructions that are not required for malware analysis. DroidNative builds multiple smaller, interwoven CFGs for a program instead of a single, large CFG.
#### MAIL Generation
The MAIL Generator translates an assembly program to a MAIL program.
#### Malware Detection
- Data Miner: searches for the control and structural information in a MAIL program
- Signature Generator: builds a behavioral signature (ACFG or SWOD) of the MAIL program.
- Similarity Detector: matches the signature of the program against the signatures of the malware templates extracted during the training phase, and determines whether the application is malware based on thresholds that are computed empirically (a simplified sketch of this thresholded matching follows the list).
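A deliberately simplified sketch of the threshold-based matching idea, where a signature is modelled as a set of pattern-annotated CFG edges; this illustrates the thresholded similarity test only and is not DroidNative's actual ACFG/SWOD matching algorithm (the threshold and edge tuples are hypothetical):
```python
def similarity(sample_sig, template_sig):
    """Fraction of the template's annotated edges found in the sample."""
    if not template_sig:
        return 0.0
    return len(sample_sig & template_sig) / len(template_sig)

def is_malware(sample_sig, templates, threshold=0.7):   # threshold chosen empirically
    return any(similarity(sample_sig, t) >= threshold for t in templates)

template = {("B1", "B2", "ASSIGN"), ("B2", "B3", "CONTROL"), ("B3", "B1", "JUMP")}
sample = {("B1", "B2", "ASSIGN"), ("B2", "B3", "CONTROL"), ("B3", "B4", "CALL")}
print(is_malware(sample, [template]))   # 2/3 of the template matched -> False at 0.7
```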
#### ACFG
A CFG is built for each function in the annotated MAIL program, yielding the ACFGs.
![](../pic/8.8_acfg.png)
#### SWOD
Each MAIL pattern is assigned a weight based on the SWOD, which represents the differences between the MAIL pattern distributions of malware and benign samples.
![](../pic/8.8_swod.png)
## What is the work's evaluation of the proposed solution?
#### Dataset
>Our dataset for the experiments consists of total 2240 Android applications. Of these, 1240 are Android malware programs collected from two different resources and the other 1000 are benign programs containing Android 5.0 system programs, libraries and standard applications.
#### N-Fold Cross Validation
The authors use n-fold cross validation to estimate the performance and define the following evaluation metrics: DR, FPR, ROC, AUC.
![](../pic/8.8_roc_graph.png)
## What is your analysis of the identified problem, idea and evaluation?
This is the first research effort for Android malware detection that deals with native code. It shows superior results for the detection of Android native code malware and malware variants compared to other research efforts and commercial tools.
But there are some limitations:
- requires that the application's malicious code be available for static analysis.
- excels at detecting variants of malware that has been previously seen, and may not be able to detect true zero-day malware.
- may not be able to detect a malware employing excessive flow obfuscations.
- the pattern matching may fail if the malware variant obfuscates a statement in a basic block.
## What are the contributions?
- DroidNative is the first system that builds and designs cross-platform signatures for Android and operates at the native code level, allowing it to detect malware embedded in either bytecode or native code.
- DroidNative is faster than existing systems, making it suitable for real-time analysis.
## What are future directions for this research?
>To improve DroidNative's resilience to such obfuscations, in the future we will use a threshold for pattern matching. We will also investigate other pattern matching techniques, such as a statement dependency graph or assigning one pattern to multiple statements of different type etc, to improve this resiliency.
## What questions are you left with?
There are many other programming languages (JavaScript/Python/...) that can be used for Android app development. How can malware written in those languages be detected?

View File

@ -0,0 +1,70 @@
# 8.9 DroidAnalytics: A Signature Based Analytic System to Collect, Extract, Analyze and Associate Android Malware
## What is your take-away message from this paper?
The authors present DroidAnalytics, an Android malware analytic system for malware collection, signature generation, information retrieval, and malware association based on similarity score. Furthermore, DroidAnalytics can efficiently detect zero-day repackaged malware.
## What are motivations for this work?
An effective analytic system needs to address the following questions:
- How to automatically collect and manage a high volume of mobile malware?
- How to analyze a zero-day suspicious application, and compare or associate it with existing malware families in the database?
- How to perform information retrieval so as to reveal malicious logic shared with existing malware, and to quickly identify the new malicious code segments?
## What is the proposed solution?
![](../pic/8.9_architecture.png)
The system consists of these modules:
- Extensible Crawler: systematically build up the mobile applications database for malware analysis and association.
- Dynamic Payload Detector: to deal with malware which dynamically downloads malicious codes via Internet or attachment files.
- scans the package, identifies files using their magic numbers instead of file extension.
- use the forward symbolic execution technique to trigger the download behavior.
- Android App Information (AIS) Parser: it is used to represent *.apk* information.
- Signature Generator: uses a three-level signature generation scheme to identify each application, based on the mobile application, its classes and its methods. A method's signature is generated from its API call sequence; the signature of a class is created from the signatures of the methods it is composed of; finally, the signature of an application is composed of the signatures of all its classes (a minimal sketch follows the figures below).
- Android API calls table: use Java reflection to obtain all descriptions of the API calls.
- Disassembling process: takes the Dalvik opcodes of the *.dex* file and transforms them to methods and classes.
- Generate Lev3 signature: extracts the API call ID sequence as a string in each method, then hashes this string value to produce the method's signature.
- Generate Lev2 signature: generate the Lev2 signature for each class based on the Lev3 signature of methods within that class.
- Generate Lev1 signature: based on the Lev2 signatures.
![](../pic/8.9_signature.png)
![](../pic/8.9_signature2.png)
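A minimal sketch of the three-level idea: hash each method's API call ID sequence (Lev3), combine the method signatures of a class (Lev2), and combine the class signatures of the app (Lev1). The concrete hashing and combination choices are assumptions; the paper only specifies that each level is derived from the level below:
```python
import hashlib

def _md5(text):
    return hashlib.md5(text.encode()).hexdigest()

def lev3_signature(api_id_sequence):
    """Method-level signature from the method's ordered API call IDs."""
    return _md5(",".join(str(i) for i in api_id_sequence))

def lev2_signature(method_sigs):
    """Class-level signature from its methods' Lev3 signatures."""
    return _md5("".join(sorted(method_sigs)))

def lev1_signature(class_sigs):
    """App-level signature from its classes' Lev2 signatures."""
    return _md5("".join(sorted(class_sigs)))

# hypothetical app: two classes, each method given as its API call ID sequence
classes = {"Lcom/app/A;": [[12, 7, 7, 91], [3, 12]],
           "Lcom/app/B;": [[44, 5]]}
class_sigs = [lev2_signature([lev3_signature(m) for m in methods])
              for methods in classes.values()]
print(lev1_signature(class_sigs))
```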
## What is the work's evaluation of the proposed solution?
>We conduct three experiments and show how analysts can study malware, carry out similarity measurement between applications, as well as perform class association among 150,368 mobile applications in the database.
- analyzing malware repackaging
- analyzing malware which uses code obfuscation
- analyzing malware with attachment files or dynamic payloads
>we have used DroidAnalytics to detect 2,494 malware samples from 102 families, with 342 zero-day malware samples from six different families.
## What is your analysis of the identified problem, idea and evaluation?
DroidAnalytics's signature generation is based on the following observation: any functional application needs to invoke various Android API calls, and the Android API call sequence within a method is difficult to modify.
Traditional Hash vs Three-level Signature:
- Traditional hash
- Hackers can easily mutate malware
- Not flexible for analysis
- Three-level signature
- App, classes and methods
- Defend against obfuscation
- Facilitate analysis
- Zero-day malware
## What are the contributions?
The authors present the design and implementation of DroidAnalytics:
- DroidAnalytics automates the processes of malware collection, analysis and management.
- DroidAnalytics uses a multi-level signature algorithm to extract the malware feature based on their semantic meaning at the opcode level.
- DroidAnalytics associates malware and generates signatures at the app/class/method level.
- Shows how to use DroidAnalytics to detect zero-day repackaged malware.
## What are future directions for this research?
## What questions are you left with?

View File

@ -4,5 +4,14 @@
Link: https://pan.baidu.com/s/1G-WFCzAU2VdrrsHqJzjGpw Password: vhfw
* Return-Oriented Programming
* [8.1 The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86)](8.1_return-into-libc_without_function_calls.md)
* [8.2 Return-Oriented Programming without Returns](8.2_return-oriented_programming_without_returns.md)
* [8.1 The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86)](doc/8.1_return-into-libc_without_function_calls.md)
* [8.2 Return-Oriented Programming without Returns](doc/8.2_return-oriented_programming_without_returns.md)
* Reverse Engineering
* [8.3 New Frontiers of Reverse Engineering](doc/8.3_new_frontiers_of_reverse_engineering.md)
* Android Security
* [8.4 EMULATOR vs REAL PHONE: Android Malware Detection Using Machine Learning](doc/8.4_emulator_vs_real_phone.md)
* [8.5 DynaLog: An automated dynamic analysis framework for characterizing Android applications](doc/8.5_dynalog_an_automated_dynamic_analysis_framework.md)
* [8.6 A Static Android Malware Detection Based on Actual Used Permissions Combination and API Calls](doc/8.6_malware_detection_based_on_actual_used_permissions.md)
* [8.7 MaMaDroid: Detecting Android malware by building Markov chains of behavioral models](doc/8.7_detecting_malware_by_building_markov_chains.md)
* [8.8 DroidNative: Semantic-Based Detection of Android Native Code Malware](doc/8.8_droidnative_semantic-based_detection_of_android_native_code_malware.md)
* [8.9 DroidAnalytics: A Signature Based Analytic System to Collect, Extract, Analyze and Associate Android Malware](doc/8.9_droidanalytics_signature_based_analytic_system.md)

BIN: 27 new images added under `pic/`, all referenced from the new chapter 8 notes: 8.3_arch_reengineering.png, 8.3_role.png, 8.3_tools_arch.png, 8.4_emulator.png, 8.4_model.png, 8.4_percentage.png, 8.4_phone.png, 8.5_architecture.png, 8.5_experiment.png, 8.5_experiment2.png, 8.5_experiment3.png, 8.5_result.png, 8.6_different.png, 8.6_framework.png, 8.6_result.png, 8.7_fmeasure.png, 8.7_fmeasure2.png, 8.7_markov.png, 8.7_overview.png, 8.7_sequence.png, 8.8_acfg.png, 8.8_overview.png, 8.8_roc_graph.png, 8.8_swod.png, 8.9_architecture.png, 8.9_signature.png, 8.9_signature2.png (binary files not shown).