Current Android system has not any restrictions to the number of permissions that an application can request, developers tend to apply more than actually needed permissions in order to ensure the successful running of the application, which results in the abuse of permissions.
Some traditional detection methods only consider the requested permissions and ignore whether it is actually used, which lead to incorrect identification of some malwares.
1. Extracting AndroidManifest.xml and Smali codes by Apktool.
2. Firstly, extracting the permissions that declared in AndroidManifest.xml. Secondly, extracting API calls through scanning Smali codes in according with the mapping relation between permissions and API, and get the actually used permissions. Finally, obtaining the actually used permissions combination based on the single permission.
3. Generating feature vector, each application is represented as an instance.
4. Using five machine learning classification algorithms, including J48, Random Forest, SVM, KNN and AdaboostM1, to realize the classification and evaluation for applications.
The authors collected a total of 2375 Android applications. the 1170 malware samples are composed of 23 families from genetic engineering. 1205 benign samples are from Google officail market.
> We evaluate the classification performance of five different algorithms in terms of feature sets that have been extracted from applications, including API calls, permissions combination, the combination of actually used permissions combination and API calls, requested permissions. Inaddition, information gain and CFS feature selection algorithms are used to select the useful features to improve the efficiency of classifiers.
From the feature extraction, there is some differences between requested permissions and actually used permissions, it is imporant to improve the efficiency:
The main idea of the paper is useing actually uesd permissions instead of declared permissons. But PScout can't get the whole mapping of permissons and API calls. This can make some errors.