2 research outputs found

    The Paradox of Choice: Investigating Selection Strategies for Android Malware Datasets Using a Machine-learning Approach

    Get PDF
    The increase in the number of mobile devices that use the Android operating system has attracted the attention of cybercriminals who want to disrupt or gain unauthorized access to them through malware infections. To prevent such malware, cybersecurity experts and researchers require datasets of malware samples that most available antivirus software programs cannot detect. However, researchers have infrequently discussed how to identify evolving Android malware characteristics from different sources. In this paper, we analyze a wide variety of Android malware datasets to determine more discriminative features such as permissions and intents. We then apply machine-learning techniques on collected samples of different datasets based on the acquired features’ similarity. We perform random sampling on each cluster of collected datasets to check the antivirus software’s capability to detect the sample. We also discuss some common pitfalls in selecting datasets. Our findings benefit firms by acting as an exhaustive source of information about leading Android malware datasets

    Integrated information gain with extra tree algorithm for feature permission analysis in android malware classification

    Get PDF
    The rapid growth of free applications in the android market has led to the fast spread of malware apps since users store their sensitive personal information on their mobile devices when using those apps. The permission mechanism is designed as a security layer to protect the android operating system by restricting access to local resources of the system at installation time and run time for updated versions of the android operating system. Even though permissions provide a secure layer to users, they can be exploited by attackers to threaten user privacy. Consequently, exploring the patterns of those permissions becomes necessary to find the relevant permission features that contribute to classifying android apps. However, with the era of big data and the rapid explosion of malware along with many unnecessary requested permissions, it has become a challenge to recognize the patterns of permissions from these data due to the irrelevant and redundant features that affect the classification performance and increase the complexity cost overhead. Ensemble-based Extra Tree - Feature Selection (FS-EX) algorithm was proposed in this study to explore the permission patterns by selecting a minimal-sized subset of highly discriminant permission features capable of discriminating against malware samples from nonmalware samples. The integrated Information Gain with Ensemble-based Extra Tree - Feature Selection (FS-IGEX) algorithm is proposed to assign weight values to permission features instead of binary values to determine the impact of weighted attribute variables on the classification performance. The two proposed methods based on Ensemble Extra Tree Feature Selection were evaluated on five datasets with various sample sizes and feature space using nine machine learning classifiers. Comparison studies were carried out between FS-EX subsets and the dataset of Full Permission features (FP) and the two approaches of the FS-IGEX method - the Permission-Binary (PB) approach and the Permission-Weighted (PW) approach. The permissions with PB were represented with binary values, whereas permissions with PW were represented with weighted values. The results demonstrated that the approach with the FS-EX was promising in obtaining the most prominent permission features related to the class target and attaining the same or close classification results in terms of accuracy with the highest accuracy mean of 96%, as compared to the FP. In addition, the PW approach of the FS-IGEX method had highly influential weighted permission features that could classify apps as malware and non-malware with the highest accuracy mean of 93%, compared to the PB approach of the FS-IGEX method and the FP
    corecore