4,341 research outputs found

    Automatically combining static malware detection techniques

    Get PDF
    Malware detection techniques come in many different flavors, and cover different effectiveness and efficiency trade-offs. This paper evaluates a number of machine learning techniques to combine multiple static Android malware detection techniques using automatically constructed decision trees. We identify the best methods to construct the trees. We demonstrate that those trees classify sample apps better and faster than individual techniques alone

    Analysis of Data Mining Tools for Android Malware Detection

    Get PDF
    There are various data mining tools available to analyze data related android malware detection. However, the problem arises in deciding the most appropriate machine learning techniques or algorithm on particular tools to be implemented on particular data. This research is focusing only on classification techniques. Hence, the objective of this research is to identify the best machine learning technique or algorithm on selected tool for android malware detection. Five techniques: Random Forest, Naive Bayes, Support Vector Machine, Forest, K-Nearest Neighbour and Adaboost are selected and applied in selected tools namely Weka and Orange. The result shows that Adaboost technique in Weka tool and Random Forest technique in Orange tool has obtained accuracy above 80% compare to other techniques. This result provides an option for the researcher on applying technique or algorithm on selected tool when analyzing android malware data

    Android malware detection from Google Play meta-data: Selection of important features

    Get PDF
    Android malware has emerged in the last decade as a consequence of the increasing popularity of smartphones and tablets. While most previous work focuses on inherent characteristics of Android apps to detect malware, this study analyses indirect features to identify patterns often observed in malware applications. We show that modern Machine Learning techniques applied to collected metadata from Google Play can provide a first approach towards the detection of malware applications, and we further identify which features have the highest predictive power among the total.The authors would like to acknowledge the support of the project BigDatAAM (grant no. FIS2013-47532-C3-3-P) funded by the Spanish MINECO

    Resilient and Scalable Android Malware Fingerprinting and Detection

    Get PDF
    Malicious software (Malware) proliferation reaches hundreds of thousands daily. The manual analysis of such a large volume of malware is daunting and time-consuming. The diversity of targeted systems in terms of architecture and platforms compounds the challenges of Android malware detection and malware in general. This highlights the need to design and implement new scalable and robust methods, techniques, and tools to detect Android malware. In this thesis, we develop a malware fingerprinting framework to cover accurate Android malware detection and family attribution. In this context, we emphasize the following: (i) the scalability over a large malware corpus; (ii) the resiliency to common obfuscation techniques; (iii) the portability over different platforms and architectures. In the context of bulk and offline detection on the laboratory/vendor level: First, we propose an approximate fingerprinting technique for Android packaging that captures the underlying static structure of the Android apps. We also propose a malware clustering framework on top of this fingerprinting technique to perform unsupervised malware detection and grouping by building and partitioning a similarity network of malicious apps. Second, we propose an approximate fingerprinting technique for Android malware's behavior reports generated using dynamic analyses leveraging natural language processing techniques. Based on this fingerprinting technique, we propose a portable malware detection and family threat attribution framework employing supervised machine learning techniques. Third, we design an automatic framework to produce intelligence about the underlying malicious cyber-infrastructures of Android malware. We leverage graph analysis techniques to generate relevant, actionable, and granular intelligence that can be used to identify the threat effects induced by malicious Internet activity associated to Android malicious apps. In the context of the single app and online detection on the mobile device level, we further propose the following: Fourth, we design a portable and effective Android malware detection system that is suitable for deployment on mobile and resource constrained devices, using machine learning classification on raw method call sequences. Fifth, we elaborate a framework for Android malware detection that is resilient to common code obfuscation techniques and adaptive to operating systems and malware change overtime, using natural language processing and deep learning techniques. We also evaluate the portability of the proposed techniques and methods beyond Android platform malware, as follows: Sixth, we leverage the previously elaborated techniques to build a framework for cross-platform ransomware fingerprinting relying on raw hybrid features in conjunction with advanced deep learning techniques

    Explainable machine learning for malware detection on Android applications

    Get PDF
    The presence of malicious software (malware), for example, in Android applications (apps), has harmful or irreparable consequences to the user and/or the device. Despite the protections app stores provide to avoid malware, it keeps growing in sophistication and diffusion. In this paper, we explore the use of machine learning (ML) techniques to detect malware in Android apps. The focus is on the study of different data pre-processing, dimensionality reduction, and classification techniques, assessing the generalization ability of the learned models using public domain datasets and specifically developed apps. We find that the classifiers that achieve better performance for this task are support vector machines (SVM) and random forests (RF). We emphasize the use of feature selection (FS) techniques to reduce the data dimensionality and to identify the most relevant features in Android malware classification, leading to explainability on this task. Our approach can identify the most relevant features to classify an app as malware. Namely, we conclude that permissions play a prominent role in Android malware detection. The proposed approach reduces the data dimensionality while achieving high accuracy in identifying malware in Android apps.info:eu-repo/semantics/publishedVersio

    The Paradox of Choice: Investigating Selection Strategies for Android Malware Datasets Using a Machine-learning Approach

    Get PDF
    The increase in the number of mobile devices that use the Android operating system has attracted the attention of cybercriminals who want to disrupt or gain unauthorized access to them through malware infections. To prevent such malware, cybersecurity experts and researchers require datasets of malware samples that most available antivirus software programs cannot detect. However, researchers have infrequently discussed how to identify evolving Android malware characteristics from different sources. In this paper, we analyze a wide variety of Android malware datasets to determine more discriminative features such as permissions and intents. We then apply machine-learning techniques on collected samples of different datasets based on the acquired features’ similarity. We perform random sampling on each cluster of collected datasets to check the antivirus software’s capability to detect the sample. We also discuss some common pitfalls in selecting datasets. Our findings benefit firms by acting as an exhaustive source of information about leading Android malware datasets

    Transcend:Detecting Concept Drift in Malware Classification Models

    Get PDF
    Building machine learning models of malware behavior is widely accepted as a panacea towards effective malware classification. A crucial requirement for building sustainable learning models, though, is to train on a wide variety of malware samples. Unfortunately, malware evolves rapidly and it thus becomes hard—if not impossible—to generalize learning models to reflect future, previously-unseen behaviors. Consequently, most malware classifiers become unsustainable in the long run, becoming rapidly antiquated as malware continues to evolve. In this work, we propose Transcend, a framework to identify aging classification models in vivo during deployment, much before the machine learning model’s performance starts to degrade. This is a significant departure from conventional approaches that retrain aging models retrospectively when poor performance is observed. Our approach uses a statistical comparison of samples seen during deployment with those used to train the model, thereby building metrics for prediction quality. We show how Transcend can be used to identify concept drift based on two separate case studies on Android andWindows malware, raising a red flag before the model starts making consistently poor decisions due to out-of-date training

    Examining Application Components to Reveal Android Malware

    Get PDF
    Smartphones are becoming ubiquitous in everyday life and malware is exploiting these devices. Therefore, a means to identify the threats of malicious applications is necessary. This paper presents a method to classify and analyze Android malware through application component analysis. The experiment parses select portions from Android packages to collect features using byte sequences and permissions of the application. Multiple machine learning algorithms classify the samples of malware based on these features. The experiment utilizes instance based learner, naive Bayes, decision trees, sequential minimal optimization, boosted naive Bayes, and boosted decision trees to identify the best components that reveal malware characteristics. The best case classifies malicious applications with an accuracy of 99.24% and an area under curve of 0.9890 utilizing boosted decision trees. This method does not require scanning the entire application and provides high true positive rates. This thesis investigates the components to provide malware classification
    • …
    corecore