145 research outputs found

    Malware Detection using Machine Learning and Deep Learning

    Full text link
    Research shows that over the last decade, malware has been growing exponentially, causing substantial financial losses to various organizations. Different anti-malware companies have been proposing solutions to defend attacks from these malware. The velocity, volume, and the complexity of malware are posing new challenges to the anti-malware community. Current state-of-the-art research shows that recently, researchers and anti-virus organizations started applying machine learning and deep learning methods for malware analysis and detection. We have used opcode frequency as a feature vector and applied unsupervised learning in addition to supervised learning for malware classification. The focus of this tutorial is to present our work on detecting malware with 1) various machine learning algorithms and 2) deep learning models. Our results show that the Random Forest outperforms Deep Neural Network with opcode frequency as a feature. Also in feature reduction, Deep Auto-Encoders are overkill for the dataset, and elementary function like Variance Threshold perform better than others. In addition to the proposed methodologies, we will also discuss the additional issues and the unique challenges in the domain, open research problems, limitations, and future directions.Comment: 11 Pages and 3 Figure

    Android Malware Detection using Machine Learning Techniques

    Get PDF
    Android is the world\u27s most popular and widely used operating system for mobile smartphones today. One of the reasons for this popularity is the free third-party applications that are downloaded and installed and provide various types of benefits to the user. Unfortunately, this flexibility of installing any application created by third parties has also led to an endless stream of constantly evolving malware applications that are intended to cause harm to the user in many ways. In this project, different approaches for tackling the problem of Android malware detection are presented and demonstrated. The data analytics of a real-time detection system is developed. The detection system can be used to scan through installed applications to identify potentially harmful ones so that they can be uninstalled. This is achieved through machine learning models. The effectiveness of the models using two different types of features, namely permissions and signatures, is explored. Exploratory data analysis and feature engineering are first implemented on each dataset to reduce a large number of features available. Then, different data mining supervised classification models are used to classify whether a given app is malware or benign. The performance metrics of different models are then compared to identify the technique that offers the best results for this purpose of malware detection. It is observed in the end that the signatures-based approach is more effective than the permissions-based approach. The kNN classifier and Random Forest classifier are both equally effective in terms of the classification models

    Malicious Malware Detection Using Machine Learning Perspectives

    Get PDF
    The opportunity for potential attackers to use more advanced techniques to exploit more people who are online is growing. These methods include getting visitors to click on dangerous URLs that could expose them to spam and ads, financial fraud, defacement of their website, and malware.  In this study, we tested different machine learning algorithms against a set of harmful URLs to see how well they worked overall and how well they found malware, spam, defacement, or phishing. The ISXC-URL-2016 dataset from the University of New Brunswick was used to make the dataset. The data was evaluated in Weka using the Random Forest, Decision Tree, Naïve Bayes, and Support Vector Machine algorithms. Each evaluation had a split of 80% of the data and a 5-fold, 10-fold, or 15-fold cross-validation. It was found that the 10-fold Random Forest algorithm correctly categorized 98.8% of the dataset's cases with the most accuracy.  The results of this experiment showed that machine learning can be a useful tool for companies that want to improve their security. Despite different limitations encountered in the completion of this research, This study is the most comprehensive available on the use of practices relevant to Malware detection. Keywords:Machine Learning, URLs, Random Forest, Naive Bayes, Decision Tree, Support Vector Machine DOI: 10.7176/JIEA/12-2-02 Publication date: November 30th 202

    JavaScript Metamorphic Malware Detection Using Machine Learning Techniques

    Get PDF
    Various factors like defects in the operating system, email attachments from unknown sources, downloading and installing a software from non-trusted sites make computers vulnerable to malware attacks. Current antivirus techniques lack the ability to detect metamorphic viruses, which vary the internal structure of the original malware code across various versions, but still have the exact same behavior throughout. Antivirus software typically relies on signature detection for identifying a virus, but code morphing evades signature detection quite effectively. JavaScript is used to generate metamorphic malware by changing the code’s Abstract Syntax Tree without changing the actual functionality, making it very difficult to detect by antivirus software. As JavaScript is prevalent almost everywhere, it becomes an ideal candidate language for spreading malware. This research aims to detect metamorphic malware using various machine learning models like K Nearest Neighbors, Random Forest, Support Vector Machine, and Naïve Bayes. It also aims to test the effectiveness of various morphing techniques that can be used to reduce the accuracy of the classification model. Thus, this involves improvement on both fronts of generation and detection of the malware helping antivirus software detect morphed codes with better accuracy. In this research, JavaScript based metamorphic engine reduces the accuracy of a trained malware detector. While N-gram frequency based feature vectors give good accuracy results for classifying metamorphic malware, HMM feature vectors provide the best results

    EMULATOR vs REAL PHONE: Android Malware Detection Using Machine Learning

    Get PDF
    The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.The Android operating system has become the most popular operating system for smartphones and tablets leading to a rapid rise in malware. Sophisticated Android malware employ detection avoidance techniques in order to hide their malicious activities from analysis tools. These include a wide range of anti-emulator techniques, where the malware programs attempt to hide their malicious activities by detecting the emulator. For this reason, countermeasures against anti-emulation are becoming increasingly important in Android malware detection. Analysis and detection based on real devices can alleviate the problems of anti-emulation as well as improve the effectiveness of dynamic analysis. Hence, in this paper we present an investigation of machine learning based malware detection using dynamic analysis on real devices. A tool is implemented to automatically extract dynamic features from Android phones and through several experiments, a comparative analysis of emulator based vs. device based detection by means of several machine learning algorithms is undertaken. Our study shows that several features could be extracted more effectively from the on-device dynamic analysis compared to emulators. It was also found that approximately 24% more apps were successfully analysed on the phone. Furthermore, all of the studied machine learning based detection performed better when applied to features extracted from the on-device dynamic analysis

    Android Mobile Malware Detection Using Machine Learning: A Systematic Review

    Get PDF
    With the increasing use of mobile devices, malware attacks are rising, especially on Android phones, which account for 72.2% of the total market share. Hackers try to attack smartphones with various methods such as credential theft, surveillance, and malicious advertising. Among numerous countermeasures, machine learning (ML)-based methods have proven to be an effective means of detecting these attacks, as they are able to derive a classifier from a set of training examples, thus eliminating the need for an explicit definition of the signatures when developing malware detectors. This paper provides a systematic review of ML-based Android malware detection techniques. It critically evaluates 106 carefully selected articles and highlights their strengths and weaknesses as well as potential improvements. Finally, the ML-based methods for detecting source code vulnerabilities are discussed, because it might be more difficult to add security after the app is deployed. Therefore, this paper aims to enable researchers to acquire in-depth knowledge in the field and to identify potential future research and development directions

    Android malware detection using machine learning to mitigate adversarial evasion attacks

    Get PDF
    In the current digital era, smartphones have become indispensable. Over the past few years, the exponential growth of Android users has made this operating system (OS) a prime target for smartphone malware. Consequently, the arms race between Android security personnel and malware developers seems enduring. Considering Machine Learning (ML) as the core component, various techniques are proposed in the literature to counter Android malware, however, the problem of adversarial evasion attacks on ML-based malware classifiers is understated. MLbased techniques are vulnerable to adversarial evasion attacks. The malware authors constantly try to craft adversarial examples to elude existing malware detection systems. This research presents the fragility of ML-based Android malware classifiers in adversarial environments and proposes novel techniques to counter adversarial evasion attacks on ML based Android malware classifiers. First, we start our analysis by introducing the problem of Android malware detection in adversarial environments and provide a comprehensive overview of the domain. Second, we highlight the problem of malware clones in popular Android malware repositories. The malware clones in the datasets can potentially lead to biased results and computational overhead. Although many strategies are proposed in the literature to detect repackaged Android malware, these techniques require burdensome code inspection. Consequently, we employ a lightweight and novel strategy based on package names reusing to identify repackaged Android malware and build a clones-free Android malware dataset. Furthermore, we investigate the impact of repacked Android malware on various ML-based classifiers by training them on a clones free training set and testing on a set of benign, non repacked malware and all the malware clones in the dataset. Although trained on a reduced train set, we achieved up to 98.7% F1 score. Third, we propose Cure-Droid, an Android malware classification model trained on hybrid features and optimized using a tree-based pipeline optimization technique (TPoT). Fourth, to present the fragility of Cure- Droid model in adversarial environments, we formulate multiple adversarial evasion attacks to elude the model. Fifth, to counter adversarial evasion attacks on ML-based Android malware detectors, we propose CureDroid*, a novel and adversarially aware Android malware classification model. CureDroid* is based on an ensemble of ML-based models trained on distinct set of features where each model has the individual capability to detect Android malware. The CureDroid* model employs an ensemble of five ML-based models where each model is selected and optimized using TPoT. Our experimental results demonstrate that CureDroid* achieves up to 99.2% accuracy in non-adversarial settings and can detect up to 30 fabricated input features in the best case. Finally, we propose TrickDroid, a novel cumulative adversarial training framework based on Oracle and GAN-based adversarial data. Our experimental results present the efficacy of TrickDroid with up to 99.46% evasion detection

    Longitudinal performance analysis of machine learning based Android malware detectors

    Get PDF
    This paper presents a longitudinal study of the performance of machine learning classifiers for Android malware detection. The study is undertaken using features extracted from Android applications first seen between 2012 and 2016. The aim is to investigate the extent of performance decay over time for various machine learning classifiers trained with static features extracted from date-labelled benign and malware application sets. Using date-labelled apps allows for true mimicking of zero-day testing, thus providing a more realistic view of performance than the conventional methods of evaluation that do not take date of appearance into account. In this study, all the investigated machine learning classifiers showed progressive diminishing performance when tested on sets of samples from a later time period. Overall, it was found that false positive rate (misclassifying benign samples as malicious) increased more substantially compared to the fall in True Positive rate (correct classification of malicious apps) when older models were tested on newer app samples

    A Hybrid Model for Android Malware Detection using Decision Tree and KNN

    Get PDF
    Malwares are becoming a major problem nowadays all around the world in android operating systems. The malware is a piece of software developed for harming or exploiting certain other hardware as well as software. The term Malware is also known as malicious software which is utilized to define Trojans, viruses, as well as other kinds of spyware. There have been developed many kinds of techniques for protecting the android operating systems from malware during the last decade. However, the existing techniques have numerous drawbacks such as accuracy to detect the type of malware in real-time in a quick manner for protecting the android operating systems. In this article, the authors developed a hybrid model for android malware detection using a decision tree and KNN (k-nearest neighbours) technique. First, Dalvik opcode, as well as real opcode, was pulled out by using the reverse procedure of the android software. Secondly, eigenvectors of sampling were produced by utilizing the n-gram model. Our suggested hybrid model efficiently combines KNN along with the decision tree for effective detection of the android malware in real-time. The outcome of the proposed scheme illustrates that the proposed hybrid model is better in terms of the accurate detection of any kind of malware from the Android operating system in a fast and accurate manner. In this experiment, 815 sample size was selected for the normal samples and the 3268-sample size was selected for the malicious samples. Our proposed hybrid model provides pragmatic values of the parameters namely precision, ACC along with the Recall, and F1 such as 0.93, 0.98, 0.96, and 0.99 along with 0.94, 0.99, 0.93, and 0.99 respectively. In the future, there are vital possibilities to carry out more research in this field to develop new methods for Android malware detection
    • …
    corecore