    Malware classification framework for dynamic analysis using Information Theory

    Objectives: 1. To propose a framework for a Malware Classification System (MCS) that analyzes malware behavior dynamically using concepts from information theory and a machine learning technique. 2. To extract behavioral patterns from malware execution reports in terms of their features and generate a data repository. 3. To select the most promising features using information-theoretic concepts. Methods/Statistical Analysis: Malware is a major concern for computer security experts today. A growing variety and number of malware samples, in the form of viruses, worms, Trojans, etc., affect millions of systems. Many techniques have been proposed to assign malware to its class accurately. Some analysis techniques examine malware based on its structure, code flow, etc. without executing it (static analysis), whereas others (dynamic analysis) monitor the behavior of malware by executing it and comparing it with known malware behavior. Dynamic analysis has proved effective in malware detection because behavior is harder to mask during execution than the underlying code examined by static analysis. In this study, we propose a framework for a Malware Classification System (MCS) that analyzes malware behavior dynamically using concepts from information theory and a machine learning technique. The proposed framework extracts behavioral patterns from malware execution reports in terms of their features and generates a data repository. It then selects the most promising features using information-theoretic concepts. Findings: The proposed framework detects the family of unknown malware samples after a classifier is trained on the malware data repository. We validated the applicability of the proposed framework by comparing it with another dynamic malware analysis technique on a real malware dataset from VirusTotal.
Application: The proposed framework is a Malware Classification System (MCS) that analyzes malware behavior dynamically using concepts from information theory and a machine learning technique.
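The abstract does not say which information-theoretic measure ranks the behavioral features; information gain (also named by two of the other works listed here) is a common choice. The sketch below assumes each execution report has been reduced to discrete features (for example 1 if a behavior was observed); the feature names and family labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y), in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    total = len(labels)
    conditional = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional

def select_top_k(reports, labels, k):
    """Rank behavioral features by information gain and keep the k best.

    Each report is a dict mapping a (hypothetical) feature name to a
    discrete value, e.g. 1 if the behavior appeared in the execution report.
    Features missing from a report are treated as 0.
    """
    names = sorted({name for r in reports for name in r})
    scores = {n: information_gain([r.get(n, 0) for r in reports], labels) for n in names}
    ranked = sorted(names, key=lambda n: scores[n], reverse=True)
    return ranked[:k], scores
```

A feature that perfectly separates two equally frequent families scores 1 bit; a feature independent of the family scores 0.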

    Malware Detection Using Frequency Domain-Based Image Visualization and Deep Learning

    We propose a novel method to detect and visualize malware through image classification. Executable binaries are represented as grayscale images obtained from counts of byte N-grams (N=2) in the Discrete Cosine Transform (DCT) domain, and a neural network is trained for malware detection. A shallow neural network is trained for classification, and its accuracy is compared with that of deep architectures such as ResNet trained using transfer learning. Neither disassembly nor behavioral analysis of malware is required for these methods. Motivated by the visual similarity of these images for different malware families, we also compare our deep neural network models with standard image features such as GIST descriptors. A joint feature measure is proposed that combines different features using error analysis to obtain an accurate ensemble model with improved classification performance. A new dataset called MaleX, which contains around 1 million malware and benign Windows executable samples, is created for large-scale malware detection and classification experiments. Experimental results are promising, with 96% binary classification accuracy on MaleX. The proposed model also generalizes well to larger unseen malware samples, and the results compare favorably with state-of-the-art static-analysis-based malware detection algorithms.
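One plausible reading of this pipeline is: count adjacent byte pairs into a 256 x 256 matrix, apply a 2-D DCT, and scale the coefficient magnitudes into a grayscale image. The sketch below follows that reading; the log scaling, the optional `bins` down-scaling and all function names are assumptions, not details from the paper.

```python
import math

def byte_bigram_counts(data: bytes, bins: int = 256):
    """Count adjacent byte pairs (N = 2) into a bins x bins matrix.

    bins=256 gives one cell per byte pair; a smaller value is a
    hypothetical down-scaling that groups byte values into equal-width bins.
    """
    counts = [[0] * bins for _ in range(bins)]
    for b1, b2 in zip(data, data[1:]):
        counts[b1 * bins // 256][b2 * bins // 256] += 1
    return counts

def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)))
    return out

def dct_2d(matrix):
    """Separable 2-D DCT-II: transform each row, then each column."""
    rows = [dct_1d(r) for r in matrix]
    cols = [dct_1d(c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def to_grayscale(coeffs):
    """Map DCT magnitudes to 0..255 with log scaling (an assumed choice)."""
    mags = [[math.log1p(abs(c)) for c in row] for row in coeffs]
    peak = max(max(row) for row in mags) or 1.0
    return [[round(255 * m / peak) for m in row] for row in mags]
```

The pure-Python DCT here is O(n^3) and is meant for small inputs; for a full 256 x 256 matrix, `scipy.fft.dctn(matrix, type=2, norm="ortho")` computes the same orthonormal transform much faster.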

    Android application evolution and malware detection

    Android has dominated the mobile market for a few years now and continues to increase its market share. Meanwhile, Android has seen a sharper increase in malware, so finding better ways to detect Android malware is a matter of utmost urgency. In this thesis, we use static code analysis to extract Android application security features and build two different classification models to detect Android malware. Our permissions-based classification model achieves 96.5% accuracy, a 97.2% true positive rate (TPR) and a 95.5% true negative rate (TNR) with lower overhead. Compared with others’ work, our results increase accuracy by 4.9%, TPR by 5.6% and TNR by 3.9%. By using multiple security metrics in the classification model, detection rises to 99.3% accuracy, 99.5% TPR and 99% TNR. Moreover, we investigate the security evolution of Android applications. The data shows that more than half of the applications have security vulnerabilities and/or dangerous behaviors, and the security problems remain, or even worsen, in the updated versions of most applications. Based on this result, we argue that there is a higher chance of an update attack, in which malware is contained in the updated version of a benign application. Our multiple-metrics-based classification model is adapted to detect the update attack and, based on our initial results, achieves a similar or even better detection rate.
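The accuracy, TPR and TNR figures above all derive from the confusion matrix of a binary malware/benign classifier. This minimal sketch shows how the three relate; the class and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    tp: int = 0  # malware flagged as malware
    fn: int = 0  # malware missed
    tn: int = 0  # benign app passed as benign
    fp: int = 0  # benign app flagged as malware

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    def tpr(self) -> float:
        """True positive rate: share of malware that is detected."""
        return self.tp / (self.tp + self.fn)

    def tnr(self) -> float:
        """True negative rate: share of benign apps correctly passed."""
        return self.tn / (self.tn + self.fp)

def tally(y_true, y_pred):
    """Build a confusion matrix from boolean labels (True = malware)."""
    c = Confusion()
    for actual, predicted in zip(y_true, y_pred):
        if actual and predicted:
            c.tp += 1
        elif actual:
            c.fn += 1
        elif predicted:
            c.fp += 1
        else:
            c.tn += 1
    return c
```

Because accuracy mixes both classes, a model can raise TPR while lowering TNR and leave accuracy nearly unchanged, which is why all three are reported.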

    Dynamic behavior analysis of android applications for malware detection

    Android is the most popular operating system for smartphones and small devices, with an 86.6% market share (Chau 2016). Its open-source nature makes it more prone to attacks, creating a need for malware analysis. The main approaches for detecting malicious intent in mobile applications are based on either static or dynamic analysis. In static analysis, apps are inspected for suspicious code patterns to identify malicious segments; however, several obfuscation techniques are available to guard against such analysis. Dynamic analysis, on the other hand, is a behavior-based detection method that investigates the run-time behavior of a suspicious app to uncover malware. The present study extracts the system call behavior of 216 malicious apps and 278 normal apps to construct feature vectors for training a classifier. Seven classification techniques, including decision tree, random forest, gradient boosting trees, k-NN, artificial neural network, support vector machine and deep learning, were applied to this dataset. Three feature ranking techniques were used to select appropriate features from the set of 337 attributes (system calls): information gain, the chi-square statistic, and correlation analysis, each of which assigns weights to the features. After discarding low-ranked features, the performance of the classifiers was measured using accuracy and recall. Experiments show that Support Vector Machines (SVM) with features selected through correlation analysis outperformed the other techniques, achieving an accuracy of 97.16% with a recall of 99.54% (for malicious apps). The study also identifies the set of system calls that are crucial in identifying the malicious intent of Android apps.
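Two of the three feature-ranking methods named above can be sketched over per-app system-call counts as follows (information gain is omitted for brevity). The chi-square test here binarizes each call to present/absent, and the correlation ranking uses absolute Pearson correlation with the label; both are assumptions about details the abstract does not give, and the system-call names are only examples.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def chi_square_2x2(present, labels):
    """Pearson chi-square statistic for a binary feature vs. a binary label."""
    n = len(labels)
    table = [[0, 0], [0, 0]]  # table[feature present?][is malicious?]
    for f, y in zip(present, labels):
        table[int(f)][int(y)] += 1
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = sum(table[i]) * (table[0][j] + table[1][j]) / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat

def rank_syscalls(columns, labels, method="correlation"):
    """Rank system calls by a relevance score, highest first.

    columns maps a system-call name to its per-app invocation counts;
    labels holds 1 for malicious apps and 0 for normal ones.
    """
    if method == "correlation":
        scores = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    elif method == "chi2":
        scores = {name: chi_square_2x2([c > 0 for c in col], labels)
                  for name, col in columns.items()}
    else:
        raise ValueError(f"unknown method: {method}")
    return sorted(scores, key=scores.get, reverse=True)
```

Dropping the lowest-ranked names from either list, then retraining, mirrors the "discard low-ranked features" step.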

    Machine Learning Aided Static Malware Analysis: A Survey and Tutorial

    Malware analysis and detection techniques have been evolving during the last decade in response to the development of malware techniques that evade network-based and host-based security protections. The fast growth in the variety and number of malware species has made it very difficult for forensic investigators to respond in time. Therefore, Machine Learning (ML) aided malware analysis has become a necessity for automating different aspects of static and dynamic malware investigation. We believe that ML-aided static analysis can serve as a methodological approach in technical Cyber Threat Intelligence (CTI), as an alternative to resource-consuming dynamic malware analysis, which has been thoroughly studied before. In this paper, we address this research gap by conducting an in-depth survey of machine learning methods for classifying static characteristics of 32-bit malicious Portable Executable (PE32) Windows files, and we develop a taxonomy for better understanding these techniques. Afterwards, we offer a tutorial on how different machine learning techniques can be used to extract and analyze a variety of static characteristics of PE binaries, and we evaluate the accuracy and practical generalization of these techniques. Finally, we present the results of an experimental study of all the methods on common data to demonstrate their accuracy and complexity. This paper may serve as a stepping stone for future researchers in the cross-disciplinary field of machine-learning-aided malware forensics. Comment: 37 pages
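As a small example of the static characteristics such surveys cover, the sketch below reads a few fields of the COFF file header of a PE image with Python's `struct` module and computes whole-file byte entropy. The field layout follows the PE format; which features a given classifier actually uses is not specified here, so this selection is illustrative.

```python
import math
import struct
from collections import Counter

IMAGE_FILE_MACHINE_I386 = 0x014C  # machine type of 32-bit x86 (PE32) images

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, 0..8 bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def pe_static_features(data: bytes) -> dict:
    """Read a few COFF file-header fields plus whole-file byte entropy."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ (DOS) signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # offset of the PE header
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF file header: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    (machine, n_sections, timestamp, _sym_ptr, _n_syms,
     opt_header_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    return {
        "is_i386": machine == IMAGE_FILE_MACHINE_I386,
        "num_sections": n_sections,
        "timestamp": timestamp,
        "optional_header_size": opt_header_size,
        "characteristics": characteristics,
        "entropy": byte_entropy(data),
    }
```

High entropy is often read as a hint of packing or encryption, which is one reason it appears alongside header fields in static feature sets.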

    Enhancement Of Static Code Analysis Malware Detection Framework For Android Category-Based Application

    Android has been the number one mobile operating system in terms of worldwide market share since May 2012. High demand and its open-source nature have made Android a main target of malware creators. Two approaches have been introduced to detect malware in the Android mobile environment: static analysis and dynamic analysis. Static analysis examines static features. The main issues spotted in the static analysis approach are the large number of features used, time-consuming feature extraction, and the reliability of the accuracy results produced by various machine learning algorithms. This thesis therefore investigates the whole Android static analysis framework for detecting and classifying mobile malware. An early study found that two frequently used static features, permissions and API calls, are sufficient to analyse Android malware when mapped correctly. A new permission-to-API-call mapping for Android API levels 16 to 24 is constructed from the official Android developer guideline references; previously, these two features were mapped without using the standard guideline. To experiment with and analyse the framework, 4767 benign applications from 10 different categories were collected from the official Android marketplace, and 3443 malware applications were collected from the AndroZoo dataset. All benign files were then scanned with VirusTotal to ensure that they were free from viruses. For feature extraction, a new automated approach using Depth First Search (DFS) with sequential search is introduced; it successfully extracts the targeted features with no limit on application file size or on the number of files. To let the machine learning models train faster and to reduce their complexity, information gain feature selection is applied to the extracted features.
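A minimal sketch of DFS-based extraction with sequential search, assuming the APK has already been decoded into text files (for example `AndroidManifest.xml` and smali sources produced by a decoder such as apktool). The target strings are illustrative examples of a permission and an API call, not the thesis's actual permission-to-API mapping.

```python
import os

def dfs_files(root):
    """Yield every regular file under root, exploring directories depth-first."""
    stack = [root]
    while stack:
        directory = stack.pop()  # LIFO pop makes the traversal depth-first
        with os.scandir(directory) as it:
            entries = list(it)
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                stack.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                yield entry.path

def extract_features(root, targets):
    """Sequentially search each file for target strings; return a 0/1 vector.

    Files are streamed line by line, so memory use does not grow with file
    size and no cap on file size or file count is needed.
    """
    found = set()
    for path in dfs_files(root):
        with open(path, encoding="utf-8", errors="ignore") as fh:
            for line in fh:
                found.update(t for t in targets if t in line)
    return [1 if t in found else 0 for t in targets]
```

The explicit stack avoids Python's recursion limit on deeply nested package directories.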
Four types of machine learning algorithms were tested, each with four different dataset-splitting techniques. The results show that detecting malware within an application category achieves higher accuracy than non-category-based detection. To increase reliability, the results were then validated with a statistical analysis procedure in which each machine learning classification algorithm was run 50 times. The validation results show that Random Forest with a 10-fold cross-validation split achieved the highest performance compared with the benchmark study and two other classifiers. This study suggests future work that combines the optimization of feature selection and algorithm parameters to achieve higher accuracy and more reliable comparisons.
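The 50-iteration validation with 10-fold cross-validation can be sketched as repeated stratified k-fold, reshuffling on each iteration. The `evaluate` callback is hypothetical and stands in for training and scoring any of the tested classifiers; the thesis's exact statistical procedure is not described in the abstract.

```python
import random
import statistics
from collections import defaultdict

def stratified_kfold(labels, k=10, seed=0):
    """Split indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):  # fixed class order keeps runs reproducible
        idx = by_class[y]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    return folds

def repeated_cv(labels, evaluate, k=10, repeats=50):
    """Repeat stratified k-fold CV; return (mean, stdev) of per-repeat scores.

    evaluate(train_idx, test_idx) is a hypothetical callback that trains a
    classifier on train_idx and returns a score (e.g. accuracy) on test_idx.
    """
    run_means = []
    for r in range(repeats):
        folds = stratified_kfold(labels, k, seed=r)
        scores = []
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            scores.append(evaluate(train, test))
        run_means.append(statistics.mean(scores))
    return statistics.mean(run_means), statistics.stdev(run_means)
```

Reporting the spread across repeats, rather than a single split, is what allows classifiers to be compared more reliably; `statistics.stdev` needs at least two repeats.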

    Longitudinal performance analysis of machine learning based Android malware detectors

    This paper presents a longitudinal study of the performance of machine learning classifiers for Android malware detection. The study uses features extracted from Android applications first seen between 2012 and 2016. The aim is to investigate the extent of performance decay over time for various machine learning classifiers trained with static features extracted from date-labelled benign and malware application sets. Using date-labelled apps allows zero-day testing to be mimicked faithfully, giving a more realistic view of performance than conventional evaluation methods that ignore an app's date of appearance. In this study, all the investigated machine learning classifiers showed progressively diminishing performance when tested on samples from later time periods. Overall, when older models were tested on newer app samples, the false positive rate (benign samples misclassified as malicious) increased more substantially than the true positive rate (malicious apps correctly classified) fell.
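Date-aware evaluation can be sketched as training only on apps first seen up to a cut-off year and scoring each later year separately, tracking both TPR and false positive rate. The `App` record, the `fit` callback and the yearly granularity are assumptions for illustration; the study's exact time windows are not given in the abstract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class App:
    first_seen: int   # year the app first appeared
    features: tuple   # static features (hypothetical representation)
    is_malware: bool

def tpr_fpr(apps, predict):
    """True positive rate on malware and false positive rate on benign apps."""
    tp = sum(1 for a in apps if a.is_malware and predict(a))
    pos = sum(1 for a in apps if a.is_malware)
    fp = sum(1 for a in apps if not a.is_malware and predict(a))
    neg = len(apps) - pos
    return (tp / pos if pos else float("nan"), fp / neg if neg else float("nan"))

def longitudinal_eval(apps, train_until, fit):
    """Train only on apps seen up to train_until, then test each later year separately.

    fit(train_apps) is a hypothetical callback that returns predict(app) -> bool.
    """
    predict = fit([a for a in apps if a.first_seen <= train_until])
    later_years = sorted({a.first_seen for a in apps if a.first_seen > train_until})
    return {
        year: tpr_fpr([a for a in apps if a.first_seen == year], predict)
        for year in later_years
    }
```

Unlike a random split, no sample from a later year can leak into training, so each yearly score approximates how a deployed model would fare on apps it has never seen.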