7 research outputs found

    Machine Learning Aided Static Malware Analysis: A Survey and Tutorial

    Full text link
    Malware analysis and detection techniques have been evolving during the last decade as a reflection to development of different malware techniques to evade network-based and host-based security protections. The fast growth in variety and number of malware species made it very difficult for forensics investigators to provide an on time response. Therefore, Machine Learning (ML) aided malware analysis became a necessity to automate different aspects of static and dynamic malware investigation. We believe that machine learning aided static analysis can be used as a methodological approach in technical Cyber Threats Intelligence (CTI) rather than resource-consuming dynamic malware analysis that has been thoroughly studied before. In this paper, we address this research gap by conducting an in-depth survey of different machine learning methods for classification of static characteristics of 32-bit malicious Portable Executable (PE32) Windows files and develop taxonomy for better understanding of these techniques. Afterwards, we offer a tutorial on how different machine learning techniques can be utilized in extraction and analysis of a variety of static characteristic of PE binaries and evaluate accuracy and practical generalization of these techniques. Finally, the results of experimental study of all the method using common data was given to demonstrate the accuracy and complexity. This paper may serve as a stepping stone for future researchers in cross-disciplinary field of machine learning aided malware forensics.Comment: 37 Page

    Fine-Grained Static Detection of Obfuscation Transforms Using Ensemble-Learning and Semantic Reasoning

    Get PDF
    International audienceThe ability to efficiently detect the software protections used is at a prime to facilitate the selection and application of adequate deob-fuscation techniques. We present a novel approach that combines semantic reasoning techniques with ensemble learning classification for the purpose of providing a static detection framework for obfuscation transformations. By contrast to existing work, we provide a methodology that can detect multiple layers of obfuscation, without depending on knowledge of the underlying functionality of the training-set used. We also extend our work to detect constructions of obfuscation transformations, thus providing a fine-grained methodology. To that end, we provide several studies for the best practices of the use of machine learning techniques for a scalable and efficient model. According to our experimental results and evaluations on obfuscators such as Tigress and OLLVM, our models have up to 91% accuracy on state-of-the-art obfuscation transformations. Our overall accuracies for their constructions are up to 100%

    An enhanced performance model for metamorphic computer virus classification and detectioN

    Get PDF
    Metamorphic computer virus employs various code mutation techniques to change its code to become new generations. These generations have similar behavior and functionality and yet, they could not be detected by most commercial antivirus because their solutions depend on a signature database and make use of string signature-based detection methods. However, the antivirus detection engine can be avoided by metamorphism techniques. The purpose of this study is to develop a performance model based on computer virus classification and detection. The model would also be able to examine portable executable files that would classify and detect metamorphic computer viruses. A Hidden Markov Model implemented on portable executable files was employed to classify and detect the metamorphic viruses. This proposed model that produce common virus statistical patterns was evaluated by comparing the results with previous related works and famous commercial antiviruses. This was done by investigating the metamorphic computer viruses and their features, and the existing classifications and detection methods. Specifically, this model was applied on binary format of portable executable files and it was able to classify if the files belonged to a virus family. Besides that, the performance of the model, practically implemented and tested, was also evaluated based on detection rate and overall accuracy. The findings indicated that the proposed model is able to classify and detect the metamorphic virus variants in portable executable file format with a high average of 99.7% detection rate. The implementation of the model is proven useful and applicable for antivirus programs

    Malgazer: An Automated Malware Classifier With Running Window Entropy and Machine Learning

    Get PDF
    This dissertation explores functional malware classification using running window entropy and machine learning classifiers. This topic was under researched in the prior literature, but the implications are important for malware defense. This dissertation will present six new design science artifacts. The first artifact was a generalized machine learning based malware classifier model. This model was used to categorize and explain the gaps in the prior literature. This artifact was also used to compare the prior literature to the classifiers created in this dissertation, herein referred to as “Malgazer” classifiers. Running window entropy data was required, but the algorithm was too slow to compute at scale. This dissertation presents an optimized version of the algorithm that requires less than 2% of the time of the original algorithm. Next, the classifications for the malware samples were required, but there was no one unified and consistent source for this information. One of the design science artifacts was the method to determine the classifications from publicly available resources. Once the running window entropy data was computed and the functional classifications were collected, the machine learning algorithms were trained at scale so that one individual could complete over 200 computationally intensive experiments for this dissertation. The method to scale the computations was an instantiation design science artifact. The trained classifiers were another design science artifact. Lastly, a web application was developed so that the classifiers could be utilized by those without a programming background. This was the last design science artifact created by this research. Once the classifiers were developed, they were compared to prior literature theoretically and empirically. A malware classification method from prior literature was chosen (referred to herein as “GIST”) for an empirical comparison to the Malgazer classifiers. The best Malgazer classifier produced an accuracy of approximately 95%, which was around 0.76% more accurate than the GIST method on the same data sets. Then, the Malgazer classifier was compared to the prior literature theoretically, based upon the empirical analysis with GIST, and Malgazer performed at least as well as the prior literature. While the data, methods, and source code are open sourced from this research, most prior literature did not provide enough information or data to replicate and verify each method. This prevented a full and true comparison to prior literature, but it did not prevent recommending the Malgazer classifier for some use cases

    REFORM: A framework for malware packer analysis using information theory and statistical methods

    Get PDF
    Malware (malicious software) is a term used to describe computer viruses, Trojan horses, and other pieces of software that are used to attack computer systems. The increasing outbreak of malware in recent years poses a serious security threat to computer networks. Malware writers often obfuscate malware to hinder malware scanners from malicious code detection, i.e., to hide the fact that the software is actually malicious. Packing is the most common obfuscation method used by malware writers. Recently, there has been a dramatic increase in the number of new packers and variants of existing ones. Moreover, packers are employing increasingly sophisticated anti-unpacker tricks and obfuscation methods. Identifying a packer and obtaining a sample of unpacked malware are important to AV (Anti-virus) researchers who work on updating antivirus software to defend against malware, so that they can perform in-depth analysis. However, packer analysis is a technically intense research task, requiring the AV experts' deep knowledge of hardware, operating systems, compilers and programming languages. The significant growth of packers, in both number and complexity, prevents AV researchers from carrying out their daily AV research work efficiently and effectively. This PhD project has investigated the common features of packers and presented a novel, fast yet effective packer analysis framework called REFORM (Reverse Engineering For Obfuscation ReMoval). The system applies various technologies including reverse engineering, compression algorithms and statistical methods to de-obfuscate packers. REFORM is comprised of three major components that solve the problem of automatic packer analysis at three important stages of the packer analysis life cycle, namely packer detection, packer identification and unpacking, respectively: (1) It incorporates a novel randomness test that preserves local detail in the packer. This makes it easy for an AV researcher to distinguish areas of compressed/encrypted data from other code and data. (2) Using the above randomness test, each packer is seen to exhibit a unique pattern in its randomness distribution. The REFORM framework therefore provides an extremely effective packer classification model based on a set of randomness measurements generated from a packed file. Various statistical classifiers have also been integrated in REFORM to achieve even better classification performance. (3) REFORM enables an efficient generic unpacking strategy which uses an ordered address execution histogram to capture the memory after the unpacking loop has executed. We demonstrate REFORM 's capability on speeding up packer detection, identification and unpacking procedures. Such an automatic system is shown in the thesis to be essential to keeping up with the accelerating growth in packed malware

    Forensic identification and detection of hidden and obfuscated malware

    Get PDF
    The revolution in online criminal activities and malicious software (malware) has posed a serious challenge in malware forensics. Malicious attacks have become more organized and purposefully directed. With cybercrimes escalating to great heights in quantity as well as in sophistication and stealth, the main challenge is to detect hidden and obfuscated malware. Malware authors use a variety of obfuscation methods and specialized stealth techniques of information hiding to embed malicious code, to infect systems and to thwart any attempt to detect them, specifically with the use of commercially available anti-malware engines. This has led to the situation of zero-day attacks, where malware inflict systems even with existing security measures. The aim of this thesis is to address this situation by proposing a variety of novel digital forensic and data mining techniques to automatically detect hidden and obfuscated malware. Anti-malware engines use signature matching to detect malware where signatures are generated by human experts by disassembling the file and selecting pieces of unique code. Such signature based detection works effectively with known malware but performs poorly with hidden or unknown malware. Code obfuscation techniques, such as packers, polymorphism and metamorphism, are able to fool current detection techniques by modifying the parent code to produce offspring copies resulting in malware that has the same functionality, but with a different structure. These evasion techniques exploit the drawbacks of traditional malware detection methods, which take current malware structure and create a signature for detecting this malware in the future. However, obfuscation techniques aim to reduce vulnerability to any kind of static analysis to the determent of any reverse engineering process. Furthermore, malware can be hidden in file system slack space, inherent in NTFS file system based partitions, resulting in malware detection that even more difficult.Doctor of Philosoph
    corecore