319 research outputs found

    Machine Learning Aided Static Malware Analysis: A Survey and Tutorial

    Full text link
    Malware analysis and detection techniques have been evolving during the last decade as a reflection to development of different malware techniques to evade network-based and host-based security protections. The fast growth in variety and number of malware species made it very difficult for forensics investigators to provide an on time response. Therefore, Machine Learning (ML) aided malware analysis became a necessity to automate different aspects of static and dynamic malware investigation. We believe that machine learning aided static analysis can be used as a methodological approach in technical Cyber Threats Intelligence (CTI) rather than resource-consuming dynamic malware analysis that has been thoroughly studied before. In this paper, we address this research gap by conducting an in-depth survey of different machine learning methods for classification of static characteristics of 32-bit malicious Portable Executable (PE32) Windows files and develop taxonomy for better understanding of these techniques. Afterwards, we offer a tutorial on how different machine learning techniques can be utilized in extraction and analysis of a variety of static characteristic of PE binaries and evaluate accuracy and practical generalization of these techniques. Finally, the results of experimental study of all the method using common data was given to demonstrate the accuracy and complexity. This paper may serve as a stepping stone for future researchers in cross-disciplinary field of machine learning aided malware forensics.Comment: 37 Page

    Intra-procedural Path-insensitive Grams (i-grams) and Disassembly Based Features for Packer Tool Classification and Detection

    Get PDF
    The DoD relies on over seven million computing devices worldwide to accomplish a wide range of goals and missions. Malicious software, or malware, jeopardizes these goals and missions. However, determining whether an arbitrary software executable is malicious can be difficult. Obfuscation tools, called packers, are often used to hide the malicious intent of malware from anti-virus programs. Therefore detecting whether or not an arbitrary executable file is packed is a critical step in software security. This research uses machine learning methods to build a system, the Polymorphic and Non-Polymorphic Packer Detection (PNPD) system, that detects whether an executable is packed using both sequences of instructions, called i-grams, and disassembly information as features for machine learning. Both i-grams and disassembly features successfully detect packed executables with top configurations achieving average accuracies above 99.5\%, average true positive rates above 0.977, and average false positive rates below 1.6e-3 when detecting polymorphic packers

    Malware Detection Based on Structural and Behavioural Features of API Calls

    Get PDF
    In this paper, we propose a five-step approach to detect obfuscated malware by investigating the structural and behavioural features of API calls. We have developed a fully automated system to disassemble and extract API call features effectively from executables. Using n-gram statistical analysis of binary content, we are able to classify if an executable file is malicious or benign. Our experimental results with a dataset of 242 malwares and 72 benign files have shown a promising accuracy of 96.5% for the unigram model. We also provide a preliminary analysis by our approach using support vector machine (SVM) and by varying n-values from 1 to 5, we have analysed the performance that include accuracy, false positives and false negatives. By applying SVM, we propose to train the classifier and derive an optimum n-gram model for detecting both known and unknown malware efficiently

    Design and Performance Analysis of an Anti-Malware System based on Generative Adversarial Network Framework

    Get PDF
    The cyber realm is overwhelmed with dynamic malware that promptly penetrates all defense mechanisms, operates unapprehended to the user, and covertly causes damage to sensitive data. The current generation of cyber users is being victimized by the interpolation of malware each day due to the pervasive progression of Internet connectivity. Malware is dispersed to infiltrate the security, privacy, and integrity of the system. Conventional malware detection systems do not have the potential to detect novel malware without the accessibility of their signatures, which gives rise to a high False Negative Rate (FNR). Previously, there were numerous attempts to address the issue of malware detection, but none of them effectively combined the capabilities of signature-based and machine learning-based detection engines. To address this issue, we have developed an integrated Anti-Malware System (AMS) architecture that incorporates both conventional signature-based detection and AI-based detection modules. Our approach employs a Generative Adversarial Network (GAN) based Malware Classifier Optimizer (MCOGAN) framework, which can optimize a malware classifier. This framework utilizes GANs to generate fabricated benign files that can be used to train external discriminators for optimization purposes. We describe our proposed framework and anti-malware system in detail to provide a better understanding of how a malware detection system works. We evaluate our approach using the Figshare dataset and state-of-the-art models as discriminators, and our results demonstrate improved malware detection performance compared to existing models

    Protecting Software through Obfuscation:Can It Keep Pace with Progress in Code Analysis?

    Get PDF
    Software obfuscation has always been a controversially discussed research area. While theoretical results indicate that provably secure obfuscation in general is impossible, its widespread application in malware and commercial software shows that it is nevertheless popular in practice. Still, it remains largely unexplored to what extent today’s software obfuscations keep up with state-of-the-art code analysis and where we stand in the arms race between software developers and code analysts. The main goal of this survey is to analyze the effectiveness of different classes of software obfuscation against the continuously improving deobfuscation techniques and off-the-shelf code analysis tools. The answer very much depends on the goals of the analyst and the available resources. On the one hand, many forms of lightweight static analysis have difficulties with even basic obfuscation schemes, which explains the unbroken popularity of obfuscation among malware writers. On the other hand, more expensive analysis techniques, in particular when used interactively by a human analyst, can easily defeat many obfuscations. As a result, software obfuscation for the purpose of intellectual property protection remains highly challenging.</jats:p

    Framework for Security and Privacy in RFID based Telematics

    Get PDF
    Telematics by its nature requires the capture of sensor data, storage and exchange of data to obtain remote servicesMost of the commercial antivirus software fail to detect unknown and new malicious code. Proliferation of malicious code (viruses, worms, Trojans, root kits, spyware, crime ware, phishing attacks, and other malware designed to infiltrate or damage a system without user's consent) in recent years has presented a serious threat to Internet, individual users, and enterprises alike. In addition malware once confined to wired networks has now found a new breeding ground in mobile devices, automatic identification and collection (AIDC) technologies, and radio frequency identification devices (RFID) that use wireless networks to communicate and connect to the Internet. RFID systems encountered a number of threats and privacy issues. In order to stay ahead and be proactive in an asymmetric race against malicious code writers, developers of anti-malware technologies have to rely on automatic malware analysis tools. In this paper, we introduce a method of functionally classifying malware and malicious code by using well-known computational intelligent techniques. MEDiC (Malware Examiner using Disassembled Code) is our answer to a more accurate malware detection method. This work is an also attempt to address the information security issues chiefly the attacks through the databases that these RFID tags called iCLASS, which are of the active type. After a particular malicious code has been first identified, it can be analyzed to extract the signature, which provides a basis for detecting variants and mutants of the same malware in the future

    A Novel Malware Target Recognition Architecture for Enhanced Cyberspace Situation Awareness

    Get PDF
    The rapid transition of critical business processes to computer networks potentially exposes organizations to digital theft or corruption by advanced competitors. One tool used for these tasks is malware, because it circumvents legitimate authentication mechanisms. Malware is an epidemic problem for organizations of all types. This research proposes and evaluates a novel Malware Target Recognition (MaTR) architecture for malware detection and identification of propagation methods and payloads to enhance situation awareness in tactical scenarios using non-instruction-based, static heuristic features. MaTR achieves a 99.92% detection accuracy on known malware with false positive and false negative rates of 8.73e-4 and 8.03e-4 respectively. MaTR outperforms leading static heuristic methods with a statistically significant 1% improvement in detection accuracy and 85% and 94% reductions in false positive and false negative rates respectively. Against a set of publicly unknown malware, MaTR detection accuracy is 98.56%, a 65% performance improvement over the combined effectiveness of three commercial antivirus products
    • …
    corecore