113 research outputs found

    Machine Learning Aided Static Malware Analysis: A Survey and Tutorial

    Full text link
    Malware analysis and detection techniques have been evolving during the last decade as a reflection to development of different malware techniques to evade network-based and host-based security protections. The fast growth in variety and number of malware species made it very difficult for forensics investigators to provide an on time response. Therefore, Machine Learning (ML) aided malware analysis became a necessity to automate different aspects of static and dynamic malware investigation. We believe that machine learning aided static analysis can be used as a methodological approach in technical Cyber Threats Intelligence (CTI) rather than resource-consuming dynamic malware analysis that has been thoroughly studied before. In this paper, we address this research gap by conducting an in-depth survey of different machine learning methods for classification of static characteristics of 32-bit malicious Portable Executable (PE32) Windows files and develop taxonomy for better understanding of these techniques. Afterwards, we offer a tutorial on how different machine learning techniques can be utilized in extraction and analysis of a variety of static characteristic of PE binaries and evaluate accuracy and practical generalization of these techniques. Finally, the results of experimental study of all the method using common data was given to demonstrate the accuracy and complexity. This paper may serve as a stepping stone for future researchers in cross-disciplinary field of machine learning aided malware forensics.Comment: 37 Page

    Analysis avoidance techniques of malicious software

    Get PDF
    Anti Virus (AV) software generally employs signature matching and heuristics to detect the presence of malicious software (malware). The generation of signatures and determination of heuristics is dependent upon an AV analyst having successfully determined the nature of the malware, not only for recognition purposes, but also for the determination of infected files and startup mechanisms that need to be removed as part of the disinfection process. If a specimen of malware has not been previously extensively analyzed, it is unlikely to be detected by AV software. In addition, malware is becoming increasingly profit driven and more likely to incorporate stealth and deception techniques to avoid detection and analysis to remain on infected systems for a myriad of nefarious purposes. Malware extends beyond the commonly thought of virus or worm, to customized malware that has been developed for specific and targeted miscreant purposes. Such customized malware is highly unlikely to be detected by AV software because it will not have been previously analyzed and a signature will not exist. Analysis in such a case will have to be conducted by a digital forensics analyst to determine the functionality of the malware. Malware can employ a plethora of techniques to hinder the analysis process conducted by AV and digital forensics analysts. The purpose of this research has been to answer three research questions directly related to the employment of these techniques as: 1. What techniques can malware use to avoid being analyzed? 2. How can the use of these techniques be detected? 3. How can the use of these techniques be mitigated

    Mimicking anti-viruses with machine learning and entropy profiles

    Get PDF
    The quality of anti-virus software relies on simple patterns extracted from binary files. Although these patterns have proven to work on detecting the specifics of software, they are extremely sensitive to concealment strategies, such as polymorphism or metamorphism. These limitations also make anti-virus software predictable, creating a security breach. Any black hat with enough information about the anti-virus behaviour can make its own copy of the software, without any access to the original implementation or database. In this work, we show how this is indeed possible by combining entropy patterns with classification algorithms. Our results, applied to 57 different anti-virus engines, show that we can mimic their behaviour with an accuracy close to 98% in the best case and 75% in the worst, applied on Windows’ disk resident malware

    Mimicking anti-viruses with machine learning and entropy profiles

    Get PDF
    The quality of anti-virus software relies on simple patterns extracted from binary files. Although these patterns have proven to work on detecting the specifics of software, they are extremely sensitive to concealment strategies, such as polymorphism or metamorphism. These limitations also make anti-virus software predictable, creating a security breach. Any black hat with enough information about the anti-virus behaviour can make its own copy of the software, without any access to the original implementation or database. In this work, we show how this is indeed possible by combining entropy patterns with classification algorithms. Our results, applied to 57 different anti-virus engines, show that we can mimic their behaviour with an accuracy close to 98% in the best case and 75% in the worst, applied on Windows’ disk resident malware

    An Automated and Comprehensive Framework for IoT Botnet Detection and Analysis (IoT-BDA)

    Get PDF
    The proliferation of insecure Internet-connected devices gave rise to the IoT botnets which can grow very large rapidly and may perform high-impact cyber-attacks. The related studies for tackling IoT botnets are concerned with either capturing or analyzing IoT botnet samples, using honeypots and sandboxes, respectively. The lack of integration between the two implies that the samples captured by the honeypots must be manually submitted for analysis in sandboxes, introducing a delay during which a botnet may change its operation. Furthermore, the effectiveness of the proposed sandboxes is limited by the potential use of anti-analysis techniques and the inability to identify features for effective detection and identification of IoT botnets. In this paper, we propose and evaluate a novel framework, the IoT-BDA framework, for automated capturing, analysis, identification, and reporting of IoT botnets. The framework consists of honeypots integrated with a novel sandbox that supports a wider range of hardware and software configurations, and can identify indicators of compromise and attack, along with anti-analysis, persistence, and anti-forensics techniques. These features can make botnet detection and analysis, and infection remedy more effective. The framework reports the findings to a blacklist and abuse service to facilitate botnet suspension. The paper also describes the discovered anti-honeypot techniques and the measures applied to reduce the risk of honeypot detection. Over the period of seven months, the framework captured, analyzed, and reported 4077 unique IoT botnet samples. The analysis results show that some IoT botnets used anti-analysis, persistence, and anti-forensics techniques typically seen in traditional botnets

    REFORM: A framework for malware packer analysis using information theory and statistical methods

    Get PDF
    Malware (malicious software) is a term used to describe computer viruses, Trojan horses, and other pieces of software that are used to attack computer systems. The increasing outbreak of malware in recent years poses a serious security threat to computer networks. Malware writers often obfuscate malware to hinder malware scanners from malicious code detection, i.e., to hide the fact that the software is actually malicious. Packing is the most common obfuscation method used by malware writers. Recently, there has been a dramatic increase in the number of new packers and variants of existing ones. Moreover, packers are employing increasingly sophisticated anti-unpacker tricks and obfuscation methods. Identifying a packer and obtaining a sample of unpacked malware are important to AV (Anti-virus) researchers who work on updating antivirus software to defend against malware, so that they can perform in-depth analysis. However, packer analysis is a technically intense research task, requiring the AV experts' deep knowledge of hardware, operating systems, compilers and programming languages. The significant growth of packers, in both number and complexity, prevents AV researchers from carrying out their daily AV research work efficiently and effectively. This PhD project has investigated the common features of packers and presented a novel, fast yet effective packer analysis framework called REFORM (Reverse Engineering For Obfuscation ReMoval). The system applies various technologies including reverse engineering, compression algorithms and statistical methods to de-obfuscate packers. REFORM is comprised of three major components that solve the problem of automatic packer analysis at three important stages of the packer analysis life cycle, namely packer detection, packer identification and unpacking, respectively: (1) It incorporates a novel randomness test that preserves local detail in the packer. This makes it easy for an AV researcher to distinguish areas of compressed/encrypted data from other code and data. (2) Using the above randomness test, each packer is seen to exhibit a unique pattern in its randomness distribution. The REFORM framework therefore provides an extremely effective packer classification model based on a set of randomness measurements generated from a packed file. Various statistical classifiers have also been integrated in REFORM to achieve even better classification performance. (3) REFORM enables an efficient generic unpacking strategy which uses an ordered address execution histogram to capture the memory after the unpacking loop has executed. We demonstrate REFORM 's capability on speeding up packer detection, identification and unpacking procedures. Such an automatic system is shown in the thesis to be essential to keeping up with the accelerating growth in packed malware

    Malware: the never-ending arm race

    Get PDF
    "Antivirus is death"' and probably every detection system that focuses on a single strategy for indicators of compromise. This famous quote that Brian Dye --Symantec's senior vice president-- stated in 2014 is the best representation of the current situation with malware detection and mitigation. Concealment strategies evolved significantly during the last years, not just like the classical ones based on polimorphic and metamorphic methodologies, which killed the signature-based detection that antiviruses use, but also the capabilities to fileless malware, i.e. malware only resident in volatile memory that makes every disk analysis senseless. This review provides a historical background of different concealment strategies introduced to protect malicious --and not necessarily malicious-- software from different detection or analysis techniques. It will cover binary, static and dynamic analysis, and also new strategies based on machine learning from both perspectives, the attackers and the defenders

    A Survey on Malware Analysis Techniques: Static, Dynamic, Hybrid and Memory Analysis

    Get PDF
    Now a day the threat of malware is increasing rapidly. A software that sneaks to your computer system without your knowledge with a harmful intent to disrupt your computer operations. Due to the vast number of malware, it is impossible to handle malware by human engineers. Therefore, security researchers are taking great efforts to develop accurate and effective techniques to detect malware. This paper presents a semantic and detailed survey of methods used for malware detection like signature-based and heuristic-based. The Signature-based technique is largely used today by anti-virus software to detect malware, is fast and capable to detect known malware. However, it is not effective in detecting zero-day malware and it is easily defeated by malware that use obfuscation techniques. Likewise, a considerable false positive rate and high amount of scanning time are the main limitations of heuristic-based techniques. Alternatively, memory analysis is a promising technique that gives a comprehensive view of malware and it is expected to become more popular in malware analysis. The main contributions of this paper are: (1) providing an overview of malware types and malware detection approaches, (2) discussing the current malware analysis techniques, their findings and limitations, (3) studying the malware obfuscation, attacking and anti-analysis techniques, and (4) exploring the structure of memory-based analysis in malware detection. The detection approaches have been compared with each other according to their techniques, selected features, accuracy rates, and their advantages and disadvantages. This paper aims to help the researchers to have a general view of malware detection field and to discuss the importance of memory-based analysis in malware detection
    • …
    corecore