432 research outputs found

    Metamorphic Code Generation from LLVM IR Bytecode

    Metamorphic software changes its internal structure across generations with its functionality remaining unchanged. Metamorphism has been employed by malware writers as a means of evading signature detection and other advanced detection strategies. However, code morphing also has potential security benefits, since it increases the “genetic diversity” of software. In this research, we have created a metamorphic code generator within the LLVM compiler framework. LLVM is a three-phase compiler that supports multiple source languages and target architectures. It uses a common intermediate representation (IR) bytecode in its optimizer. Consequently, any supported high-level programming language can be transformed to this IR bytecode as part of the LLVM compilation process. Our metamorphic generator functions at the IR bytecode level, which provides many advantages over previously developed metamorphic generators. The morphing techniques that we employ include dead code insertion (where the dead code is actually executed within the morphed code) and subroutine permutation. We have tested the effectiveness of our code morphing using hidden Markov model analysis.


    Signature-based detection relies on patterns present in viruses and provides a relatively simple and efficient method for detecting known viruses. At present, most anti-virus systems rely primarily on signature detection. Metamorphic viruses are one of the most difficult types of viruses to detect. Such viruses change their internal structure, which provides an effective means of evading signature detection. Previous work has provided a rigorous proof that a fairly simple metamorphic engine can generate viruses that will evade any signature-based detection. In this project, we first implement a metamorphic engine that is provably undetectable, in the sense of signature-based detection. We then show that, as expected, the resulting viruses are not detected by popular commercial anti-virus scanners. Finally, we analyze the same set of viruses using a previously developed approach based on hidden Markov models (HMM). This HMM-based technique easily detects the viruses.
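
The limitation of signature scanning described in this abstract can be illustrated with a minimal sketch. It is not the scanner or the engine from the paper: the `scan` function, the `demo-family` name, and the byte patterns are all hypothetical, and the "morphing" is reduced to one inserted filler byte.

```python
def scan(data: bytes, signatures: dict[str, bytes]) -> list[str]:
    """Return the names of all signatures that occur as substrings of data."""
    return [name for name, pattern in signatures.items() if pattern in data]


# Hypothetical signature: a fixed byte string assumed to identify one family.
signatures = {"demo-family": bytes.fromhex("deadbeef0102")}

# A file that contains the signature bytes unchanged.
original = b"\x90\x90" + bytes.fromhex("deadbeef0102") + b"\x00"

# The same bytes with one filler byte inserted in the middle of the pattern,
# standing in for a structural change between generations.
morphed = (
    b"\x90\x90" + bytes.fromhex("deadbeef") + b"\x90" + bytes.fromhex("0102") + b"\x00"
)

hits_original = scan(original, signatures)
hits_morphed = scan(morphed, signatures)
print(hits_original, hits_morphed)
```

An exact substring match finds the first sample and misses the second, which is why the abstract turns to a statistical score instead of a fixed pattern.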

    Application of Artificial Intelligence for Detecting Derived Viruses

    Computer viruses have become complex and operate in a stealth mode to avoid detection. New viruses are argued to be created each and every day. However, most of these supposedly ‘new’ viruses are not completely new: they are not created from scratch with novel mechanisms that have never been seen before. For example, most of these viruses just change their form and signatures to avoid detection, but their operation and the way they infect files and systems stay the same. Hence, such viruses cannot be argued to be new. In this paper, the authors refer to such viruses as derived viruses. Just like new viruses, derived viruses are hard to detect with current scanning-detection methods. Therefore, this paper proposes a virus detection system that detects derived viruses better than existing methods. The proposed system integrates a mutating engine with a neural network to improve the detection rate of derived viruses. Experimental results show that the proposed model can detect derived viruses with an average detection accuracy of 80% (this includes a 91% success rate on the first generation, 83% on the second generation and 65% on the third generation). The results further show that the correlation between the original virus signature and its derivatives decreases further down its generations.

    Detecting Encrypted Malware Using Hidden Markov Models

    Encrypted code is often present in some types of advanced malware, while such code virtually never appears in legitimate applications. Hence, the presence of encrypted code within an executable file could serve as a strong heuristic for detecting malware. In this research, we consider the feasibility of detecting encrypted code using hidden Markov models
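
Several abstracts in this listing score a sequence against a hidden Markov model. As a rough sketch of what such a score is, the scaled forward algorithm below computes log P(observations | model). The two-state model, its probabilities, and the two-symbol alphabet are toy assumptions for illustration, not parameters from any of the listed papers.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: return log P(obs | model).

    start[i]    probability of starting in state i
    trans[i][j] probability of moving from state i to state j
    emit[i][k]  probability that state i emits symbol k
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        total += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total


# Toy model (assumed): two "sticky" states, each preferring one symbol.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.9, 0.1], [0.1, 0.9]]

runs = [0, 0, 0, 0, 1, 1, 1, 1]         # long runs, fits the sticky model
alternating = [0, 1, 0, 1, 0, 1, 0, 1]  # same symbols, different structure

score_runs = log_likelihood(runs, start, trans, emit) / len(runs)
score_alt = log_likelihood(alternating, start, trans, emit) / len(alternating)
print(score_runs, score_alt)
```

Dividing by the sequence length gives a per-symbol score, so inputs of different lengths can be compared against one threshold. In practice the model parameters would be trained from data rather than written by hand.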

    Pairwise Alignment of Metamorphic Computer Viruses

    Computer viruses and other forms of malware pose a threat to virtually any software system (with only a few exceptions). A computer virus is a piece of software which takes advantage of known weaknesses in a software system, and usually has the ability to deliver a malicious payload. A common technique that virus writers use to avoid detection is to enable the virus to change itself by having some kind of self-modifying code. This kind of virus is commonly known as a metamorphic virus, and can be particularly difficult to detect [17]. Existing virus detection software is continually being improved in order to keep up with the rising complexity of modern computer viruses. A new approach to detecting metamorphic viruses, which is an extension of an idea posed in a student writing project from a previous semester [17], will be considered in this project. If a large set of viruses in one “family” of metamorphic viruses can be treated as simple sequences of op-codes, then sequence analysis techniques used in other fields of study like bioengineering [4] could be used to develop a profile hidden Markov model (HMM). This profile would then be used to score an arbitrary op-code sequence (i.e., a program which may or may not be in the virus family); if the output score exceeds a designated threshold, it could be concluded that the input sequence is likely to be from that same virus family. One of the most common techniques to detect viruses is called signature detection, which involves an analysis of known viruses to find signatures, or strings of bytes, which are found in viruses and not in most non-malicious code. If the virus is metamorphic, it could be difficult to find a single signature that will consistently be found in every version of the virus. Since a profile HMM would score the overall similarity in structure to a virus “family”, it could theoretically detect the virus even if a reliable signature cannot be created.
In order to develop a profile HMM for a virus family, the first step is to create a multiple sequence alignment (MSA) for the set of family viruses; this can then be used to “train” the profile HMM. This paper will concentrate on the techniques for creating MSAs for real-world virus op-code sequences which will best match the virus family, and will discuss the overall plausibility of using a profile HMM to detect metamorphic viruses. Creating and testing the profile HMM to detect the viruses will be the subject of another student project.
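
Pairwise alignment of two op-code sequences, the building block for an MSA, can be sketched with a textbook global alignment (Needleman-Wunsch). This is a generic illustration, not the paper's method: the scoring values (match +1, mismatch -1, gap -1) and the op-code sequences are assumptions.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


# Hypothetical op-code sequences: the second has one extra instruction.
seq_a = ["mov", "push", "call", "pop", "ret"]
seq_b = ["mov", "push", "nop", "call", "pop", "ret"]

best, aligned_a, aligned_b = align(seq_a, seq_b)
print(best)
print(aligned_a)
print(aligned_b)
```

The inserted instruction shows up as a gap in the first sequence rather than shifting every later position out of step, which is the property that makes alignment useful for code that differs by insertions.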

    Metamorphic Detection Using Function Call Graph Analysis

    Well-designed metamorphic malware can evade many commonly used malware detection techniques including signature scanning. In this research, we consider a score based on function call graph analysis. We test this score on several challenging classes of metamorphic malware and we show that the resulting detection rates yield an improvement over previous research

    Metamorphic Viruses with Built-In Buffer Overflow

    Metamorphic computer viruses change their structure—and thereby their signature—each time they infect a system. Metamorphic viruses are potentially one of the most dangerous types of computer viruses because they are difficult to detect using signature-based methods. Most anti-virus software today is based on signature detection techniques. In this project, we create and analyze a metamorphic virus toolkit which creates viruses with a built-in buffer overflow. The buffer overflow serves to obfuscate the entry point of the actual virus, thereby making detection more challenging. We show that the resulting viruses successfully evade detection by commercial virus scanners. Several modern operating systems (e.g., Windows Vista and Windows 7) employ address space layout randomization (ASLR), which is designed to prevent most buffer overflow attacks. We show that our proposed buffer overflow technique succeeds, even in the presence of ASLR. Finally, we consider possible defenses against our proposed technique

    Hunting for Undetectable Metamorphic Viruses

    Commercial anti-virus scanners are generally signature based, that is, they scan for known patterns to determine whether a file is infected by a virus or not. To evade signature-based detection, virus writers have adopted code obfuscation techniques to create highly metamorphic computer viruses. Since metamorphic viruses change their appearance from generation to generation, signature-based scanners cannot detect all instances of such viruses. To combat metamorphic viruses, detection tools based on statistical analysis have been studied. A tool based on hidden Markov models (HMMs) was previously developed and the results are encouraging—it has been shown that metamorphic viruses created by a well-designed metamorphic engine can be detected using an HMM. In this project, we explore whether there are any exploitable weaknesses in this HMM-based detection approach. We create a highly metamorphic virus generating tool designed specifically to evade HMM-based detection. We then test our engine, showing that we can generate viral copies that cannot be detected using previously-developed HMM-based detection techniques. Finally, we consider possible defenses against our approach

    Improved Detection for Advanced Polymorphic Malware

    Malicious software (malware) attacks across the internet are increasing at an alarming rate. Cyber-attacks have become increasingly sophisticated and targeted. These targeted attacks are aimed at compromising networks, stealing personal financial information and removing sensitive data or disrupting operations. Current malware detection approaches work well for previously known signatures. However, malware developers utilize techniques to mutate and change software properties (signatures) to avoid and evade detection. Polymorphic malware is practically undetectable with signature-based defensive technologies. Today’s effective detection rate for polymorphic malware ranges from 68.75% to 81.25%. New techniques are needed to improve malware detection rates. Improved detection of polymorphic malware can only be accomplished by extracting features beyond the signature realm. Targeted detection for polymorphic malware must rely upon extracting key features and characteristics for advanced analysis. Traditionally, malware researchers have relied on limited dimensional features such as behavior (dynamic) or source/execution code analysis (static). This study’s focus was to extract and evaluate a limited set of multidimensional topological data in order to improve detection for polymorphic malware. This study used multidimensional analysis (file properties, static and dynamic analysis) with machine learning algorithms to improve malware detection. This research demonstrated that improved polymorphic malware detection can be achieved with machine learning. This study conducted a number of experiments using a standard experimental testing protocol. This study utilized three advanced algorithms (Metabagging (MB), Instance Based k-Means (IBk) and Deep Learning Multi-Layer Perceptron) with a limited set of multidimensional data. The experiments delivered detection rates above 99.43%. In addition, the experiments delivered near zero false positives.
The study’s approach was based on single case experimental design, a well-accepted protocol for progressive testing. The study constructed a prototype to automate feature extraction, assemble files for analysis, and analyze results through multiple clustering algorithms. The study performed an evaluation of large malware sample datasets to understand effectiveness across a wide range of malware. The study developed an integrated framework which automated feature extraction for multidimensional analysis. The feature extraction framework consisted of four modules: 1) a pre-process module that extracts and generates topological features based on static analysis of machine code and file characteristics, 2) a behavioral analysis module that extracts behavioral characteristics based on file execution (dynamic analysis), 3) an input file construction and submission module, and 4) a machine learning module that employs various advanced algorithms. As with most studies, careful attention was paid to false positive and false negative rates, which reduce overall detection accuracy and effectiveness. This study provided a novel approach to expand the malware body of knowledge and improve the detection of polymorphic malware targeting Microsoft operating systems.