99 research outputs found

    Achieving Obfuscation Through Self-Modifying Code: A Theoretical Model

    With the vast amount of data and software available on networks, protecting online information is one of the most important tasks of this technological age. There is no such thing as safe computing, and it is inevitable that security breaches will occur. Thus, security professionals and practices focus on two areas: security, preventing a breach from occurring, and resiliency, minimizing the damage once a breach has occurred. One of the most important practices for adding resiliency to source code is obfuscation, a method of rewriting the code into a form that is virtually unreadable. This makes the code very hard for attackers to decipher, protecting intellectual property and reducing the amount of information gained by a malicious actor. Achieving obfuscation through self-modifying code, code that mutates during runtime, is a complicated undertaking that can create a robust obfuscating system. While a great deal of research is still ongoing, preliminary results suggest that applying self-modifying code to obfuscation may yield self-maintaining software capable of healing itself following an attack.

    Application of Artificial Intelligence for Detecting Derived Viruses

    Computer viruses have become complex and operate in a stealth mode to avoid detection. New viruses are argued to be created each and every day. However, most of these supposedly ‘new’ viruses are not created from scratch with novel mechanisms that have never been seen before. For example, most of these viruses just change their form and signatures to avoid detection, while their operation and the way they infect files and systems remain the same. Hence, such viruses cannot be argued to be new. In this paper, the authors refer to such viruses as derived viruses. Just like new viruses, derived viruses are hard to detect with current scanning-detection methods. Therefore, this paper proposes a virus detection system that detects derived viruses better than existing methods. The proposed system integrates a mutating engine with a neural network to improve the detection rate of derived viruses. Experimental results show that the proposed model can detect derived viruses with an average detection accuracy of 80% (this includes a 91% success rate on the first generation, 83% on the second generation and 65% on the third generation). The results further show that the correlation between the original virus signature and its derivatives decreases with each successive generation.

    Normalizer: Augmenting Code Clone Detectors using Source Code Normalization

    Code clones are duplicate fragments of code that perform the same task. As software code bases increase in size, the number of code clones also tends to increase. These code clones, possibly created through copy-and-paste methods or unintentional duplication of effort, increase maintenance cost over the lifespan of the software. Code clone detection tools exist to identify clones where a human search would prove infeasible; however, the quality of the clones found may vary. I demonstrate that the performance of such tools can be improved by normalizing the source code before usage. I developed Normalizer, a tool to transform C source code to normalized source code where the code is written as consistently as possible. By maintaining the code's function while enforcing a strict format, the variability of the programmer's style is taken out. Thus, code clones may be easier for tools to detect regardless of how they were written. Reordering statements, removing useless code, and renaming identifiers are used to achieve normalized code. Normalizer was used to show that, with a small variety of code clone detection tools, more clones can be found in Introduction to Computer Networks assignments in the normalized source code than in the original source code.
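
Of the three transformations the abstract names, identifier renaming is the easiest to illustrate. The following is a minimal sketch in Python, not the Normalizer tool itself: the tokenizer regex, the keyword list, the `idN` naming scheme and the function name `normalize_identifiers` are all assumptions made for illustration, and the sketch ignores string literals, comments and preprocessor directives.

```python
import re

# Partial list of C keywords that must survive renaming (assumption: enough for the toy inputs).
C_KEYWORDS = {
    "int", "char", "float", "double", "void", "long", "short", "unsigned",
    "if", "else", "for", "while", "do", "return", "break", "continue",
    "struct", "sizeof", "switch", "case", "default", "static", "const",
}

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize_identifiers(source: str) -> str:
    """Rename every non-keyword identifier to id0, id1, ... in order of first use."""
    mapping = {}
    out = []
    for tok in TOKEN_RE.findall(source):
        if re.match(r"[A-Za-z_]", tok) and tok not in C_KEYWORDS:
            if tok not in mapping:
                mapping[tok] = f"id{len(mapping)}"
            tok = mapping[tok]
        out.append(tok)
    # Joining with single spaces also removes layout differences.
    return " ".join(out)


a = normalize_identifiers("int total = a + b;")
b = normalize_identifiers("int   sum=x+y ;")
print(a == b)
```

Two fragments that differ only in naming and spacing map to the same token string, which is the property a downstream clone detector can exploit. Note that this sketch also renames library function names, which a real tool would likely want to preserve.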

    Unveiling metamorphism by abstract interpretation of code properties

    Metamorphic code includes self-modifying semantics-preserving transformations to exploit code diversification. The impact of metamorphism is growing in security and code protection technologies, both for preventing malicious host attacks, e.g., in software diversification for IP and integrity protection, and in malicious software attacks, e.g., in metamorphic malware that modifies its own code in order to foil detection systems based on signature matching. In this paper we consider the problem of automatically extracting metamorphic signatures from metamorphic code. We introduce a semantics for self-modifying code, later called phase semantics, and prove its correctness by showing that it is an abstract interpretation of the standard trace semantics. Phase semantics precisely models the metamorphic code behavior by providing a set of traces of programs which correspond to the possible evolutions of the metamorphic code during execution. We show that metamorphic signatures can be automatically extracted by abstract interpretation of the phase semantics. In particular, we introduce the notion of regular metamorphism, where the invariants of the phase semantics can be modeled as finite state automata representing the code structure of all possible metamorphic changes of a metamorphic code, and we provide a static signature extraction algorithm for metamorphic code where metamorphic signatures are approximated in regular metamorphism.

    Malware Classification based on Call Graph Clustering

    Each day, anti-virus companies receive tens of thousands of samples of potentially harmful executables. Many of the malicious samples are variations of previously encountered malware, created by their authors to evade pattern-based detection. Dealing with these large amounts of data requires robust, automatic detection approaches. This paper studies malware classification based on call graph clustering. By representing malware samples as call graphs, it is possible to abstract certain variations away and enable the detection of structural similarities between samples. The ability to cluster similar samples together will make more generic detection techniques possible, thereby targeting the commonalities of the samples within a cluster. To compare call graphs with one another, we compute pairwise graph similarity scores via graph matchings which approximately minimize the graph edit distance. Next, to facilitate the discovery of similar malware samples, we employ several clustering algorithms, including k-medoids and DBSCAN. Clustering experiments are conducted on a collection of real malware samples, and the results are evaluated against manual classifications provided by human malware analysts. Experiments show that it is indeed possible to accurately detect malware families via call graph clustering. We anticipate that in the future, call graphs can be used to analyse the emergence of new malware families, and ultimately to automate implementation of generic detection schemes. Comment: This research has been supported by TEKES - the Finnish Funding Agency for Technology and Innovation as part of its ICT SHOK Future Internet research programme, grant 40212/0
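
The pipeline in this abstract (pairwise call graph distances, then clustering) can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the paper's method: the Jaccard distance over edge sets is a crude stand-in for the approximate graph edit distance, the k-medoids loop uses a naive deterministic initialisation, and the four call graphs and all function names are hypothetical.

```python
def jaccard_distance(edges_a, edges_b):
    """Distance between two call graphs given as sets of (caller, callee) edges.

    Assumption: a stand-in for the approximate graph edit distance.
    """
    union = edges_a | edges_b
    if not union:
        return 0.0
    return 1.0 - len(edges_a & edges_b) / len(union)


def k_medoids(dist, k, max_iter=100):
    """Naive k-medoids on a precomputed distance matrix; returns one medoid index per sample."""
    n = len(dist)
    medoids = list(range(k))  # assumption: first k samples as initial medoids
    for _ in range(max_iter):
        # Assignment step: attach every sample to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: pick the member with the lowest total distance to its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster emptied out
                new_medoids.append(m)
                continue
            best = min(members, key=lambda c: sum(dist[c][j] for j in members))
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]


# Hypothetical call graphs: two variants each of two made-up families.
graphs = [
    {("main", "a"), ("a", "b")},
    {("main", "a"), ("a", "b"), ("b", "c")},
    {("start", "x"), ("x", "y")},
    {("start", "x"), ("x", "y"), ("y", "z")},
]
dist = [[jaccard_distance(g, h) for h in graphs] for g in graphs]
labels = k_medoids(dist, k=2)
print(labels)
```

Samples that share most of their call edges end up under the same medoid, which mirrors the idea of grouping variants of one family despite small structural changes.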

    A new approach to malware detection

    Malware is malicious software, and is one of the most common and serious types of attack on the Internet. Attackers have widely applied obfuscating transformations to malware, which makes malware detection more challenging. There has been extensive research on detecting obfuscated malware. A promising research direction uses both the control-flow graph (CFG) and the instruction classes of basic blocks as the signature of malware. This research direction is robust against certain obfuscations, such as variable substitution and instruction reordering. However, using only instruction classes to detect obfuscated basic blocks causes high rates of false positives and false negatives. In this thesis, based on the same research direction, we propose an improved approach to detect obfuscated malware. In addition to using the CFG, our approach also uses the functionalities of basic blocks as the signature of malware. Specifically, our contributions are as follows: 1) we design a "signature calculation algorithm" to extract the signature of a malicious code fragment. The "signature calculation algorithm" is based on compiler optimization algorithms, but adds and integrates memory sub-variable optimization, expression formalization and cross-basic-block propagation. 2) we formalize the expressions of assignment statements to facilitate comparing the functionalities of two expressions. 3) we design a detection algorithm to detect whether a program is an obfuscated malware instance. Our detection algorithm compares two aspects: the CFG and the functionalities of basic blocks. 4) we implement the proposed approach, and perform experiments to compare our approach with the previous approach.

    Statistical Tools for Linking Engine-Generated Malware to Its Engine

    Malware-generating engines challenge typical malware analysts by requiring them to quickly extract, and upload to their customers' machines, a signature for each of a possibly vast number of never-before-seen malware instances that an engine can generate in a short amount of time. In this thesis we propose and evaluate two methods for linking variants of engine-generated malware to its engine. The proposed methods use the n-gram frequency vector (NFV) of the opcode mnemonics of an engine-generated malware instance as a feature vector for the instance. An NFV is a tuple that maps n-grams to their frequencies. The information contained within the NFV of an engine-generated malware instance is then used to attribute the instance to the engine. The first method implements a Bayesian-like classifier that uses 1-gram frequency vectors of programs as feature vectors. This method was successfully evaluated on a sample of benign programs and one of malicious programs from the W32.Simile family of self-mutating malware. The second method, which is an extension of the first method, uses optimized 2-gram frequency vectors as feature vectors and classifies malware by computing its proximity to the average of the NFVs of instances known to have been generated by a known engine. The second method was successfully evaluated on four malware-generating engines: W32.Simile, W32.Evol, W32.NGCVK, and W32.VCL. The evaluation yielded a set of four 17-tuples of doubles as signatures for each of the engines, and achieved a 95% discrimination accuracy between a sample of benign programs and samples of malware instances that were generated by these engines. Accuracies of 94.8% were achieved for engine signatures of sizes 6, 8, and 14 doubles. We also used four k-NN classifiers which, unlike the second method, require the time-consuming task of creating and storing one signature per known malware instance, to countercheck the accuracies achieved by the second method. This work is inspired by successful methods for attributing natural language texts to their respective authors. The proposed methods may be viewed as filtering (or decision support) tools that malware detectors may use to determine whether extensive engine-specific program analyses such as emulation and control flow analysis are needed on a suspect program.
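
The second method described above (attributing a sample to the engine whose average NFV it lies closest to) can be sketched as follows. This is a toy illustration in Python, not the thesis implementation: the Euclidean distance, the two-entry 2-gram vocabulary, the engine names and the opcode sequences are all assumptions, and the sketch does not reproduce the "optimized" feature selection mentioned in the abstract.

```python
import math
from collections import Counter


def ngram_frequency_vector(opcodes, n, vocabulary):
    """Relative frequency of each n-gram in `vocabulary` within an opcode mnemonic sequence."""
    grams = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return [counts[g] / total if total else 0.0 for g in vocabulary]


def centroid(vectors):
    """Component-wise average of equally long vectors (the engine signature)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def euclidean(u, v):
    # Assumption: the abstract says "proximity" without naming a distance measure.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def attribute(sample_vector, signatures):
    """Return the name of the engine whose signature is nearest to the sample."""
    return min(signatures, key=lambda name: euclidean(sample_vector, signatures[name]))


# Hypothetical 2-gram vocabulary and training samples for two made-up engines.
vocab = [("mov", "add"), ("push", "pop")]
engine_a = [["mov", "add", "mov", "add"], ["mov", "add", "mov", "add", "nop"]]
engine_b = [["push", "pop", "push", "pop"], ["push", "pop", "nop", "push", "pop"]]

signatures = {
    "engine_a": centroid([ngram_frequency_vector(s, 2, vocab) for s in engine_a]),
    "engine_b": centroid([ngram_frequency_vector(s, 2, vocab) for s in engine_b]),
}
suspect = ngram_frequency_vector(["nop", "mov", "add", "mov", "add"], 2, vocab)
print(attribute(suspect, signatures))
```

Because each engine is summarised by a single averaged vector, only one signature per engine has to be stored, in contrast to the per-instance signatures that the k-NN countercheck requires.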

    Morphing engines classification by code histogram

    Morphing engines, or mutation engines, are used by metamorphic viruses to change the appearance of their code in every new generation. The purpose of these engines is to evade signature-based scanners, which rely on a unique string signature to detect the virus. Although obfuscation techniques alter the binary sequence of the code, with some techniques the statistical features of the code binaries remain relatively unchanged. Accordingly, these features can be used to classify the engine and detect the morphed virus code. In this article, we introduce a new idea for classifying obfuscation engines based on the statistical features of their code, using histogram comparison.
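
To make the histogram idea concrete, the following Python sketch compares normalised byte-value histograms with histogram intersection. Both choices are assumptions for illustration: the abstract does not say which unit is histogrammed or which comparison measure is used, and the byte strings below are arbitrary toy inputs.

```python
def byte_histogram(code: bytes):
    """Normalised frequency of each of the 256 possible byte values."""
    counts = [0] * 256
    for b in code:
        counts[b] += 1
    total = len(code)
    return [c / total if total else 0.0 for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


original = bytes([0x90, 0x90, 0x50, 0x58])
reordered = bytes([0x50, 0x90, 0x58, 0x90])   # same bytes in a different order
unrelated = bytes([0x01, 0x02, 0x03, 0x04])

print(histogram_intersection(byte_histogram(original), byte_histogram(reordered)))
print(histogram_intersection(byte_histogram(original), byte_histogram(unrelated)))
```

Reordering changes the byte sequence, and therefore any string signature, but leaves the histogram untouched, which is the kind of invariance the abstract relies on.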