97 research outputs found

    Statistical Tools for Linking Engine-Generated Malware to Its Engine

    Get PDF
    Malware-generating engines challenge typical malware analysts by requiring them to quickly extract and upload to their customers\u27 machines, a signature for each of a possi- bly vast number of never-before-seen malware instances that an engine can generate in a short amount of time In this thesis we propose and evaluate two methods for\u27linking va- riants of engine-generated malware to its engine. The proposed methods use the w-gram frequency vector (NFV) of the opcode mnemonics of an engine-generated malware in- stance as a feature vector for the instance. An NFV is a tuple that maps «-grams with their frequencies. The in-formation contained within the NFV of an engine-generated malware instance is then used to attribute the instance to the engine. The first method im- plements a Bayesian-like classifier that uses 1-gram frequency vectors of programs as feature vectors. This method was successfully evaluated on a sample of benign programs and one of malicious programs from the W 3 2. Simile family of self-mutating mal- ware. The second method, which is an extension of the first method, uses optimized 2-gram frequency vectors as feature vectors and classifies malware by computing its proximity to the average of the NFVs of instances known to have been generated by a known engine. The second method was successfully evaluated on four ma) ware-generating engines: W32 . Simile, W32.Evol, W32.NGCVK, and W32.VCL. The evaluation yielded a set of four 1 7-tuples of doubles as signatures for each of the en- gines, and achieved a 95% discrimination accuracy between a sample of benign programs and samples of malware instances that were generated by these engines. Accuracies of 94.8% were achieved for engine signatures of size 6. 8 and, 14 doubles. We also used four k-rm classifiers which, unlike the second method, require the time-consuming task of creating and storing one signature per known malware instance, to countercheck the ac- curacies achieved by the second method. This work is inspired by successful methods for attributing natural language texts to their respective authors. The proposed methods may be viewed as filtering (or decision support) tools that malware detectors may use to de- termine whether extensive engine-specific program analyses such as emulation and con- trol tlow analysis are needed on a suspect program

    Hidden Markov Models for Malware Classification

    Get PDF
    Malware is a software which is developed for malicious intent. Malware is a rapidly evolving threat to the computing community. Although many techniques for malware classification have been proposed, there is still the lack of a comprehensible and useful taxonomy to classify malware samples. Previous research has shown that hidden Markov model (HMM) analysis is useful for detecting certain types of malware. In this research, we consider the related problem of malware classification based on HMMs. We train HMMs for a variety of malware generators and a variety of compilers. More than 9000 malware samples are then scored against each of these models and the malware samples are separated into clusters based on the resulting scores. We analyze the clusters and show that they correspond to certain characteristics of malware. These results indicate that HMMs are an effective tool for the challenging task of automatically classifying malware

    Binary visualisation for malware detection

    Get PDF
    It is becoming increasingly harder to protect devices against security threats; as malware is steadily evolving defence mechanisms are struggling to persevere. This study introduces a concept intended at supporting security systems using Self-Organizing Incremental Neural Network (SOINN) and binary visualization. The system converts a file to its visual representation and sends the data for classification to SOINN. Tests were done to evaluate its performance and obtain an accuracy rate, which rounds the 80% figures at the moment, and false positive and negative rates. Bytes prevalence were also analysed with malware samples having a higher amount of null bytes compared with software samples, which may be a result of hiding malicious data or functionality. The patterns created by the samples were examined; malware samples had more clustering and created different patterns across the images whereas software samples presented mostly static and constant images although exceptions were noted in both categories

    The Effect of Code Obfuscation on Authorship Attribution of Binary Computer Files

    Get PDF
    In many forensic investigations, questions linger regarding the identity of the authors of the software specimen. Research has identified methods for the attribution of binary files that have not been obfuscated, but a significant percentage of malicious software has been obfuscated in an effort to hide both the details of its origin and its true intent. Little research has been done around analyzing obfuscated code for attribution. In part, the reason for this gap in the research is that deobfuscation of an unknown program is a challenging task. Further, the additional transformation of the executable file introduced by the obfuscator modifies or removes features from the original executable that would have been used in the author attribution process. Existing research has demonstrated good success in attributing the authorship of an executable file of unknown provenance using methods based on static analysis of the specimen file. With the addition of file obfuscation, static analysis of files becomes difficult, time consuming, and in some cases, may lead to inaccurate findings. This paper presents a novel process for authorship attribution using dynamic analysis methods. A software emulated system was fully instrumented to become a test harness for a specimen of unknown provenance, allowing for supervised control, monitoring, and trace data collection during execution. This trace data was used as input into a supervised machine learning algorithm trained to identify stylometric differences in the specimen under test and provide predictions on who wrote the specimen. The specimen files were also analyzed for authorship using static analysis methods to compare prediction accuracies with prediction accuracies gathered from this new, dynamic analysis based method. Experiments indicate that this new method can provide better accuracy of author attribution for files of unknown provenance, especially in the case where the specimen file has been obfuscated

    Handbook of Digital Face Manipulation and Detection

    Get PDF
    This open access book provides the first comprehensive collection of studies dealing with the hot topic of digital face manipulation such as DeepFakes, Face Morphing, or Reenactment. It combines the research fields of biometrics and media forensics including contributions from academia and industry. Appealing to a broad readership, introductory chapters provide a comprehensive overview of the topic, which address readers wishing to gain a brief overview of the state-of-the-art. Subsequent chapters, which delve deeper into various research challenges, are oriented towards advanced readers. Moreover, the book provides a good starting point for young researchers as well as a reference guide pointing at further literature. Hence, the primary readership is academic institutions and industry currently involved in digital face manipulation and detection. The book could easily be used as a recommended text for courses in image processing, machine learning, media forensics, biometrics, and the general security area

    Handbook of Digital Face Manipulation and Detection

    Get PDF
    This open access book provides the first comprehensive collection of studies dealing with the hot topic of digital face manipulation such as DeepFakes, Face Morphing, or Reenactment. It combines the research fields of biometrics and media forensics including contributions from academia and industry. Appealing to a broad readership, introductory chapters provide a comprehensive overview of the topic, which address readers wishing to gain a brief overview of the state-of-the-art. Subsequent chapters, which delve deeper into various research challenges, are oriented towards advanced readers. Moreover, the book provides a good starting point for young researchers as well as a reference guide pointing at further literature. Hence, the primary readership is academic institutions and industry currently involved in digital face manipulation and detection. The book could easily be used as a recommended text for courses in image processing, machine learning, media forensics, biometrics, and the general security area

    Information Leakage Measurement and Prevention in Anonymous Traffic

    Get PDF
    University of Minnesota Ph.D. dissertation. June 2019. Major: Computer Science. Advisor: Nick Hopper. 1 computer file (PDF); viii, 76 pages.The pervasive Internet surveillance and the wide-deployment of Internet censors lead to the need for making traffic anonymous. However, recent studies demonstrate the information leakage in anonymous traffic that can be used to de-anonymize Internet users. This thesis focuses on how to measure and prevent such information leakage in anonymous traffic. Choosing Tor anonymous networks as the target, the first part of this thesis conducts the first large-scale information leakage measurement in anonymous traffic and discovers that the popular practice of validating WF defenses by accuracy alone is flawed. We make this measurement possible by designing and implementing our website fingerprint density estimation (WeFDE) framework. The second part of this thesis focuses on preventing such information leakage. Specifically, we design two anti-censorship systems which are able to survive traffic analysis and provide unblocked online video watching and social networking

    Intensional Cyberforensics

    Get PDF
    This work focuses on the application of intensional logic to cyberforensic analysis and its benefits and difficulties are compared with the finite-state-automata approach. This work extends the use of the intensional programming paradigm to the modeling and implementation of a cyberforensics investigation process with backtracing of event reconstruction, in which evidence is modeled by multidimensional hierarchical contexts, and proofs or disproofs of claims are undertaken in an eductive manner of evaluation. This approach is a practical, context-aware improvement over the finite state automata (FSA) approach we have seen in previous work. As a base implementation language model, we use in this approach a new dialect of the Lucid programming language, called Forensic Lucid, and we focus on defining hierarchical contexts based on intensional logic for the distributed evaluation of cyberforensic expressions. We also augment the work with credibility factors surrounding digital evidence and witness accounts, which have not been previously modeled. The Forensic Lucid programming language, used for this intensional cyberforensic analysis, formally presented through its syntax and operational semantics. In large part, the language is based on its predecessor and codecessor Lucid dialects, such as GIPL, Indexical Lucid, Lucx, Objective Lucid, and JOOIP bound by the underlying intensional programming paradigm.Comment: 412 pages, 94 figures, 18 tables, 19 algorithms and listings; PhD thesis; v2 corrects some typos and refs; also available on Spectrum at http://spectrum.library.concordia.ca/977460
    corecore