97 research outputs found
Statistical Tools for Linking Engine-Generated Malware to Its Engine
Malware-generating engines challenge typical malware analysts by requiring them to quickly extract and upload to their customers\u27 machines, a signature for each of a possi- bly vast number of never-before-seen malware instances that an engine can generate in a short amount of time In this thesis we propose and evaluate two methods for\u27linking va- riants of engine-generated malware to its engine. The proposed methods use the w-gram frequency vector (NFV) of the opcode mnemonics of an engine-generated malware in- stance as a feature vector for the instance. An NFV is a tuple that maps «-grams with their frequencies. The in-formation contained within the NFV of an engine-generated malware instance is then used to attribute the instance to the engine. The first method im- plements a Bayesian-like classifier that uses 1-gram frequency vectors of programs as feature vectors. This method was successfully evaluated on a sample of benign programs and one of malicious programs from the W 3 2. Simile family of self-mutating mal- ware. The second method, which is an extension of the first method, uses optimized 2-gram frequency vectors as feature vectors and classifies malware by computing its proximity to the average of the NFVs of instances known to have been generated by a known engine. The second method was successfully evaluated on four ma) ware-generating engines: W32 . Simile, W32.Evol, W32.NGCVK, and W32.VCL. The evaluation yielded a set of four 1 7-tuples of doubles as signatures for each of the en- gines, and achieved a 95% discrimination accuracy between a sample of benign programs and samples of malware instances that were generated by these engines. Accuracies of 94.8% were achieved for engine signatures of size 6. 8 and, 14 doubles. We also used four k-rm classifiers which, unlike the second method, require the time-consuming task of creating and storing one signature per known malware instance, to countercheck the ac- curacies achieved by the second method. This work is inspired by successful methods for attributing natural language texts to their respective authors. The proposed methods may be viewed as filtering (or decision support) tools that malware detectors may use to de- termine whether extensive engine-specific program analyses such as emulation and con- trol tlow analysis are needed on a suspect program
Hidden Markov Models for Malware Classification
Malware is a software which is developed for malicious intent. Malware is a rapidly evolving threat to the computing community. Although many techniques for malware classification have been proposed, there is still the lack of a comprehensible and useful taxonomy to classify malware samples. Previous research has shown that hidden Markov model (HMM) analysis is useful for detecting certain types of malware. In this research, we consider the related problem of malware classification based on HMMs. We train HMMs for a variety of malware generators and a variety of compilers. More than 9000 malware samples are then scored against each of these models and the malware samples are separated into clusters based on the resulting scores. We analyze the clusters and show that they correspond to certain characteristics of malware. These results indicate that HMMs are an effective tool for the challenging task of automatically classifying malware
Binary visualisation for malware detection
It is becoming increasingly harder to protect devices against security threats; as malware is steadily evolving defence mechanisms are struggling to persevere. This study introduces a concept intended at supporting security systems using Self-Organizing Incremental Neural Network (SOINN) and binary visualization. The system converts a file to its visual representation and sends the data for classification to SOINN. Tests were done to evaluate its performance and obtain an accuracy rate, which rounds the 80% figures at the moment, and false positive and negative rates. Bytes prevalence were also analysed with malware samples having a higher amount of null bytes compared with software samples, which may be a result of hiding malicious data or functionality. The patterns created by the samples were examined; malware samples had more clustering and created different patterns across the images whereas software samples presented mostly static and constant images although exceptions were noted in both categories
The Effect of Code Obfuscation on Authorship Attribution of Binary Computer Files
In many forensic investigations, questions linger regarding the identity of the authors of the software specimen. Research has identified methods for the attribution of binary files that have not been obfuscated, but a significant percentage of malicious software has been obfuscated in an effort to hide both the details of its origin and its true intent. Little research has been done around analyzing obfuscated code for attribution. In part, the reason for this gap in the research is that deobfuscation of an unknown program is a challenging task. Further, the additional transformation of the executable file introduced by the obfuscator modifies or removes features from the original executable that would have been used in the author attribution process. Existing research has demonstrated good success in attributing the authorship of an executable file of unknown provenance using methods based on static analysis of the specimen file. With the addition of file obfuscation, static analysis of files becomes difficult, time consuming, and in some cases, may lead to inaccurate findings. This paper presents a novel process for authorship attribution using dynamic analysis methods. A software emulated system was fully instrumented to become a test harness for a specimen of unknown provenance, allowing for supervised control, monitoring, and trace data collection during execution. This trace data was used as input into a supervised machine learning algorithm trained to identify stylometric differences in the specimen under test and provide predictions on who wrote the specimen. The specimen files were also analyzed for authorship using static analysis methods to compare prediction accuracies with prediction accuracies gathered from this new, dynamic analysis based method. Experiments indicate that this new method can provide better accuracy of author attribution for files of unknown provenance, especially in the case where the specimen file has been obfuscated
Handbook of Digital Face Manipulation and Detection
This open access book provides the first comprehensive collection of studies dealing with the hot topic of digital face manipulation such as DeepFakes, Face Morphing, or Reenactment. It combines the research fields of biometrics and media forensics including contributions from academia and industry. Appealing to a broad readership, introductory chapters provide a comprehensive overview of the topic, which address readers wishing to gain a brief overview of the state-of-the-art. Subsequent chapters, which delve deeper into various research challenges, are oriented towards advanced readers. Moreover, the book provides a good starting point for young researchers as well as a reference guide pointing at further literature. Hence, the primary readership is academic institutions and industry currently involved in digital face manipulation and detection. The book could easily be used as a recommended text for courses in image processing, machine learning, media forensics, biometrics, and the general security area
Handbook of Digital Face Manipulation and Detection
This open access book provides the first comprehensive collection of studies dealing with the hot topic of digital face manipulation such as DeepFakes, Face Morphing, or Reenactment. It combines the research fields of biometrics and media forensics including contributions from academia and industry. Appealing to a broad readership, introductory chapters provide a comprehensive overview of the topic, which address readers wishing to gain a brief overview of the state-of-the-art. Subsequent chapters, which delve deeper into various research challenges, are oriented towards advanced readers. Moreover, the book provides a good starting point for young researchers as well as a reference guide pointing at further literature. Hence, the primary readership is academic institutions and industry currently involved in digital face manipulation and detection. The book could easily be used as a recommended text for courses in image processing, machine learning, media forensics, biometrics, and the general security area
Information Leakage Measurement and Prevention in Anonymous Traffic
University of Minnesota Ph.D. dissertation. June 2019. Major: Computer Science. Advisor: Nick Hopper. 1 computer file (PDF); viii, 76 pages.The pervasive Internet surveillance and the wide-deployment of Internet censors lead to the need for making traffic anonymous. However, recent studies demonstrate the information leakage in anonymous traffic that can be used to de-anonymize Internet users. This thesis focuses on how to measure and prevent such information leakage in anonymous traffic. Choosing Tor anonymous networks as the target, the first part of this thesis conducts the first large-scale information leakage measurement in anonymous traffic and discovers that the popular practice of validating WF defenses by accuracy alone is flawed. We make this measurement possible by designing and implementing our website fingerprint density estimation (WeFDE) framework. The second part of this thesis focuses on preventing such information leakage. Specifically, we design two anti-censorship systems which are able to survive traffic analysis and provide unblocked online video watching and social networking
Intensional Cyberforensics
This work focuses on the application of intensional logic to cyberforensic
analysis and its benefits and difficulties are compared with the
finite-state-automata approach. This work extends the use of the intensional
programming paradigm to the modeling and implementation of a cyberforensics
investigation process with backtracing of event reconstruction, in which
evidence is modeled by multidimensional hierarchical contexts, and proofs or
disproofs of claims are undertaken in an eductive manner of evaluation. This
approach is a practical, context-aware improvement over the finite state
automata (FSA) approach we have seen in previous work. As a base implementation
language model, we use in this approach a new dialect of the Lucid programming
language, called Forensic Lucid, and we focus on defining hierarchical contexts
based on intensional logic for the distributed evaluation of cyberforensic
expressions. We also augment the work with credibility factors surrounding
digital evidence and witness accounts, which have not been previously modeled.
The Forensic Lucid programming language, used for this intensional
cyberforensic analysis, formally presented through its syntax and operational
semantics. In large part, the language is based on its predecessor and
codecessor Lucid dialects, such as GIPL, Indexical Lucid, Lucx, Objective
Lucid, and JOOIP bound by the underlying intensional programming paradigm.Comment: 412 pages, 94 figures, 18 tables, 19 algorithms and listings; PhD
thesis; v2 corrects some typos and refs; also available on Spectrum at
http://spectrum.library.concordia.ca/977460
- …