    Multi-aspect, robust, and memory exclusive guest OS fingerprinting

    Precise fingerprinting of an operating system (OS) is critical to many security and forensics applications in the cloud, such as virtual machine (VM) introspection, penetration testing, guest OS administration, kernel dump analysis, and memory forensics. Existing OS fingerprinting techniques primarily inspect network packets or CPU states, and all fall short in precision and usability. Since the physical memory of a VM is available in all of these applications, this article presents OS-Sommelier+, a multi-aspect, memory-exclusive approach for precise and robust guest OS fingerprinting in the cloud. It works as follows: given a physical memory dump of a guest OS, OS-Sommelier+ first uses a code-hash-based approach over the kernel code to determine the guest OS version. If the code hash approach fails, OS-Sommelier+ falls back to a kernel-data-signature-based approach to determine the version. We have implemented a prototype system and tested it with a number of Linux kernels. Our evaluation results show that the code hash approach is faster but can only fingerprint known kernels, while the data signature approach complements it and can fingerprint even unknown kernels.
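
    A minimal sketch of the code-hash step, in Python. The page size, code-region bounds, and the KNOWN_KERNELS lookup table are illustrative assumptions, not OS-Sommelier+'s actual interface; the real system must also handle complications such as locating the kernel code region in the dump.

```python
import hashlib

PAGE_SIZE = 4096

# Hypothetical database mapping a kernel-code digest to an OS version label;
# in practice this would be built offline from known kernel builds.
KNOWN_KERNELS = {
    # "sha256 hex digest": "Linux 5.4.0-42-generic",
}

def hash_kernel_code(dump: bytes, code_start: int, code_end: int) -> str:
    """Hash the kernel code region of a physical memory dump, page by page."""
    h = hashlib.sha256()
    for off in range(code_start, code_end, PAGE_SIZE):
        h.update(dump[off:off + PAGE_SIZE])
    return h.hexdigest()

def fingerprint(dump: bytes, code_start: int, code_end: int):
    digest = hash_kernel_code(dump, code_start, code_end)
    version = KNOWN_KERNELS.get(digest)
    if version is not None:
        return version   # fast path: a known kernel
    return None          # fall back to the kernel-data-signature step
```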

    The scalable and accountable binary code search and its applications

    The past decade has witnessed an explosion of applications and devices. This big-data era challenges existing security technologies: new analysis techniques must scale to "big data"-sized codebases; they must become smart and proactive, using the data to understand what the vulnerable points are and where they are located; and effective protection must be provided for the dissemination and analysis of data involving sensitive information on an unprecedented scale. In this dissertation, I argue that code search techniques can boost existing security analyses (vulnerability identification and memory analysis) in terms of scalability and accuracy. To demonstrate these benefits, I address two issues of code search using code analysis: scalability and accountability. I further demonstrate the benefit of code search by applying it to scalable vulnerability identification [57] and to cross-version memory analysis problems [55, 56].

    First, I address the scalability problem of code search by learning "higher-level" semantic features from code [57]. Instead of conducting fine-grained testing on a single device or program, it has become far more crucial to achieve quick vulnerability scanning across devices or programs at "big data" scale. However, discovering vulnerabilities in "big code" is like finding a needle in a haystack, even when dealing with known vulnerabilities. This new challenge demands a scalable code search approach. To this end, I leverage successful techniques from image search in the computer vision community and propose a novel code encoding method for scalable vulnerability search in binary code. The evaluation results show that this approach achieves comparable or even better accuracy and efficiency than the baseline techniques.

    Second, I tackle the accountability issues left open by the vulnerability search problem by designing vulnerability-oriented raw features [58]. Similar code does not always indicate a similar vulnerability, so feature engineering for code search should focus on semantic-level features rather than syntactic ones. I propose extracting conditional formulas as higher-level semantic features from raw binary code to conduct the search. A conditional formula explicitly captures two cardinal factors of a vulnerability: 1) erroneous data dependencies and 2) missing or invalid condition checks. As a result, binary code search on conditional formulas produces significantly higher accuracy and provides meaningful evidence for human analysts to examine the search results further. The evaluation results show that this approach improves the search accuracy of existing bug search techniques with very reasonable performance overhead.

    Finally, I demonstrate the potential of code search in the memory analysis field by applying it to the cross-version issue in memory forensics [55, 56]. Memory analysis techniques for COTS software usually rely on so-called "data structure profiles" of their binaries. Constructing such profiles requires expert knowledge of the internals of a specific software version, and it remains a cumbersome manual effort most of the time. I propose leveraging code search to enable a notion named "cross-version memory analysis", which updates a profile for a new version of a program by transferring knowledge from a model already trained on an older version. The evaluation results show that the code-search-based approach advances existing memory analysis methods by reducing manual effort while maintaining reasonable accuracy. With the help of collaborators, I further developed two plugins for the Volatility memory forensics framework [2], and show that each plugin can construct a localized profile to perform specific memory forensics tasks on the same memory dump, without manual effort in creating the corresponding profile.
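
    A hedged sketch of the core idea behind scalable code search: encode each function as a fixed-length numeric feature vector, then reduce search to nearest-neighbor lookup. The feature vocabulary and the plain cosine-similarity index here are illustrative stand-ins for the learned semantic encoding and image-search-style indexing described above.

```python
import numpy as np

def encode(features: dict, vocab: list) -> np.ndarray:
    """Encode one function as a fixed-length vector of feature counts
    (e.g., instruction-category or control-flow statistics), L2-normalized
    so that cosine similarity reduces to a dot product."""
    v = np.array([features.get(name, 0.0) for name in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def search(query_vec: np.ndarray, index_vecs: np.ndarray, labels, k=5):
    """Return the k indexed functions most similar to the query vector;
    a real system would use an approximate index instead of a full scan."""
    sims = index_vecs @ query_vec          # cosine similarity via dot product
    top = np.argsort(-sims)[:k]
    return [(labels[i], float(sims[i])) for i in top]
```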

    A formally verified compiler back-end

    This article describes the development and formal verification (proof of semantic preservation) of a compiler back-end from Cminor (a simple imperative intermediate language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness. Such a verified compiler is useful in the context of formal methods applied to the certification of critical software: the verification of the compiler guarantees that the safety properties proved on the source code hold for the executable compiled code as well.
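
    The guarantee can be stated roughly as follows. This is an informal sketch in the spirit of the article's semantic preservation theorem, not the exact Coq statement, which is phrased as a behavior refinement under the assumption that the source program does not go wrong:

```latex
\forall S\, C.\;\; \mathrm{compile}(S) = \mathrm{OK}(C)
  \;\Longrightarrow\;
  \mathit{Behaviors}(C) \subseteq \mathit{Behaviors}(S)
```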

    Malware detection and analysis via layered annotative execution

    Malicious software (i.e., malware) has been a severe threat to interconnected computer systems for decades, causing billions of dollars in damages each year. A large volume of new malware samples is discovered daily. Even worse, malware is rapidly evolving to become more sophisticated and evasive, striking back against current malware analysis and defense systems. This dissertation takes a root-cause-oriented approach to the problem of automatic malware detection and analysis: we aim to capture the intrinsic nature of malicious behaviors rather than the external symptoms of existing attacks. We propose a new architecture for binary code analysis, called whole-system out-of-the-box fine-grained dynamic binary analysis, to address the common challenges in malware detection and analysis. To realize this architecture, we build a unified and extensible analysis platform, codenamed TEMU. We propose a core technique for fine-grained dynamic binary analysis, called layered annotative execution, and implement it in TEMU. On the basis of TEMU, we then propose and build a series of novel techniques for automatic malware detection and analysis. For postmortem malware analysis, we developed Renovo, Panorama, HookFinder, and MineSweeper for detecting and analyzing various aspects of malware. For proactive malware detection, we built HookScout, a proactive hook detection system. These techniques capture intrinsic characteristics of malware and are thus well suited for dealing with new malware samples and attack mechanisms.
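
    A minimal sketch of the idea behind annotative execution: every instruction's outputs inherit the union of its inputs' annotations, maintained in shadow state alongside emulation. The names and the two-operand example are illustrative, not TEMU's actual API.

```python
# Shadow state: location (register name or memory address) -> annotation set.
shadow = {}

def annotate(loc, label):
    """Attach an annotation (e.g., a taint label) to one location."""
    shadow.setdefault(loc, set()).add(label)

def propagate(dst, srcs):
    """Run after emulating one instruction: the destination's annotations
    become the union of the source operands' annotations."""
    merged = set()
    for s in srcs:
        merged |= shadow.get(s, set())
    if merged:
        shadow[dst] = merged
    else:
        shadow.pop(dst, None)

# e.g., emulating `add eax, ebx` after ebx received network data:
annotate("ebx", "network_input")
propagate("eax", ["eax", "ebx"])
assert "network_input" in shadow["eax"]
```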

    Convicted by memory: Automatically recovering spatial-temporal evidence from memory images

    Memory forensics can reveal "up to the minute" evidence of a device's usage, often without requiring a suspect's password to unlock the device, and it is oblivious to any persistent storage encryption scheme, e.g., whole disk encryption. Prior to my work, researchers and investigators alike considered data-structure recovery the ultimate goal of memory image forensics. This, however, was far from sufficient, as investigators were still largely unable to understand the content of the recovered evidence; efficiently locating and accurately analyzing evidence locked in memory images remained an open research challenge. In this dissertation, I propose breaking from traditional data-recovery-oriented forensics, and instead I present a memory forensics framework which leverages program analysis to automatically recover spatial-temporal evidence from memory images by understanding the programs that generated it. This framework consists of four techniques, each building on the discoveries of the previous, that together represent this new paradigm of program-analysis-driven memory forensics. First, I present DSCRETE, a technique which reuses a program's own interpretation and rendering logic to recover and present in-memory data structure contents. Following that, VCR developed vendor-generic data structure identification for the recovery of in-memory photographic evidence produced by an Android device's cameras. GUITAR then realized an app-independent technique which automatically reassembles and redraws an app's GUI from the multitude of GUI data elements found in a smartphone's memory image. Finally, departing from any traditional memory forensics technique, RetroScope introduced the vision of spatial-temporal memory forensics by retargeting an Android app's execution to recover sequences of previous GUI screens, in their original temporal order, from a memory image. This framework, and the new program analysis techniques which enable it, have introduced encryption-oblivious forensics capabilities far exceeding traditional data-structure recovery.
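
    For flavor, here is an illustrative value-invariant scan of the kind traditional data-structure recovery relies on (and which this dissertation moves beyond): find candidate structure instances whose fields satisfy simple constraints. The structure layout, magic value, and address range are hypothetical.

```python
import struct

MAGIC = 0x4D454D46                  # hypothetical tag in the first field
HEAP = (0x08000000, 0x10000000)     # assumed plausible pointer range

def scan(image: bytes, step: int = 4):
    """Yield offsets of candidate instances of a (hypothetical) structure
    whose first field is a magic tag and whose second is an in-range
    pointer; real tools derive far stronger constraints from the program."""
    for off in range(0, len(image) - 8, step):
        tag, ptr = struct.unpack_from("<II", image, off)
        if tag == MAGIC and HEAP[0] <= ptr < HEAP[1]:
            yield off
```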

    An Efficient Platform for the Automatic Extraction of Patterns in Native Code

    Different software tools, such as decompilers, code quality analyzers, recognizers of packed executable files, authorship analyzers, and malware detectors, search for patterns in binary code. Machine learning algorithms, trained on programs taken from the huge number of applications in existing open source code repositories, can find patterns not detected with manual approaches. To this end, we have created a versatile platform for the automatic extraction of patterns from native code, capable of processing big binary files. Its implementation has been parallelized, providing important runtime performance benefits on multicore architectures. Compared to single-processor execution, the best configuration obtains an average speedup factor of 3.5, against the theoretical maximum factor of 4.
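
    A hedged sketch of the parallelization strategy: per-file extraction is embarrassingly parallel, so fanning files out across worker processes approaches the core count as the upper bound on speedup (here 4, against the reported 3.5). The byte n-gram features are an illustrative stand-in for whatever raw patterns the platform actually extracts.

```python
from collections import Counter
from multiprocessing import Pool

def extract_patterns(path: str) -> Counter:
    """Count fixed-width byte n-grams in one binary file."""
    with open(path, "rb") as f:
        data = f.read()
    counts = Counter()
    for i in range(len(data) - 3):
        counts[data[i:i + 4]] += 1
    return counts

def extract_all(paths, workers=4) -> Counter:
    """Fan per-file extraction out across worker processes and merge.
    (Call from under `if __name__ == "__main__":` for portability.)"""
    total = Counter()
    with Pool(workers) as pool:
        for c in pool.imap_unordered(extract_patterns, paths):
            total.update(c)
    return total
```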

    The Java system dependence graph

    The Program Dependence Graph was introduced by Ottenstein and Ottenstein in 1984 [14]. It was suggested as a suitable internal program representation for monolithic programs, for the purpose of carrying out certain software engineering operations such as slicing and the computation of program metrics. Since then, Horwitz et al. have introduced its multi-procedural equivalent, the System Dependence Graph [9]. Many authors have proposed object-oriented dependence graph construction approaches [11, 10, 20, 12]. Every approach provides its own benefits, some of which are language specific. This paper is based on Java and combines the most important benefits of a range of approaches. The result is a Java System Dependence Graph, which summarises the key benefits offered by different approaches and adapts them (where necessary) to the Java language.
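
    An illustrative skeleton of a dependence graph and the backward slicing it enables. Real system dependence graphs also carry call, parameter-in/out, and summary edges across procedures, which this toy omits; node labels are illustrative only.

```python
from collections import defaultdict

class SDG:
    def __init__(self):
        self.preds = defaultdict(set)   # node -> {(predecessor, edge kind)}

    def add_edge(self, src, dst, kind):
        """kind is 'control' or 'data' in this simplified intra-procedural toy."""
        self.preds[dst].add((src, kind))

    def backward_slice(self, criterion):
        """All nodes the slicing criterion transitively depends on."""
        seen, work = set(), [criterion]
        while work:
            n = work.pop()
            if n in seen:
                continue
            seen.add(n)
            work.extend(p for p, _ in self.preds[n])
        return seen

g = SDG()
g.add_edge("x = 1", "y = x + 1", "data")
g.add_edge("y = x + 1", "print(y)", "data")
g.add_edge("if (y > 0)", "print(y)", "control")
assert g.backward_slice("print(y)") == {"print(y)", "y = x + 1",
                                        "if (y > 0)", "x = 1"}
```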

    Forensic Memory Classification using Deep Recurrent Neural Networks

    The goal of this project is to advance the application of machine learning frameworks and tools to malware detection. Specifically, a deep neural network architecture is proposed to classify application modules as benign or malicious, using the lower-level memory block patterns that make up these modules. The modules correspond to blocks of functionality within files used in kernel- and OS-level processes as well as user-level applications. The learned model is proposed to reside in an isolated core with strict communication restrictions, to achieve incorruptibility as well as efficiency, thereby providing a probabilistic memory-level view of the system that is consistent with the user-level view. The lower-level memory blocks are constructed from basic block sequences of varying sizes that are fed as input into Long Short-Term Memory (LSTM) models. Four configurations of the LSTM model are explored, adding bi-directionality as well as attention. Assembly-level data from 50 PE files is extracted and basic blocks are constructed using the IDA disassembler toolkit. The results show that longer basic block sequences result in richer LSTM hidden-layer representations. The hidden states are fed as features into max pooling layers or attention layers, depending on the configuration being tested, and the final classification is performed using logistic regression with a single hidden layer. The bidirectional LSTM with attention proved to be the best model, operating on basic block sequences of length 29. The differences between the models' ROC curves indicate a strong reliance on low-level instruction features, as opposed to metadata or string features, which speaks to the success of using entire assembly instructions as data rather than just opcodes or higher-level features.
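
    A sketch of the best-performing configuration, a bidirectional LSTM with attention pooling over basic-block token sequences, written with PyTorch. The vocabulary size and layer dimensions are illustrative guesses, not the project's exact hyperparameters.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # bidirectional LSTM over a sequence of basic-block tokens
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # one attention score per step
        # "logistic regression with a single hidden layer", per the abstract
        self.clf = nn.Sequential(nn.Linear(2 * hidden, hidden),
                                 nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))    # (batch, seq_len, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over time
        pooled = (w * h).sum(dim=1)             # attention-weighted pooling
        return self.clf(pooled).squeeze(-1)     # benign/malicious logit

# e.g., a batch of 8 sequences, each 29 basic-block tokens long:
logits = BiLSTMAttention()(torch.randint(0, 5000, (8, 29)))
```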