863 research outputs found

    Malware detection and analysis via layered annotative execution

    Get PDF
    Malicious software (i.e., malware) has become a severe threat to interconnected computer systems for decades and has caused billions of dollars damages each year. A large volume of new malware samples are discovered daily. Even worse, malware is rapidly evolving to be more sophisticated and evasive to strike against current malware analysis and defense systems. This dissertation takes a root-cause oriented approach to the problem of automatic malware detection and analysis. In this approach, we aim to capture the intrinsic natures of malicious behaviors, rather than the external symptoms of existing attacks. We propose a new architecture for binary code analysis, which is called whole-system out-of-the-box fine-grained dynamic binary analysis, to address the common challenges in malware detection and analysis. to realize this architecture, we build a unified and extensible analysis platform, codenamed TEMU. We propose a core technique for fine-grained dynamic binary analysis, called layered annotative execution, and implement this technique in TEMU. Then on the basis of TEMU, we have proposed and built a series of novel techniques for automatic malware detection and analysis. For postmortem malware analysis, we have developed Renovo, Panorama, HookFinder, and MineSweeper, for detecting and analyzing various aspects of malware. For proactive malware detection, we have built HookScout as a proactive hook detection system. These techniques capture intrinsic characteristics of malware and thus are well suited for dealing with new malware samples and attack mechanisms

    Binary Disassembly Block Coverage by Symbolic Execution vs. Recursive Descent

    Get PDF
    This research determines how appropriate symbolic execution is (given its current implementation) for binary analysis by measuring how much of an executable symbolic execution allows an analyst to reason about. Using the S2E Selective Symbolic Execution Engine with a built-in constraint solver (KLEE), this research measures the effectiveness of S2E on a sample of 27 Debian Linux binaries as compared to a traditional static disassembly tool, IDA Pro. Disassembly code coverage and path exploration is used as a metric for determining success. This research also explores the effectiveness of symbolic execution on packed or obfuscated samples of the same binaries to generate a model-based evaluation of success for techniques commonly employed by malware. Obfuscated results were much higher than expected, which lead to the discovery that S2E was not actually handling the multiple executable memory regions present in unpacker runtime code. Three recommendations are made to address the shortcomings of S2E and allow it to process obfuscated samples correctly

    A compiler level intermediate representation based binary analysis system and its applications

    Get PDF
    Analyzing and optimizing programs from their executables has received a lot of attention recently in the research community. There has been a tremendous amount of activity in executable-level research targeting varied applications such as security vulnerability analysis, untrusted code analysis, malware analysis, program testing, and binary optimizations. The vision of this dissertation is to advance the field of static analysis of executables and bridge the gap between source-level analysis and executable analysis. The main thesis of this work is scalable static binary rewriting and analysis using compiler-level intermediate representation without relying on the presence of metadata information such as debug or symbolic information. In spite of a significant overlap in the overall goals of several source-code methods and executables-level techniques, several sophisticated transformations that are well-understood and implemented in source-level infrastructures have yet to become available in executable frameworks. It is a well known fact that a standalone executable without any meta data is less amenable to analysis than the source code. Nonetheless, we believe that one of the prime reasons behind the limitations of existing executable frameworks is that current executable frameworks define their own intermediate representations (IR) which are significantly more constrained than an IR used in a compiler. Intermediate representations used in existing binary frameworks lack high level features like abstract stack, variables, and symbols and are even machine dependent in some cases. This severely limits the application of well-understood compiler transformations to executables and necessitates new research to make them applicable. In the first part of this dissertation, we present techniques to convert the binaries to the same high-level intermediate representation that compilers use. We propose methods to segment the flat address space in an executable containing undifferentiated blocks of memory. We demonstrate the inadequacy of existing variable identification methods for their promotion to symbols and present our methods for symbol promotion. We also present methods to convert the physically addressed stack in an executable to an abstract stack. The proposed methods are practical since they do not employ symbolic, relocation, or debug information which are usually absent in deployed executables. We have integrated our techniques with a prototype x86 binary framework called \emph{SecondWrite} that uses LLVM as the IR. The robustness of the framework is demonstrated by handling executables totaling more than a million lines of source-code, including several real world programs. In the next part of this work, we demonstrate that several well-known source-level analysis frameworks such as symbolic analysis have limited effectiveness in the executable domain since executables typically lack higher-level semantics such as program variables. The IR should have a precise memory abstraction for an analysis to effectively reason about memory operations. Our first work of recovering a compiler-level representation addresses this limitation by recovering several higher-level semantics information from executables. In the next part of this work, we propose methods to handle the scenarios when such semantics cannot be recovered. First, we propose a hybrid static-dynamic mechanism for recovering a precise and correct memory model in executables in presence of executable-specific artifacts such as indirect control transfers. Next, the enhanced memory model is employed to define a novel symbolic analysis framework for executables that can perform the same types of program analysis as source-level tools. Frameworks hitherto fail to simultaneously maintain the properties of correct representation and precise memory model and ignore memory-allocated variables while defining symbolic analysis mechanisms. We exemplify that our framework is robust, efficient and it significantly improves the performance of various traditional analyses like global value numbering, alias analysis and dependence analysis for executables. Finally, the underlying representation and analysis framework is employed for two separate applications. First, the framework is extended to define a novel static analysis framework, \emph{DemandFlow}, for identifying information flow security violations in program executables. Unlike existing static vulnerability detection methods for executables, DemandFlow analyzes memory locations in addition to symbols, thus improving the precision of the analysis. DemandFlow proposes a novel demand-driven mechanism to identify and precisely analyze only those program locations and memory accesses which are relevant to a vulnerability, thus enhancing scalability. DemandFlow uncovers six previously undiscovered format string and directory traversal vulnerabilities in popular ftp and internet relay chat clients. Next, the framework is extended to implement a platform-specific optimization for embedded processors. Several embedded systems provide the facility of locking one or more lines in the cache. We devise the first method in literature that employs instruction cache locking as a mechanism for improving the average-case run-time of general embedded applications. We demonstrate that the optimal solution for instruction cache locking can be obtained in polynomial time. Since our scheme is implemented inside a binary framework, it successfully addresses the portability concern by enabling the implementation of cache locking at the time of deployment when all the details of the memory hierarchy are available

    An overview of the ciao multiparadigm language and program development environment and its design philosophy

    Full text link
    We describe some of the novel aspects and motivations behind the design and implementation of the Ciao multiparadigm programming system. An important aspect of Ciao is that it provides the programmer with a large number of useful features from different programming paradigms and styles, and that the use of each of these features can be turned on and off at will for each program module. Thus, a given module may be using e.g. higher order functions and constraints, while another module may be using objects, predicates, and concurrency. Furthermore, the language is designed to be extensible in a simple and modular way. Another important aspect of Ciao is its programming environment, which provides a powerful preprocessor (with an associated assertion language) capable of statically finding non-trivial bugs, verifying that programs comply with specifications, and performing many types of program optimizations. Such optimizations produce code that is highly competitive with other dynamic languages or, when the highest levéis of optimization are used, even that of static languages, all while retaining the interactive development environment of a dynamic language. The environment also includes a powerful auto-documenter. The paper provides an informal overview of the language and program development environment. It aims at illustrating the design philosophy rather than at being exhaustive, which would be impossible in the format of a paper, pointing instead to the existing literature on the system
    • 

    corecore