88 research outputs found

    A compiler level intermediate representation based binary analysis system and its applications

    Get PDF
    Analyzing and optimizing programs from their executables has received a lot of attention recently in the research community. There has been a tremendous amount of activity in executable-level research targeting varied applications such as security vulnerability analysis, untrusted code analysis, malware analysis, program testing, and binary optimizations. The vision of this dissertation is to advance the field of static analysis of executables and bridge the gap between source-level analysis and executable analysis. The main thesis of this work is scalable static binary rewriting and analysis using compiler-level intermediate representation without relying on the presence of metadata information such as debug or symbolic information. In spite of a significant overlap in the overall goals of several source-code methods and executables-level techniques, several sophisticated transformations that are well-understood and implemented in source-level infrastructures have yet to become available in executable frameworks. It is a well known fact that a standalone executable without any meta data is less amenable to analysis than the source code. Nonetheless, we believe that one of the prime reasons behind the limitations of existing executable frameworks is that current executable frameworks define their own intermediate representations (IR) which are significantly more constrained than an IR used in a compiler. Intermediate representations used in existing binary frameworks lack high level features like abstract stack, variables, and symbols and are even machine dependent in some cases. This severely limits the application of well-understood compiler transformations to executables and necessitates new research to make them applicable. In the first part of this dissertation, we present techniques to convert the binaries to the same high-level intermediate representation that compilers use. We propose methods to segment the flat address space in an executable containing undifferentiated blocks of memory. We demonstrate the inadequacy of existing variable identification methods for their promotion to symbols and present our methods for symbol promotion. We also present methods to convert the physically addressed stack in an executable to an abstract stack. The proposed methods are practical since they do not employ symbolic, relocation, or debug information which are usually absent in deployed executables. We have integrated our techniques with a prototype x86 binary framework called \emph{SecondWrite} that uses LLVM as the IR. The robustness of the framework is demonstrated by handling executables totaling more than a million lines of source-code, including several real world programs. In the next part of this work, we demonstrate that several well-known source-level analysis frameworks such as symbolic analysis have limited effectiveness in the executable domain since executables typically lack higher-level semantics such as program variables. The IR should have a precise memory abstraction for an analysis to effectively reason about memory operations. Our first work of recovering a compiler-level representation addresses this limitation by recovering several higher-level semantics information from executables. In the next part of this work, we propose methods to handle the scenarios when such semantics cannot be recovered. First, we propose a hybrid static-dynamic mechanism for recovering a precise and correct memory model in executables in presence of executable-specific artifacts such as indirect control transfers. Next, the enhanced memory model is employed to define a novel symbolic analysis framework for executables that can perform the same types of program analysis as source-level tools. Frameworks hitherto fail to simultaneously maintain the properties of correct representation and precise memory model and ignore memory-allocated variables while defining symbolic analysis mechanisms. We exemplify that our framework is robust, efficient and it significantly improves the performance of various traditional analyses like global value numbering, alias analysis and dependence analysis for executables. Finally, the underlying representation and analysis framework is employed for two separate applications. First, the framework is extended to define a novel static analysis framework, \emph{DemandFlow}, for identifying information flow security violations in program executables. Unlike existing static vulnerability detection methods for executables, DemandFlow analyzes memory locations in addition to symbols, thus improving the precision of the analysis. DemandFlow proposes a novel demand-driven mechanism to identify and precisely analyze only those program locations and memory accesses which are relevant to a vulnerability, thus enhancing scalability. DemandFlow uncovers six previously undiscovered format string and directory traversal vulnerabilities in popular ftp and internet relay chat clients. Next, the framework is extended to implement a platform-specific optimization for embedded processors. Several embedded systems provide the facility of locking one or more lines in the cache. We devise the first method in literature that employs instruction cache locking as a mechanism for improving the average-case run-time of general embedded applications. We demonstrate that the optimal solution for instruction cache locking can be obtained in polynomial time. Since our scheme is implemented inside a binary framework, it successfully addresses the portability concern by enabling the implementation of cache locking at the time of deployment when all the details of the memory hierarchy are available

    Control flow graphs for real-time systems analysis: reconstruction from binary executables and usage in ILP-based path analysis

    Get PDF
    Real-time systems have to complete their actions w.r.t. given timing constraints. In order to validate that these constraints are met, static timing analysis is usually performed to compute an upper bound of the worst-case execution times (WCET) of all the involved tasks. This thesis identifies the requirements of real-time system analysis on the control flow graph that the static analyses work on. A novel approach is presented that extracts a control flow graph from binary executables, which are typically used when performing WCET analysis of real-time systems. Timing analysis can be split into two steps: a) the analysis of the behaviour of the hardware components, b) finding the worst-case path. A novel approach to path analysis is described in this thesis that introduces sophisticated interprocedural analysis techniques that were not available before.Echtzeitsysteme mĂŒssen ihre Aufgaben innerhalb vorgegebener Zeitschranken abwickeln. Um die Einhaltung der Zeitschranken zu ĂŒberprĂŒfen, sind fĂŒr gewöhnlich statische Analysen der schlimmsten AusfĂŒhrzeiten der Teilprogramme des Echtzeitsystems nötig. Diese Arbeit stellt die Anforderungen von Echtzeitsystem an den Kontrollflussgraphen vor, auf dem die statischen Analysen arbeiten. Ein neuartiger Ansatz zur RĂŒckberechnung von Kontrollflußgraphen aus Maschinenprogrammen, die hĂ€ufig die Grundlage der WCET-Analyse von Echtzeitsystemen bilden, wird vorgestellt. WCET-Analysen können in zwei Teile zerlegt werden: a) die Analyse des Verhaltens der Hardwarebausteine, b) die Suche nach dem schlimmsten AusfĂŒhrpfad. In dieser Arbeit wird ein neuartiger Ansatz der Pfadanalyse vorgestellt, der fĂŒr ausgefeilte interprozedurale Analysemethoden ausgelegt ist, die vorher hier nicht verfĂŒgbar waren

    Control flow graphs for real-time systems analysis: reconstruction from binary executables and usage in ILP-based path analysis

    Get PDF
    Real-time systems have to complete their actions w.r.t. given timing constraints. In order to validate that these constraints are met, static timing analysis is usually performed to compute an upper bound of the worst-case execution times (WCET) of all the involved tasks. This thesis identifies the requirements of real-time system analysis on the control flow graph that the static analyses work on. A novel approach is presented that extracts a control flow graph from binary executables, which are typically used when performing WCET analysis of real-time systems. Timing analysis can be split into two steps: a) the analysis of the behaviour of the hardware components, b) finding the worst-case path. A novel approach to path analysis is described in this thesis that introduces sophisticated interprocedural analysis techniques that were not available before.Echtzeitsysteme mĂŒssen ihre Aufgaben innerhalb vorgegebener Zeitschranken abwickeln. Um die Einhaltung der Zeitschranken zu ĂŒberprĂŒfen, sind fĂŒr gewöhnlich statische Analysen der schlimmsten AusfĂŒhrzeiten der Teilprogramme des Echtzeitsystems nötig. Diese Arbeit stellt die Anforderungen von Echtzeitsystem an den Kontrollflussgraphen vor, auf dem die statischen Analysen arbeiten. Ein neuartiger Ansatz zur RĂŒckberechnung von Kontrollflußgraphen aus Maschinenprogrammen, die hĂ€ufig die Grundlage der WCET-Analyse von Echtzeitsystemen bilden, wird vorgestellt. WCET-Analysen können in zwei Teile zerlegt werden: a) die Analyse des Verhaltens der Hardwarebausteine, b) die Suche nach dem schlimmsten AusfĂŒhrpfad. In dieser Arbeit wird ein neuartiger Ansatz der Pfadanalyse vorgestellt, der fĂŒr ausgefeilte interprozedurale Analysemethoden ausgelegt ist, die vorher hier nicht verfĂŒgbar waren

    Program Semantics in Model-Based WCET Analysis: A State of the Art Perspective

    Get PDF
    Advanced design techniques of safety-critical applications use specialized development model based methods. Under this setting, the application exists at several levels of description, as the result of a sequence of transformations. On the positive side, the application is developed in a systematic way, while on the negative side, its high-level semantics may be obfuscated when represented at the lower levels. The application should provide certain functional and non-functional guarantees. When the application is a hard real-time program, such guarantees could be deadlines, thus making the computation of worst-case execution time (WCET) bounds mandatory. This paper overviews, in the context of WCET analysis, what are the existing techniques to extract, express and exploit the program semantics along the model-based development workflow

    Source Code Analysis: A Road Map

    Full text link

    Efficient and extensible security enforcement using dynamic data flow analysis

    Full text link

    Pre/post conditioned slicing

    Get PDF
    Th paper shows how analysis of programs in terms of pre- and postconditions can be improved using a generalisation of conditioned program slicing called pre/post conditioned slicing. Such conditions play an important role in program comprehension, reuse, verification and reengineering. Fully automated analysis is impossible because of the inherent undecidability of pre- and post- conditions. The method presented reformulates the problem to circumvent this. The reformulation is constructed so that programs which respect the pre- and post-conditions applied to them have empty slices. For those which do not respect the conditions, the slice contains statements which could potentially break the conditions. This separates the automatable part of the analysis from the human analysis
    • 

    corecore