28 research outputs found

    Generating analyzers with PAG

    Get PDF
    To produce high qualitiy code, modern compilers use global optimization algorithms based on it abstract interpretation. These algorithms are rather complex; their implementation is therfore a non-trivial task and error-prone. However, since thez are based on a common theory, they have large similar parts. We conclude that analyzer writing better should be replaced with analyzer generation. We present the tool sf PAG that has a high level functional input language to specify data flow analyses. It offers th specifications of even recursive data structures and is therfore not limited to bit vector problems. sf PAG generates efficient analyzers wich can be easily integrated in existing compilers. The analyzers are interprocedural, they can handle recursive procedures with local variables and higher order functions. sf PAG has successfully been tested by generating several analyzers (e.g. alias analysis, constant propagation, inerval analysis) for an industrial quality ANSI-C and Fortran90 compiler. This technical report consits of two parts; the first introduces the generation system and the second evaluates generated analyzers with respect to their space and time consumption. bf Keywords: data flow analysis, specification and generation of analyzers, lattice specification, abstract syntax specification, interprocedural analysis, compiler construction

    Generating analyzers with PAG

    Get PDF
    To produce high qualitiy code, modern compilers use global optimization algorithms based on it abstract interpretation. These algorithms are rather complex; their implementation is therfore a non-trivial task and error-prone. However, since thez are based on a common theory, they have large similar parts. We conclude that analyzer writing better should be replaced with analyzer generation. We present the tool sf PAG that has a high level functional input language to specify data flow analyses. It offers th specifications of even recursive data structures and is therfore not limited to bit vector problems. sf PAG generates efficient analyzers wich can be easily integrated in existing compilers. The analyzers are interprocedural, they can handle recursive procedures with local variables and higher order functions. sf PAG has successfully been tested by generating several analyzers (e.g. alias analysis, constant propagation, inerval analysis) for an industrial quality ANSI-C and Fortran90 compiler. This technical report consits of two parts; the first introduces the generation system and the second evaluates generated analyzers with respect to their space and time consumption. bf Keywords: data flow analysis, specification and generation of analyzers, lattice specification, abstract syntax specification, interprocedural analysis, compiler construction

    Generating program analyzers

    Get PDF
    In this work the automatic generation of program analyzers from concise specifications is presented. It focuses on provably correct and complex interprocedural analyses for real world sized imperative programs. Thus, a powerful and flexible specification mechanism is required, enabling both correctness proofs and efficient implementations. The generation process relies on the theory of data flow analysis and on abstract interpretation. The theory of data flow analysis provides methods to efficiently implement analyses. Abstract interpretation provides the relation to the semantics of the programming language. This allows the systematic derivation of efficient provably correct, and terminating analyses. The approach has been implemented in the program analyzer generator PAG. It addresses analyses ranging from "simple\u27; intraprocedural bit vector frameworks to complex interprocedural alias analyses. A high level specialized functional language is used as specification mechanism enabling elegant and concise specifications even for complex analyses. Additionally, it allows the automatic selection of efficient implementations for the underlying abstract datatypes, such as balanced binary trees, binary decision diagrams, bit vectors, and arrays. For the interprocedural analysis the functional approach, the call string approach, and a novel approach especially targeting on the precise analysis of loops can be chosen. In this work the implementation of PAG as well as a large number of applications of PAG are presented.Diese Arbeit befaßt sich mit der automatischen Generierung von Programmanalysatoren aus prĂ€gnanten Spezifikationen. Dabei wird besonderer Wert auf die Generierung von beweisbar korrekten und komplexen interprozeduralen Analysen fĂŒr imperative Programme realer GrĂ¶ĂŸe gelegt. Um dies zu erreichen, ist ein leistungsfĂ€higer und flexibler Spezifikationsmechanismus erforderlich, der sowohl Korrektheitsbeweise, als auch effiziente Implementierungen ermöglicht. Die Generierung basiert auf den Theorien der Datenflußanalyse und der abstrakten Interpretation. Die Datenflußanalyse liefert Methoden zur effizienten Implementierung von Analysen. Die abstrakte Interpretation stellt den Bezug zur Semantik der Programmiersprache her und ermöglicht dadurch die systematische Ableitung beweisbar korrekter und terminierender Analysen. Dieser Ansatz wurde im Programmanalysatorgenerator PAG implementiert, der sowohl fĂŒr einfache intraprozedurale Bitvektor- Analysen, als auch fĂŒr komplexe interprozedurale Alias-Analysen geeignet ist. Als Spezifikationsmechanismus wird dabei eine spezialisierte funktionale Sprache verwendet, die es ermöglicht, auch komplexe Analysen kurz und prĂ€gnant zu spezifizieren. DarĂŒberhinaus ist es möglich, fĂŒr die zugrunde liegenden abstrakten Bereiche automatisch effiziente Implementierungen auszuwĂ€hlen, z.B. balancierte binĂ€re BĂ€ume, Binary Decision Diagrams, Bitvektoren oder Felder. FĂŒr die interprozedurale Analyse stehen folgende Möglichkeiten zur Auswahl: der funktionale Ansatz, der Call-String-Ansatz und ein neuer Ansatz, der besonders auf die prĂ€zise Analyse von Schleifen abzielt. Diese Arbeit beschreibt sowohl die Implementierung von PAG, als auch eine große Anzahl von Anwendungen

    Man-machine partial program analysis for malware detection

    Get PDF
    With the meteoric rise in popularity of the Android platform, there is an urgent need to combat the accompanying proliferation of malware. Existing work addresses the area of consumer malware detection, but cannot detect novel, sophisticated, domain-specific malware that is targeted specifically at one aspect of an organization (eg. ground operations of the US Military). Adversaries can exploit domain knowledge to camoflauge malice within the legitimate behaviors of an app and behind a domain-specific trigger, rendering traditional approaches such as signature-matching, machine learning, and dynamic monitoring ineffective. Manual code inspections are also inadequate, scaling poorly and introducing human error. Yet, there is a dire need to detect this kind of malware before it causes catastrophic loss of life and property. This dissertation presents the Security Toolbox, our novel solution for this challenging new problem posed by DARPA\u27s Automated Program Analysis for Cybersecurity (APAC) program. We employ a human-in-the-loop approach to amplify the natural intelligence of our analysts. Our automation detects interesting program behaviors and exposes them in an analysis Dashboard, allowing the analyst to brainstorm flaw hypotheses and ask new questions, which in turn can be answered by our automated analysis primitives. The Security Toolbox is built on top of Atlas, a novel program analysis platform made by EnSoft. Atlas uses a graph-based mathematical abstraction of software to produce a unified property multigraph, exposes a powerful API for writing analyzers using graph traversals, and provides both automated and interactive capabilities to facilitate program comprehension. The Security Toolbox is also powered by FlowMiner, a novel solution to mine fine-grained, compact data flow summaries of Java libraries. FlowMiner allows the Security Toolbox to complete a scalable and accurate partial program analysis of an application without including all of the libraries that it uses (eg. Android). This dissertation presents the Security Toolbox, Atlas, and FlowMiner. We provide empirical evidence of the effectiveness of the Security Toolbox for detecting novel, sophisticated, domain-specific Android malware, demonstrating that our approach outperforms other cutting-edge research tools and state-of-the-art commercial programs in both time and accuracy metrics. We also evaluate the effectiveness of Atlas as a program analysis platform and FlowMiner as a library summary tool

    Applying compiler technology to solve generic

    Get PDF
    Compilers are tools that transform a high level programming languages into assem- bly or binary code. The essential of the process is done by the interpretation and the code generation steps, but nowadays most compilers have also a strong component of code optimization, that explore as much as possible the potential of the computer architectures to which the compiler must generate the code. These optimizations are based on the information provided by several analysis processes. This paper present some of these code analysis and optimizations, and shows how they can be used to solve problems or improve the quality of solutions used at areas such as industrial engineer and planning

    A Domain-Specific Language for Generating Dataflow Analyzers

    Get PDF
    Dataflow analysis is a well-understood and very powerful technique for analyzing programs as part of the compilation process. Virtually all compilers use some sort of dataflow analysis as part of their optimization phase. However, despite being well-understood theoretically, such analyses are often difficult to code, making it difficult to quickly experiment with variants. To address this, we developed a domain-specific language, Analyzer Generator (AG), that synthesizes dataflow analysis phases for Microsoft's Phoenix compiler framework. AG hides the fussy details needed to make analyses modular, yet generates code that is as efficient as the hand-coded equivalent. One key construct we introduce allows IR object classes to be extended without recompiling. Experimental results on three analyses show that AG code can be one-tenth the size of the equivalent handwritten C++ code with no loss of performance. It is our hope that AG will make developing new dataflow analyses much easier

    A compiler level intermediate representation based binary analysis system and its applications

    Get PDF
    Analyzing and optimizing programs from their executables has received a lot of attention recently in the research community. There has been a tremendous amount of activity in executable-level research targeting varied applications such as security vulnerability analysis, untrusted code analysis, malware analysis, program testing, and binary optimizations. The vision of this dissertation is to advance the field of static analysis of executables and bridge the gap between source-level analysis and executable analysis. The main thesis of this work is scalable static binary rewriting and analysis using compiler-level intermediate representation without relying on the presence of metadata information such as debug or symbolic information. In spite of a significant overlap in the overall goals of several source-code methods and executables-level techniques, several sophisticated transformations that are well-understood and implemented in source-level infrastructures have yet to become available in executable frameworks. It is a well known fact that a standalone executable without any meta data is less amenable to analysis than the source code. Nonetheless, we believe that one of the prime reasons behind the limitations of existing executable frameworks is that current executable frameworks define their own intermediate representations (IR) which are significantly more constrained than an IR used in a compiler. Intermediate representations used in existing binary frameworks lack high level features like abstract stack, variables, and symbols and are even machine dependent in some cases. This severely limits the application of well-understood compiler transformations to executables and necessitates new research to make them applicable. In the first part of this dissertation, we present techniques to convert the binaries to the same high-level intermediate representation that compilers use. We propose methods to segment the flat address space in an executable containing undifferentiated blocks of memory. We demonstrate the inadequacy of existing variable identification methods for their promotion to symbols and present our methods for symbol promotion. We also present methods to convert the physically addressed stack in an executable to an abstract stack. The proposed methods are practical since they do not employ symbolic, relocation, or debug information which are usually absent in deployed executables. We have integrated our techniques with a prototype x86 binary framework called \emph{SecondWrite} that uses LLVM as the IR. The robustness of the framework is demonstrated by handling executables totaling more than a million lines of source-code, including several real world programs. In the next part of this work, we demonstrate that several well-known source-level analysis frameworks such as symbolic analysis have limited effectiveness in the executable domain since executables typically lack higher-level semantics such as program variables. The IR should have a precise memory abstraction for an analysis to effectively reason about memory operations. Our first work of recovering a compiler-level representation addresses this limitation by recovering several higher-level semantics information from executables. In the next part of this work, we propose methods to handle the scenarios when such semantics cannot be recovered. First, we propose a hybrid static-dynamic mechanism for recovering a precise and correct memory model in executables in presence of executable-specific artifacts such as indirect control transfers. Next, the enhanced memory model is employed to define a novel symbolic analysis framework for executables that can perform the same types of program analysis as source-level tools. Frameworks hitherto fail to simultaneously maintain the properties of correct representation and precise memory model and ignore memory-allocated variables while defining symbolic analysis mechanisms. We exemplify that our framework is robust, efficient and it significantly improves the performance of various traditional analyses like global value numbering, alias analysis and dependence analysis for executables. Finally, the underlying representation and analysis framework is employed for two separate applications. First, the framework is extended to define a novel static analysis framework, \emph{DemandFlow}, for identifying information flow security violations in program executables. Unlike existing static vulnerability detection methods for executables, DemandFlow analyzes memory locations in addition to symbols, thus improving the precision of the analysis. DemandFlow proposes a novel demand-driven mechanism to identify and precisely analyze only those program locations and memory accesses which are relevant to a vulnerability, thus enhancing scalability. DemandFlow uncovers six previously undiscovered format string and directory traversal vulnerabilities in popular ftp and internet relay chat clients. Next, the framework is extended to implement a platform-specific optimization for embedded processors. Several embedded systems provide the facility of locking one or more lines in the cache. We devise the first method in literature that employs instruction cache locking as a mechanism for improving the average-case run-time of general embedded applications. We demonstrate that the optimal solution for instruction cache locking can be obtained in polynomial time. Since our scheme is implemented inside a binary framework, it successfully addresses the portability concern by enabling the implementation of cache locking at the time of deployment when all the details of the memory hierarchy are available

    Finding Missed Compiler Optimizations by Differential Testing

    Get PDF
    International audienceRandomized differential testing of compilers has had great success in finding compiler crashes and silent miscompila-tions. In this paper we investigate whether we can use similar techniques to improve the quality of the generated code: Can we compare the code generated by different compilers to find optimizations performed by one but missed by another? We have developed a set of tools for running such tests. We compile C code generated by standard random program generators and use a custom binary analysis tool to compare the output programs. Depending on the optimization of interest, the tool can be configured to compare features such as the number of total instructions, multiply or divide instructions, function calls, stack accesses, and more. A standard test case reduction tool produces minimal examples once an interesting difference has been found. We have used our tools to compare the code generated by GCC, Clang, and CompCert. We have found previously un-reported missing arithmetic optimizations in all three compilers, as well as individual cases of unnecessary register spilling, missed opportunities for register coalescing, dead stores, redundant computations, and missing instruction selection patterns

    A Language for Specifying Compiler Optimizations for Generic Software

    Get PDF
    Compiler optimization is important to software performance, and modern processor architectures make optimization even more critical. However, many modern software applications use libraries providing high levels of abstraction. Such libraries often hinder effective optimization—the libraries are difficult to analyze using current compiler technology. For example, high-level libraries often use dynamic memory allocation and indirectly expressed control structures, such as iterator-based loops. Programs using these libraries often cannot achieve an optimal level of performance. On the other hand, software libraries have also been recognized as potentially aiding in program optimization. One proposed implementation of library-based optimization is to allow the library author, or a library user, to define custom analyses and optimizations. Only limited systems have been created to take advantage of this potential, however. One problem in creating a framework for defining new optimizations and analyses is how users are to specify them: implementing them by hand inside a compiler is difficult and prone to errors. Thus, a domain-specific language for library-based compiler optimizations would be beneficial. Many optimization specification languages have appeared in the literature, but they tend to be either limited in power or unnecessarily difficult to use. Therefore, I have designed, implemented, and evaluated the Pavilion language for specifying program analyses and optimizations, designed for library authors and users. These analyses and optimizations can be based on the implementation of a particular library, its use in a specific program, or on the properties of a broad range of types, expressed through concepts. The new system is intended to provide a high level of expressiveness, even though the intended users are unlikely to be compiler experts

    Safe code transfromations for speculative execution in real-time systems

    Get PDF
    Although compiler optimization techniques are standard and successful in non-real-time systems, if naively applied, they can destroy safety guarantees and deadlines in hard real-time systems. For this reason, real-time systems developers have tended to avoid automatic compiler optimization of their code. However, real-time applications in several areas have been growing substantially in size and complexity in recent years. This size and complexity makes it impossible for real-time programmers to write optimal code, and consequently indicates a need for compiler optimization. Recently researchers have developed or modified analyses and transformations to improve performance without degrading worst-case execution times. Moreover, these optimization techniques can sometimes transform programs which may not meet constraints/deadlines, or which result in timeouts, into deadline-satisfying programs. One such technique, speculative execution, also used for example in parallel computing and databases, can enhance performance by executing parts of the code whose execution may or may not be needed. In some cases, rollback is necessary if the computation turns out to be invalid. However, speculative execution must be applied carefully to real-time systems so that the worst-case execution path is not extended. Deterministic worst-case execution for satisfying hard real-time constraints, and speculative execution with rollback for improving average-case throughput, appear to lie on opposite ends of a spectrum of performance requirements and strategies. Deterministic worst-case execution for satisfying hard real-time constraints, and speculative execution with rollback for improving average-case throughput, appear to lie on opposite ends of a spectrum of performance requirements and strategies. Nonetheless, this thesis shows that there are situations in which speculative execution can improve the performance of a hard real-time system, either by enhancing average performance while not affecting the worst-case, or by actually decreasing the worst-case execution time. The thesis proposes a set of compiler transformation rules to identify opportunities for speculative execution and to transform the code. Proofs for semantic correctness and timeliness preservation are provided to verify safety of applying transformation rules to real-time systems. Moreover, an extensive experiment using simulation of randomly generated real-time programs have been conducted to evaluate applicability and profitability of speculative execution. The simulation results indicate that speculative execution improves average execution time and program timeliness. Finally, a prototype implementation is described in which these transformations can be evaluated for realistic applications
    corecore