12 research outputs found

    Synthesizing Program Input Grammars

    We present an algorithm for synthesizing a context-free grammar encoding the language of valid program inputs from a set of input examples and blackbox access to the program. Our algorithm addresses shortcomings of existing grammar inference algorithms, which both severely overgeneralize and are prohibitively slow. Our implementation, GLADE, leverages the grammar synthesized by our algorithm to fuzz test programs with structured inputs. We show that GLADE substantially increases the incremental coverage on valid inputs compared to two baseline fuzzers.

    Master of Science

    Thesis. Direct equivalence testing is a framework for detecting errors in C compilers and application programs that exploits the fact that program semantics should be preserved during the compilation process. Binaries generated from the same piece of code should remain equivalent irrespective of the compiler, or compiler optimizations, used. Compiler errors as well as program errors such as out-of-bounds memory access, stack overflow, and use of uninitialized local variables cause nonequivalence in the generated binaries. Direct equivalence testing has detected previously unknown errors in real-world embedded software like TinyOS and in different compilers like msp430-gcc and llvm-msp430.
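
The equivalence idea in this abstract can be illustrated with a minimal sketch. It is not the thesis's framework: it compares recorded program outputs rather than the binaries themselves, and the function name `find_outliers`, the build labels, and the checksum strings are all hypothetical. It only shows the core check, namely that builds of the same source should agree, and that any build disagreeing with the most common result deserves inspection.

```python
from collections import Counter


def find_outliers(outputs):
    """Return the build labels whose output differs from the most common one.

    `outputs` maps a (hypothetical) build label, such as a compiler plus
    optimization level, to the output observed when running that build.
    """
    if not outputs:
        return []
    # The most frequent output is treated as the reference behavior.
    majority, _count = Counter(outputs.values()).most_common(1)[0]
    return sorted(name for name, out in outputs.items() if out != majority)


# Hypothetical observations for one test program built three ways.
observed = {
    "cc_a -O0": "checksum=1f",
    "cc_a -O2": "checksum=1f",
    "cc_b -O2": "checksum=9c",
}
suspects = find_outliers(observed)
print(suspects)
```

A disagreement does not say which side is wrong. It may point to a compiler error or to a program error such as use of an uninitialized variable, so each flagged build still needs manual diagnosis.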

    Finding and understanding bugs in C compilers

    Manuscript. Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our first contribution is to advance the state of the art in compiler testing. Unlike previous tools, Csmith generates programs that cover a large subset of C while avoiding the undefined and unspecified behaviors that would destroy its ability to automatically find wrong-code bugs. Our second contribution is a collection of qualitative and quantitative results about the bugs we have found in open-source C compilers.

    A formally verified compiler back-end

    This article describes the development and formal verification (proof of semantic preservation) of a compiler back-end from Cminor (a simple imperative intermediate language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness. Such a verified compiler is useful in the context of formal methods applied to the certification of critical software: the verification of the compiler guarantees that the safety properties proved on the source code hold for the executable compiled code as well.

    Bagheera: an advanced polimorphic and infection engine for Linux

    Computer viruses have been evolving since the '80s, adopting new techniques with the intention of avoiding detection by anti-virus programs. One of these techniques is polymorphism, which is used to change the virus's structure each time an infection is carried out. This technique was broadly adopted by the virus-writing community and led to the birth of Polymorphic Engines, which can grant polymorphism to any virus. This project focuses on the study of those engines and, in particular, on exploring the techniques used to avoid detection by anti-virus software. In addition, this project also focuses on the analysis and development of techniques to infect ELF binaries on Linux platforms. The final goal is to design and build a modern polymorphic and infection engine, namely Bagheera, and to evaluate its effectiveness against a state-of-the-art anti-virus on a Linux platform.

    Doctor of Philosophy

    Dissertation. Compilers are indispensable tools for developers. We expect them to be correct. However, compiler correctness is very hard to reason about. This can be partly explained by the daunting complexity of compilers. In this dissertation, I will explain how we constructed a random program generator, Csmith, and used it to find hundreds of bugs in strong open-source compilers such as the GNU Compiler Collection (GCC) and the LLVM Compiler Infrastructure (LLVM). The success of Csmith depends on its ability to be expressive and unambiguous at the same time. Csmith is composed of a code generator and a GTAV (Generation-Time Analysis and Validation) engine. They work interactively to produce expressive yet unambiguous random programs. The expressiveness of Csmith is attributed to the code generator, while the unambiguity is assured by GTAV. GTAV efficiently performs program analyses, such as points-to analysis and effect analysis, to avoid ambiguities caused by undefined or unspecified behaviors. During our 4.25 years of testing, Csmith has found over 450 bugs in GCC and LLVM. We analyzed the bugs by putting them into different categories, studying the root causes, finding their locations in the compilers' source code, and evaluating their importance. We believe the analysis results are useful to future random testers, as well as to compiler writers and users.

    On Matching Binary to Source Code

    Reverse engineering of executable binary programs has diverse applications in computer security and forensics, and often involves identifying parts of code that are reused from third-party software projects. Identification of code clones by comparing and fingerprinting low-level binaries has been explored in various pieces of work as an effective approach for accelerating the reverse engineering process. Binary clone detection across different environments and computing platforms bears significant challenges, and reasoning about sequences of low-level machine instructions is a tedious and time-consuming process. For these reasons, the ability to match reused functions to their source code is highly advantageous, despite being rarely explored to date. In this thesis, we systematically assess the feasibility of automatic binary to source matching to aid the reverse engineering process. We highlight the challenges, elaborate on the shortcomings of existing proposals, and design a new approach that is targeted at addressing the challenges while delivering more extensive and detailed results in a fully automated fashion. By evaluating our approach, we show that it is generally capable of uniquely matching over 50% of reused functions in a binary to their source code in a source database with over 500,000 functions, while narrowing down over 75% of reused functions to at most five candidates in most cases. Finally, we investigate and discuss the limitations and provide directions for future work.

    Automatic test generation for the detection of performance bugs in code optimization

    Software is everywhere in our daily lives, and it is important that software behaves as expected. Testing is a widely accepted method for improving software quality. Testing detects the presence of bugs by comparing the actual outcome of a computation to its expected outcome. Testing for correctness is a well-studied problem: the expected output of a computation is typically unambiguous, since computations in computer software typically have clear semantics defined by the programming language. However, testing for performance is less studied. The expected outcome of a test may require contextual knowledge not apparent in the test program itself. For example, by simply inspecting the code of a web server, one cannot determine what the expected throughput is. This makes testing for performance a challenging task. Testing compilers adds another layer of complexity. For compilers, a correctness bug during compiler optimization may introduce a bug in the resulting binary, even though the bug was not present in the source code. Similarly, a performance bug during optimization may cause inconsistencies in the runtimes of equivalent programs, where equivalent programs are defined as programs with identical outcomes but whose sources may differ through semantics-preserving transformations. Performance bugs prevent compilers from producing efficient code when they have the ability to do so. Many testing techniques have been proposed. Random testing is a powerful testing technique often associated with test generation. It allows a large testing space to be explored efficiently through sampling and is suitable for large and complex software with a large testing space, such as compilers. Random test generation for compilers has been shown to be effective in detecting correctness bugs. However, to the best of our knowledge, there is no previous study on random test generation for performance bugs in compilers. We believe one of the main reasons is the context-dependent nature of quantifying performance headroom. We propose a random test generation infrastructure for evaluating the performance of compilers. We quantify the performance headroom of tests by borrowing existing ideas from previous studies. Namely, when a set of equivalent programs is compiled by a compiler, all programs should aim to perform as well as the best-performing program. Additionally, when a program is compiled by a set of compilers, all compilers should aim to generate code that performs as well as the code generated by the best-performing compiler. We define metrics to evaluate compilers based on these ideas. We used our system to evaluate four modern compilers (Intel's ICC, GNU's GCC, the Portland Group Inc.'s PGI compiler, and Clang) on how well they handle loop unrolling, loop interchange, and loop unroll-and-jam. Results suggest that ICC typically performs better than the other three compilers. On the other hand, our system also identified extreme outliers for ICC where, for example, one program becomes 180,000 times slower after unrolling a loop. Due to the nature of random testing, we also study the methodologies required to achieve reproducible results by using statistical methods. We apply these methodologies to our compiler evaluation and provide evidence that our experiments are reproducible across different randomly generated collections of code segments.
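
The "best-performing program" baseline described in this abstract can be sketched as a simple ratio. This is an illustrative assumption, not the metric the authors define: the function name `headroom`, the variant labels, and the runtimes are hypothetical. Each variant's runtime is divided by the fastest runtime in its set of equivalent programs, so a value of 1.0 means no headroom and larger values mean the compiler left performance on the table.

```python
def headroom(runtimes):
    """Return each variant's slowdown relative to the fastest variant.

    `runtimes` maps a (hypothetical) variant label to its measured runtime
    in seconds. All variants are assumed to be semantically equivalent.
    """
    best = min(runtimes.values())
    return {name: t / best for name, t in runtimes.items()}


# Hypothetical runtimes for three equivalent versions of one loop nest.
variants = {"original": 2.0, "unrolled": 1.0, "interchanged": 4.0}
ratios = headroom(variants)
print(ratios)
```

The same function applies to the second idea in the abstract if the keys are compilers instead of program variants: each compiler is then scored against the best-performing compiler on the same program.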