86 research outputs found
Doctor of Philosophy
dissertation
Aggressive random testing tools, or fuzzers, are impressively effective at finding bugs in compilers and programming language runtimes. For example, a single test-case generator has resulted in more than 460 bugs reported for a number of production-quality C compilers. However, fuzzers can be hard to use. The first problem is that failures triggered by random test cases can be difficult to debug because these tests are often large. To report a compiler bug, one must usually construct a small test case that triggers it. The existing automated test-case reduction technique, delta debugging, is not sufficient to produce small, reportable test cases. A second problem is that fuzzers are indiscriminate: they repeatedly find bugs that may not be severe enough to fix right away. Third, fuzzers tend to generate a large number of test cases that trigger only a few distinct bugs. Some bugs are triggered much more frequently than others, creating needle-in-the-haystack problems. Currently, users rule out undesirable test cases using ad hoc methods such as disallowing problematic features in generated tests and filtering test results.
This dissertation investigates approaches to improving the utility of compiler fuzzers. Two components, an aggressive test-case reducer and a tamer, are added to the fuzzing workflow to make the fuzzer more user friendly. First, we introduce C-Reduce, an aggressive test-case reducer for C/C++ programs, which exploits rich domain-specific knowledge to output test cases nearly as good as those produced by skilled humans. This reducer produces outputs that are, on average, more than 30 times smaller than those produced by the existing reducer most commonly used by compiler engineers. Second, this dissertation formulates and addresses the fuzzer taming problem: given a potentially large number of random test cases that trigger failures, order them such that diverse, interesting test cases are highly ranked. Bug triage can be effectively automated, relying on techniques from machine learning to suppress duplicate bug-triggering test cases and test cases triggering known bugs. An evaluation shows the ability of this tool to solve the fuzzer taming problem for 3,799 test cases triggering 46 bugs in a C compiler.
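The delta debugging baseline that the abstract finds insufficient can be sketched in a few lines. Below is a simplified, complement-only variant of the classic ddmin algorithm in Python; the `is_failing` predicate and the toy "crash" condition are illustrative stand-ins, not part of the dissertation's tooling:

```python
def ddmin(test_case, is_failing):
    """Simplified delta debugging (ddmin, complement-only variant):
    greedily remove chunks of a failing input while the predicate
    keeps reporting the failure, refining granularity as needed."""
    assert is_failing(test_case)
    n = 2  # granularity: number of chunks to split the input into
    while len(test_case) >= 2:
        chunk = len(test_case) // n
        reduced = False
        for i in range(n):
            # Try deleting the i-th chunk and keeping the rest.
            candidate = test_case[:i * chunk] + test_case[(i + 1) * chunk:]
            if candidate and is_failing(candidate):
                test_case = candidate
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(test_case):
                break  # cannot split any finer: done
            n = min(n * 2, len(test_case))
    return test_case

# Toy predicate: the "compiler" crashes whenever both tokens appear.
crash = lambda t: "volatile" in t and "goto" in t
tokens = ["int", "volatile", "x", "goto", "L", "return"]
print(ddmin(tokens, crash))  # -> ['volatile', 'goto']
```

Note that ddmin only deletes contiguous chunks; it cannot perform the semantic, C-aware transformations (inlining, type simplification, etc.) that let C-Reduce reach near-human-quality outputs.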
Secure Compilation (Dagstuhl Seminar 18201)
Secure compilation is an emerging field that puts together advances in
security, programming languages, verification, systems, and hardware
architectures in order to devise secure compilation chains that
eliminate many of today's vulnerabilities.
Secure compilation aims to protect a source language's abstractions in
compiled code, even against low-level attacks.
As a concrete example, all modern languages provide a notion of
structured control flow, and an invoked procedure is expected to return
to its call site.
However, today's compilation chains (compilers, linkers, loaders,
runtime systems, hardware) cannot efficiently enforce this
abstraction: linked low-level code can call and return to arbitrary
instructions or smash the stack, blatantly violating the high-level
abstraction.
The emerging secure compilation community aims to address such
problems by devising formal security criteria, efficient enforcement
mechanisms, and effective proof techniques.
This seminar strived to take a broad and inclusive view of secure
compilation and to provide a forum for discussion on the topic. The
goal was to identify interesting research directions and open
challenges by bringing together people working on building secure
compilation chains, on developing proof techniques and verification
tools, and on designing security mechanisms.
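The call/return violation described above can be made concrete with a toy machine model. The Python sketch below (all names and the frame layout are illustrative, not any particular ABI) places a fixed-size buffer directly below the saved return address in a simulated stack frame, so an unchecked copy can redirect control:

```python
def run_vulnerable_callee(stack, sp, user_input):
    """Toy frame layout: stack[sp..sp+3] is a 4-slot buffer and
    stack[sp+4] holds the saved return address. The copy below has
    no bounds check, mimicking strcpy-style low-level code."""
    for i, word in enumerate(user_input):  # unchecked copy into the buffer
        stack[sp + i] = word
    return stack[sp + 4]                   # 'ret': jump to the saved address

RETURN_TO_CALLER = 0x400123                # legitimate return address
frame = [0, 0, 0, 0, RETURN_TO_CALLER]

# Well-behaved input: control returns where the source semantics promise.
assert run_vulnerable_callee(list(frame), 0, [1, 2, 3]) == RETURN_TO_CALLER

# Overlong input smashes the saved return address: the 'ret' now goes to
# 0xdeadbeef, violating the structured-control-flow abstraction.
assert run_vulnerable_callee(list(frame), 0, [9, 9, 9, 9, 0xdeadbeef]) == 0xdeadbeef
```

A secure compilation chain would make the second outcome impossible, for instance by enforcing the return discipline with a shadow stack or capability hardware.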
Fast and Precise Symbolic Analysis of Concurrency Bugs in Device Drivers
© 2015 IEEE. Concurrency errors, such as data races, make device drivers notoriously hard to develop and debug without automated tool support. We present Whoop, a new automated approach that statically analyzes drivers for data races. Whoop is empowered by symbolic pairwise lockset analysis, a novel analysis that can soundly detect all potential races in a driver. Our analysis avoids reasoning about thread interleavings and thus scales well. Exploiting the race-freedom guarantees provided by Whoop, we achieve a sound partial-order reduction that significantly accelerates Corral, an industrial-strength bug-finder for concurrent programs. Using the combination of Whoop and Corral, we analyzed 16 drivers from the Linux 4.0 kernel, achieving 1.5–20× speedups over standalone Corral.
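The intuition behind lockset analysis can be shown in a few lines. The Python sketch below implements the classic lockset discipline (in the style of Eraser): intersect the sets of locks held at every access to a shared variable, and flag the variable if the intersection is empty. Whoop's symbolic pairwise analysis is static and considerably more precise; this is only the underlying idea, and the trace data is invented for illustration:

```python
from functools import reduce

def potential_races(accesses):
    """Classic lockset check: for each shared variable, intersect the
    lock sets held at its accesses; an empty intersection means no
    single lock consistently protects the variable -> potential race."""
    locksets = {}
    for var, held_locks in accesses:
        locksets.setdefault(var, []).append(frozenset(held_locks))
    return {var: sets
            for var, sets in locksets.items()
            if not reduce(frozenset.intersection, sets)}

# Two driver entry points touching shared state under different locks.
trace = [
    ("dev->count", {"mutex_a"}),              # entry point 1
    ("dev->count", {"mutex_b"}),              # entry point 2: no common lock
    ("dev->state", {"mutex_a"}),
    ("dev->state", {"mutex_a", "mutex_b"}),   # mutex_a protects every access
]
print(sorted(potential_races(trace)))  # -> ['dev->count']
```

Because the check looks only at which locks are held, not at how threads interleave, it scales well, which is exactly the property Whoop exploits.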
Doctor of Philosophy
dissertation
Compilers are indispensable tools for developers, and we expect them to be correct. However, compiler correctness is very hard to reason about, partly because of the daunting complexity of compilers. In this dissertation, I explain how we constructed a random program generator, Csmith, and used it to find hundreds of bugs in mature open-source compilers such as the GNU Compiler Collection (GCC) and the LLVM Compiler Infrastructure (LLVM). The success of Csmith depends on its ability to be expressive and unambiguous at the same time. Csmith is composed of a code generator and a GTAV (Generation-Time Analysis and Validation) engine, which work in concert to produce expressive yet unambiguous random programs. The expressiveness of Csmith is attributed to the code generator, while the unambiguity is ensured by GTAV, which efficiently performs program analyses, such as points-to analysis and effect analysis, to avoid ambiguities caused by undefined or unspecified behaviors. During 4.25 years of testing, Csmith has found over 450 bugs in GCC and LLVM. We analyzed these bugs by putting them into categories, studying their root causes, finding their locations in the compilers' source code, and evaluating their importance. We believe the results of this analysis will be useful to future random testers as well as compiler writers and users.
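The interplay between generation and validation that the abstract describes can be sketched as a propose-and-reject loop. The toy Python generator below (all names illustrative; a drastic simplification of Csmith's GTAV engine) discards candidate statements in which the side-effected variable is also read or assigned in the same full expression, since in C the resulting evaluation order would be unspecified or undefined:

```python
import random

def generate_safe_statement(rng, variables):
    """Toy Csmith-style loop: propose a random statement of the form
    'lhs = read + bumped++;' and reject candidates whose behavior
    would be ambiguous in C, e.g. 'a = a + a++;' (the increment and
    the other uses of 'a' are unsequenced)."""
    while True:
        lhs = rng.choice(variables)
        read = rng.choice(variables)
        bumped = rng.choice(variables)   # appears as 'bumped++'
        # Generation-time validation: the incremented variable must not
        # also be read or assigned elsewhere in the same expression.
        if bumped != lhs and bumped != read:
            return f"{lhs} = {read} + {bumped}++;"

rng = random.Random(0)
print(generate_safe_statement(rng, ["a", "b", "c"]))
```

Real GTAV runs points-to and effect analyses over arbitrarily nested generated code; the rejection step above is the same idea applied to a single statement shape.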
Reconciling Low-Level Features of C with Compiler Optimizations
Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Computer Science and Engineering, College of Engineering, February 2019. Advisor: Chung-Kil Hur.
To improve the performance of C programs, mainstream compilers perform aggressive optimizations that may change the behaviors of programs that use low-level features in unidiomatic ways. Unfortunately, despite many years of research and industrial effort, it has proven very difficult to adequately balance the conflicting criteria for low-level features and compiler optimizations in the design of the C programming language. On the one hand, C should support the common usage patterns of the low-level features in systems programming. On the other hand, C should also support the sophisticated and yet effective optimizations performed by mainstream compilers. None of the existing proposals for C semantics, however, sufficiently supports low-level features and compiler optimizations at the same time.
In this dissertation, we resolve the conflict between some of the low-level features crucially used in systems programming and major compiler optimizations. Specifically, we develop the first formal semantics of relaxed-memory concurrency, separate compilation, and cast between integers and pointers that (1) supports their common usage patterns and reasoning principles for programmers, and (2) provably validates major compiler optimizations at the same time. To establish confidence in our formal semantics, we have formalized most of our key results in the Coq theorem prover, which automatically and rigorously checks the validity of the results.
Abstract
Acknowledgements
Chapter I Prologue
Chapter II Relaxed-Memory Concurrency
Chapter III Separate Compilation and Linking
Chapter IV Cast between Integers and Pointers
Chapter V Epilogue
Abstract (in Korean)
Docto
- …