150 research outputs found

    FairFuzz: Targeting Rare Branches to Rapidly Increase Greybox Fuzz Testing Coverage

    Full text link
    In recent years, fuzz testing has proven itself to be one of the most effective techniques for finding correctness bugs and security vulnerabilities in practice. One particular fuzz testing tool, American Fuzzy Lop or AFL, has become popular thanks to its ease-of-use and bug-finding power. However, AFL remains limited in the depth of program coverage it achieves, in particular because it does not consider which parts of program inputs should not be mutated in order to maintain deep program coverage. We propose an approach, FairFuzz, that helps alleviate this limitation in two key steps. First, FairFuzz automatically prioritizes inputs exercising rare parts of the program under test. Second, it automatically adjusts the mutation of inputs so that the mutated inputs are more likely to exercise these same rare parts of the program. We conduct evaluation on real-world programs against state-of-the-art versions of AFL, thoroughly repeating experiments to get good measures of variability. We find that on certain benchmarks FairFuzz shows significant coverage increases after 24 hours compared to state-of-the-art versions of AFL, while on others it achieves high program coverage at a significantly faster rate

    Protecting Systems From Exploits Using Language-Theoretic Security

    Get PDF
    Any computer program processing input from the user or network must validate the input. Input-handling vulnerabilities occur in programs when the software component responsible for filtering malicious input---the parser---does not perform validation adequately. Consequently, parsers are among the most targeted components since they defend the rest of the program from malicious input. This thesis adopts the Language-Theoretic Security (LangSec) principle to understand what tools and research are needed to prevent exploits that target parsers. LangSec proposes specifying the syntactic structure of the input format as a formal grammar. We then build a recognizer for this formal grammar to validate any input before the rest of the program acts on it. To ensure that these recognizers represent the data format, programmers often rely on parser generators or parser combinators tools to build the parsers. This thesis propels several sub-fields in LangSec by proposing new techniques to find bugs in implementations, novel categorizations of vulnerabilities, and new parsing algorithms and tools to handle practical data formats. To this end, this thesis comprises five parts that tackle various tenets of LangSec. First, I categorize various input-handling vulnerabilities and exploits using two frameworks. First, I use the mismorphisms framework to reason about vulnerabilities. This framework helps us reason about the root causes leading to various vulnerabilities. Next, we built a categorization framework using various LangSec anti-patterns, such as parser differentials and insufficient input validation. Finally, we built a catalog of more than 30 popular vulnerabilities to demonstrate the categorization frameworks. Second, I built parsers for various Internet of Things and power grid network protocols and the iccMAX file format using parser combinator libraries. The parsers I built for power grid protocols were deployed and tested on power grid substation networks as an intrusion detection tool. The parser I built for the iccMAX file format led to several corrections and modifications to the iccMAX specifications and reference implementations. Third, I present SPARTA, a novel tool I built that generates Rust code that type checks Portable Data Format (PDF) files. The type checker I helped build strictly enforces the constraints in the PDF specification to find deviations. Our checker has contributed to at least four significant clarifications and corrections to the PDF 2.0 specification and various open-source PDF tools. In addition to our checker, we also built a practical tool, PDFFixer, to dynamically patch type errors in PDF files. Fourth, I present ParseSmith, a tool to build verified parsers for real-world data formats. Most parsing tools available for data formats are insufficient to handle practical formats or have not been verified for their correctness. I built a verified parsing tool in Dafny that builds on ideas from attribute grammars, data-dependent grammars, and parsing expression grammars to tackle various constructs commonly seen in network formats. I prove that our parsers run in linear time and always terminate for well-formed grammars. Finally, I provide the earliest systematic comparison of various data description languages (DDLs) and their parser generation tools. DDLs are used to describe and parse commonly used data formats, such as image formats. Next, I conducted an expert elicitation qualitative study to derive various metrics that I use to compare the DDLs. I also systematically compare these DDLs based on sample data descriptions available with the DDLs---checking for correctness and resilience

    The construction of oracles for software testing

    Get PDF
    Software testing is important throughout the software life cycle. Testing is the part of the software development process where a computer program is subject to specific conditions to show that the problem meets its intended design. Building a testing oracle is one part of software testing. An oracle is an external mechanism which can be used to check test output for correctness. The characteristics of available oracles have a dominating influence on the cost and quality of software testing. In this thesis, methods of constructing oracles are investigated and classified. There are three kinds of method of constructing oracles: the pseudo-oracle approach, oracles using attributed grammars and oracles based on formal specification. This thesis develops a method for constructing an oracle, based on the Z specification language. A specification language can describe the correct syntax and semantics of software. The contextual part of a specification describes all the legal input to the program and the semantics part describes the meaning of the given input data. Based on this idea, an oracle is constructed and a prototype is implemented according to the method proposed in the thesis

    Semantic Fuzzing with Zest

    Get PDF
    Programs expecting structured inputs often consist of both a syntactic analysis stage, which parses raw input, and a semantic analysis stage, which conducts checks on the parsed input and executes the core logic of the program. Generator-based testing tools in the lineage of QuickCheck are a promising way to generate random syntactically valid test inputs for these programs. We present Zest, a technique which automatically guides QuickCheck-like randominput generators to better explore the semantic analysis stage of test programs. Zest converts random-input generators into deterministic parametric generators. We present the key insight that mutations in the untyped parameter domain map to structural mutations in the input domain. Zest leverages program feedback in the form of code coverage and input validity to perform feedback-directed parameter search. We evaluate Zest against AFL and QuickCheck on five Java programs: Maven, Ant, BCEL, Closure, and Rhino. Zest covers 1.03x-2.81x as many branches within the benchmarks semantic analysis stages as baseline techniques. Further, we find 10 new bugs in the semantic analysis stages of these benchmarks. Zest is the most effective technique in finding these bugs reliably and quickly, requiring at most 10 minutes on average to find each bug.Comment: To appear in Proceedings of 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA'19

    Grammar-based fuzzing using input features

    Get PDF
    In grammar-based fuzz testing, a formal grammar is used to produce test inputs that are syntactically valid in order to reach the business logic of a program under test. In this setting, it is advantageous to ensure a high diversity of inputs to test more of the program's behavior. How can we characterize features that make inputs diverse and associate them with the execution of particular parts of the program? Previous work does not answer this question to satisfaction, with most attempts mainly considering superficial features defined by the structure of the grammar such as the presence of production rules or terminal symbols, regardless of their context. We present a measure of input coverage called k-path coverage, which takes into account combinations of grammar entities up to a given context depth k, and makes it possible to efficiently express, assess, and achieve input diversity. In a series of experiments, we demonstrate and evaluate how to systematically attain k-path coverage, how it correlates with code coverage and can thus be used as its predictor. By automatically inferring explicit associations between k-path features and the coverage of individual methods we further show how to generate inputs that specifically target the execution of given code locations. We expect the presented instrument of k-paths to prove useful in numerous additional applications such as assessing the quality of grammars, serving as an adequacy criterion for input test suites, enabling test case prioritization, facilitating program comprehension, and perhaps beyond.Im Bereich des grammatik-basierten Fuzz-Testens benutzt man eine formale Grammatik, um Testeingaben zu produzieren, welche syntaktisch korrekt sind, mit dem Ziel die Geschäftslogik eines zu testenden Programms zu erreichen. Dafür ist es vorteilhaft eine hohe Diversität der Eingaben zu sichern, um mehr vom Verhalten des Programms testen zu können. Wie kann man Merkmale charakterisieren, die Eingaben vielfältig machen und diese mit der Ausführung bestimmter Programmteile in Verbindung bringen? Bisherige Ansätze liefern darauf keine ausreichende Antwort, denn meistens betrachten sie oberflächliche, durch die Grammatikstruktur definierte Merkmale, wie das Vorhandensein von Produktionsregeln oder Terminalen, unabhängig von ihrem Verwendungskontext. Wir präsentieren ein Maß für Eingabeabdeckung, genannt -path Abdeckung, welche Kombinationen von Grammatikelementen bis zu einer vorgegebenen Kontexttiefe berücksichtigt und es ermöglicht, die Diversität von Eingaben effizient auszudrücken, zu bewerten und zu erzielen. Mit Experimenten zeigen und evaluieren wir, wie man gezielt -path Abdeckung erreicht und wie sie mit der Codeabdeckung zusammenhängt und diese somit vorhersagen kann. Ferner zeigen wir wie automatisches Erlernen expliziter Assoziationen zwischen Merkmalen und der Abdeckung einzelner Methoden die Erzeugung von Eingaben ermöglicht, welche auf die Ausführung bestimmter Codestellen abzielen. Wir rechnen damit, dass sich -paths als ein vielseitiges Instrument beweisen, dessen Anwendung über solche Gebiete, wie z.B. Messung der Qualität von Grammatiken und Eingabe-Testsuiten, Testfallpriorisierung, oder Erleichterung von Programmverständnis, hinausgeht

    Token-Level Fuzzing

    Full text link
    Fuzzing has become a commonly used approach to identifying bugs in complex, real-world programs. However, interpreters are notoriously difficult to fuzz effectively, as they expect highly structured inputs, which are rarely produced by most fuzzing mutations. For this class of programs, grammar-based fuzzing has been shown to be effective. Tools based on this approach can find bugs in the code that is executed after parsing the interpreter inputs, by following language-specific rules when generating and mutating test cases. Unfortunately, grammar-based fuzzing is often unable to discover subtle bugs associated with the parsing and handling of the language syntax. Additionally, if the grammar provided to the fuzzer is incomplete, or does not match the implementation completely, the fuzzer will fail to exercise important parts of the available functionality. In this paper, we propose a new fuzzing technique, called Token-Level Fuzzing. Instead of applying mutations either at the byte level or at the grammar level, Token-Level Fuzzing applies mutations at the token level. Evolutionary fuzzers can leverage this technique to both generate inputs that are parsed successfully and generate inputs that do not conform strictly to the grammar. As a result, the proposed approach can find bugs that neither byte-level fuzzing nor grammar-based fuzzing can find. We evaluated Token-Level Fuzzing by modifying AFL and fuzzing four popular JavaScript engines, finding 29 previously unknown bugs, several of which could not be found with state-of-the-art byte-level and grammar-based fuzzers

    Revisiting Neural Program Smoothing for Fuzzing

    Full text link
    Testing with randomly generated inputs (fuzzing) has gained significant traction due to its capacity to expose program vulnerabilities automatically. Fuzz testing campaigns generate large amounts of data, making them ideal for the application of machine learning (ML). Neural program smoothing (NPS), a specific family of ML-guided fuzzers, aims to use a neural network as a smooth approximation of the program target for new test case generation. In this paper, we conduct the most extensive evaluation of NPS fuzzers against standard gray-box fuzzers (>11 CPU years and >5.5 GPU years), and make the following contributions: (1) We find that the original performance claims for NPS fuzzers do not hold; a gap we relate to fundamental, implementation, and experimental limitations of prior works. (2) We contribute the first in-depth analysis of the contribution of machine learning and gradient-based mutations in NPS. (3) We implement Neuzz++, which shows that addressing the practical limitations of NPS fuzzers improves performance, but that standard gray-box fuzzers almost always surpass NPS-based fuzzers. (4) As a consequence, we propose new guidelines targeted at benchmarking fuzzing based on machine learning, and present MLFuzz, a platform with GPU access for easy and reproducible evaluation of ML-based fuzzers. Neuzz++, MLFuzz, and all our data are public.Comment: Accepted as conference paper at ESEC/FSE 202
    • …
    corecore