14 research outputs found

    Grammar-based fuzzing using input features

    In grammar-based fuzz testing, a formal grammar is used to produce syntactically valid test inputs in order to reach the business logic of a program under test. In this setting, it is advantageous to ensure a high diversity of inputs to test more of the program's behavior. How can we characterize features that make inputs diverse and associate them with the execution of particular parts of the program? Previous work does not answer this question satisfactorily: most attempts consider only superficial features defined by the structure of the grammar, such as the presence of production rules or terminal symbols, regardless of their context. We present a measure of input coverage called k-path coverage, which takes into account combinations of grammar entities up to a given context depth k, and makes it possible to efficiently express, assess, and achieve input diversity. In a series of experiments, we demonstrate and evaluate how to systematically attain k-path coverage, and how it correlates with code coverage and can thus be used as its predictor. By automatically inferring explicit associations between k-path features and the coverage of individual methods, we further show how to generate inputs that specifically target the execution of given code locations. We expect the presented instrument of k-paths to prove useful in numerous additional applications such as assessing the quality of grammars, serving as an adequacy criterion for input test suites, enabling test case prioritization, facilitating program comprehension, and perhaps beyond.

    Systematically Covering Input Structure

    Grammar-based testing uses a given grammar to produce syntactically valid inputs. To cover program features, it is necessary to also cover input features: say, all URL variants for a URL parser. Our k-path algorithm for grammar production systematically covers syntactic elements as well as their combinations. In our evaluation, we show that this results in significantly higher code coverage than the state of the art.

    From Input Coverage to Code Coverage: Systematically Covering Input Structure with k-Paths

    Grammar-based testing uses a given grammar to produce syntactically valid inputs. Intuitively, to cover program features, it is necessary to also cover input features. We present a measure of input coverage called k-path coverage, which takes into account the coverage of individual syntactic elements as well as their combinations up to a given depth k. A k-path coverage with k = 1 prescribes that all individual symbols be covered; k-path coverage with k = 2 dictates that all symbols in the context of all their parents be covered; and so on. Using the k-path measure, we make a number of contributions. (1) We provide an algorithm for grammar-based production that constructively covers a given k-path measure. In our evaluation, using k-path during production results in a significantly higher code coverage than state-of-the-art approaches that ignore input coverage. (2) We show on a selection of real-world subjects that coverage of input elements, as measured by k-path, correlates with code coverage. As a consequence, k-path coverage can also be used to predict code coverage. (3) We show that one can learn associations between individual k-path features and coverage of specific locations: "Method `distrule` is invoked whenever both `+` and `*` occur in an expression." Developers can interpret these associations to create suitable inputs that focus on selected methods, or have tools generate inputs that immediately target these methods. The above approaches have been implemented in the tribble and codeine prototypes, and evaluated on a number of processors for JSON, CSV, URLs, and Markdown. All tools and data are available as open source.
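The k-path measure described above can be illustrated with a minimal sketch. It assumes a simplified reading of the abstract's definition: a k-path is a sequence of k symbols in which each symbol occurs in an expansion of its predecessor. The toy grammar, the tuple-based derivation trees, and all function names are hypothetical and are not taken from the tribble prototype.

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[List[str]]]
Path = Tuple[str, ...]
Tree = Tuple[str, list]  # (symbol, list of child trees)

# Hypothetical toy expression grammar: nonterminal -> list of alternatives.
GRAMMAR: Grammar = {
    "<expr>": [["<term>", "+", "<expr>"], ["<term>"]],
    "<term>": [["<factor>", "*", "<term>"], ["<factor>"]],
    "<factor>": [["(", "<expr>", ")"], ["<digit>"]],
    "<digit>": [["0"], ["1"]],
}


def grammar_k_paths(grammar: Grammar, k: int) -> Set[Path]:
    """All symbol sequences of length k allowed by the grammar."""
    symbols = set(grammar)
    for alternatives in grammar.values():
        for alternative in alternatives:
            symbols.update(alternative)
    paths: Set[Path] = {(symbol,) for symbol in symbols}
    for _ in range(k - 1):
        # Extend every path by each symbol its last element can expand to.
        paths = {
            path + (child,)
            for path in paths
            for alternative in grammar.get(path[-1], [])
            for child in alternative
        }
    return paths


def tree_k_paths(tree: Tree, k: int) -> Set[Path]:
    """All downward symbol sequences of length k in a derivation tree."""
    found: Set[Path] = set()

    def extend(node: Tree, prefix: Path) -> None:
        prefix = prefix + (node[0],)
        if len(prefix) == k:
            found.add(prefix)
        else:
            for child in node[1]:
                extend(child, prefix)

    def visit(node: Tree) -> None:
        extend(node, ())
        for child in node[1]:
            visit(child)

    visit(tree)
    return found


def k_path_coverage(trees: List[Tree], grammar: Grammar, k: int) -> float:
    """Fraction of the grammar's k-paths exercised by the given trees."""
    required = grammar_k_paths(grammar, k)
    covered: Set[Path] = set()
    for tree in trees:
        covered |= tree_k_paths(tree, k)
    return len(covered & required) / len(required)


# Derivation tree of the input "1".
TREE_FOR_1: Tree = (
    "<expr>", [("<term>", [("<factor>", [("<digit>", [("1", [])])])])]
)

print(k_path_coverage([TREE_FOR_1], GRAMMAR, 1))
print(k_path_coverage([TREE_FOR_1], GRAMMAR, 2))
```

In this sketch, the single input `1` exercises 5 of the 10 symbols (k = 1) but only 4 of the 12 parent-child pairs (k = 2), which shows how a larger k demands more diverse inputs.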

    XMLMate: Evolutionary XML Test Generation

    Generating system inputs satisfying complex constraints is still a challenge for modern test generators. We present XMLMATE, a search-based test generator specially aimed at XML-based systems. XMLMATE leverages program structure, existing XML schemas, and XML inputs to generate, mutate, recombine, and evolve valid XML inputs. Over a set of seven XML-based systems, XMLMATE detected 31 new unique failures in production code, all triggered by system inputs and thus true alarms.

    Efficient fuzz testing leveraging input, code, and execution

    Any kind of smart testing technique must be very efficient to be competitive with random fuzz testing. State-of-the-art test generators are largely inferior to random testing in real-world applications. This work proposes to gather and evaluate lightweight analyses that can enable the creation of an efficient and sufficiently effective analysis-assisted fuzz tester. The analyses shall leverage information sources apart from the program under test itself, such as descriptions of the targeted input format in the form of extended context-free grammars, or hardware counters. As the main contributions, an efficient framework for building fuzzers around given analyses will be created, and with its help analyses will be identified and categorized according to their performance.

    When does my program do this? Learning circumstances of software behavior

    We introduce Alhazen, an approach that automatically determines the circumstances under which a particular program behavior, such as a failure, takes place. Alhazen starts with a run that exhibits this behavior and automatically determines input features associated with the behavior in question: (1) We use a grammar to parse the input into individual elements. (2) We determine features from the elements such as existence, length, or numerical values. (3) We use a decision tree learner to observe and learn which input features are associated with the behavior in question. (4) We use the grammar to generate additional inputs to further strengthen or refute hypotheses as learned associations. (5) By repeating steps 2 to 4, we obtain a theory that explains and predicts the given behavior. In our evaluation using inputs for find, grep, NetHack, and a JavaScript transpiler, the theories produced by Alhazen predict and produce failures with high accuracy and allow developers to focus on a small set of input features: “grep fails whenever the --fixed-strings option is used in conjunction with an empty search string.”
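Steps (2) and (3) above can be sketched in a few lines. This is a heavily simplified illustration, not Alhazen's implementation: it assumes inputs are already parsed into options and a pattern, uses only Boolean features, and substitutes a tiny hand-written decision tree learner (Gini-based splits). The sample runs and their failure labels are hypothetical, modeled on the grep example in the abstract; the input generation and refinement loop of steps (4) and (5) is omitted.

```python
from typing import Dict, List, Tuple, Union

Features = Dict[str, bool]
# A tree is either a leaf label or (feature, subtree_if_false, subtree_if_true).
Tree = Union[bool, Tuple[str, "Tree", "Tree"]]
Sample = Tuple[Features, bool]


def extract_features(options: List[str], pattern: str) -> Features:
    """Step 2: derive existence and length features from parsed elements."""
    return {
        "exists(--fixed-strings)": "--fixed-strings" in options,
        "exists(--ignore-case)": "--ignore-case" in options,
        "len(pattern) == 0": len(pattern) == 0,
    }


def gini(labels: List[bool]) -> float:
    """Gini impurity of a list of Boolean labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def learn(samples: List[Sample]) -> Tree:
    """Step 3: learn which features are associated with the failure."""
    labels = [label for _, label in samples]
    if len(set(labels)) == 1:
        return labels[0]
    best = None
    for feature in samples[0][0]:
        without = [s for s in samples if not s[0][feature]]
        with_ = [s for s in samples if s[0][feature]]
        if not without or not with_:
            continue  # feature does not split the samples
        score = (len(without) * gini([label for _, label in without])
                 + len(with_) * gini([label for _, label in with_]))
        if best is None or score < best[0]:
            best = (score, feature, without, with_)
    if best is None:
        return 2 * sum(labels) >= len(labels)  # majority vote
    _, feature, without, with_ = best
    return (feature, learn(without), learn(with_))


def predict(tree: Tree, features: Features) -> bool:
    """Follow the learned tree to predict whether an input fails."""
    while not isinstance(tree, bool):
        feature, if_false, if_true = tree
        tree = if_true if features[feature] else if_false
    return tree


# Hypothetical observed runs: (options, pattern, failed?).
RUNS = [
    (["--fixed-strings"], "", True),
    (["--fixed-strings"], "abc", False),
    ([], "", False),
    (["--ignore-case"], "abc", False),
    (["--fixed-strings", "--ignore-case"], "", True),
    (["--ignore-case"], "", False),
]
SAMPLES = [(extract_features(opts, pat), failed) for opts, pat, failed in RUNS]
THEORY = learn(SAMPLES)
print(THEORY)
```

On these toy runs, the learned tree first splits on the presence of `--fixed-strings` and then on the empty pattern, mirroring the kind of theory quoted in the abstract.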

    When does my Program do this? Learning Circumstances of Software Behavior

    We introduce Alhazen, an approach that automatically determines the circumstances under which a particular program behavior, such as a failure, takes place. Alhazen starts with a run that exhibits this behavior and automatically determines input features associated with the behavior in question: (1) We use a grammar to parse the input into individual elements. (2) We determine features from the elements such as existence, length, or numerical values. (3) We use a decision tree learner to observe and learn which input features are associated with the behavior in question. (4) We use the grammar to generate additional inputs to further strengthen or refute hypotheses as learned associations. (5) By repeating steps 2 to 4, we obtain a theory that explains and predicts the given behavior. In our evaluation using inputs for find, grep, NetHack, and a JavaScript transpiler, the theories produced by Alhazen predict and produce failures with high accuracy and allow developers to focus on a small set of input features: “grep fails whenever the --fixed-strings option is used in conjunction with an empty search string.”

    Inputs from Hell: Learning Input Distributions for Grammar-Based Test Generation


    Abstracting Failure-Inducing Inputs

    A program fails. Under which circumstances does the failure occur? Starting with a single failure-inducing input ("The input ((4)) fails") and an input grammar, the DDSET algorithm uses systematic tests to automatically generalize the input to an abstract failure-inducing input that contains both (concrete) terminal symbols and (abstract) nonterminal symbols from the grammar, for instance "((⟨expr⟩))", which represents any expression ⟨expr⟩ in double parentheses. Such an abstract failure-inducing input can be used (1) as a debugging diagnostic, characterizing the circumstances under which a failure occurs ("The error occurs whenever an expression is enclosed in double parentheses"); (2) as a producer of additional failure-inducing tests to help design and validate fixes and repair candidates ("The inputs ((1)), ((3 * 4)), and many more also fail"). In its evaluation on real-world bugs in JavaScript, Clojure, Lua, and UNIX command line utilities, DDSET’s abstract failure-inducing inputs provided to-the-point diagnostics and precise producers for further failure-inducing inputs.
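The generalization idea can be sketched on the abstract's own example. This is a minimal illustration under strong assumptions, not the DDSET algorithm itself: the derivation tree of `((4))` is built by hand instead of parsed, each nonterminal is tested against a small fixed list of hypothetical sample expansions instead of grammar-generated ones, and the failure predicate is a stand-in that flags any string containing double parentheses. Nonterminals are written with ASCII brackets (`<expr>`).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(eq=False)  # nodes are compared by identity
class Node:
    symbol: str
    children: List["Node"] = field(default_factory=list)


def render(node: Node, hole: Optional[Node] = None, filler: str = "") -> str:
    """Render the tree as a string, replacing the subtree `hole` by `filler`."""
    if node is hole:
        return filler
    if not node.children:
        return node.symbol
    return "".join(render(child, hole, filler) for child in node.children)


def abstract(root: Node,
             fails: Callable[[str], bool],
             samples: Dict[str, List[str]]) -> str:
    """Replace a subtree by its nonterminal if every sample expansion
    substituted for it still triggers the failure; otherwise keep
    descending into the subtree."""

    def visit(node: Node) -> str:
        if not node.children:
            return node.symbol  # terminal symbol stays concrete
        alternatives = samples.get(node.symbol, [])
        if alternatives and all(
            fails(render(root, node, alt)) for alt in alternatives
        ):
            return node.symbol  # failure does not depend on this subtree
        return "".join(visit(child) for child in node.children)

    return visit(root)


def expr(*children: Node) -> Node:
    return Node("<expr>", list(children))


# Hand-built derivation tree for the failure-inducing input "((4))".
TREE = expr(
    Node("("),
    expr(Node("("), expr(Node("<digit>", [Node("4")])), Node(")")),
    Node(")"),
)

# Hypothetical sample expansions used as replacement candidates.
SAMPLES = {"<expr>": ["1", "2+3", "(7)"], "<digit>": ["0", "9"]}


def fails(text: str) -> bool:
    """Stand-in for running the program under test."""
    return "((" in text and "))" in text


print(render(TREE))
print(abstract(TREE, fails, SAMPLES))
```

With only a handful of fixed samples, the sketch can over-generalize; it is the systematic testing described in the abstract that makes the resulting abstraction trustworthy.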