Synthesizing Program Input Grammars
We present an algorithm for synthesizing a context-free grammar encoding the
language of valid program inputs from a set of input examples and blackbox
access to the program. Our algorithm addresses shortcomings of existing grammar
inference algorithms, which both severely overgeneralize and are prohibitively
slow. Our implementation, GLADE, leverages the grammar synthesized by our
algorithm to fuzz test programs with structured inputs. We show that GLADE
substantially increases the incremental coverage on valid inputs compared to
two baseline fuzzers.
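
To make the idea of learning input structure from examples plus blackbox access concrete, here is a minimal sketch. It is not GLADE's algorithm. It only illustrates one kind of oracle query such a synthesizer might issue: testing whether a substring of a seed input can be deleted or doubled without the program rejecting the result, which hints that the substring sits under a repetition. The oracle `is_valid` (a balanced-parentheses checker) and the helper `repeatable_substrings` are hypothetical names introduced for illustration.

```python
from typing import Callable, List, Tuple


def is_valid(s: str) -> bool:
    """Hypothetical blackbox oracle: accepts strings of balanced parentheses."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        else:
            return False
    return depth == 0


def repeatable_substrings(
    seed: str, oracle: Callable[[str], bool]
) -> List[Tuple[int, int]]:
    """Return spans (start, end) of `seed` that the oracle still accepts
    when the span is removed and when it is doubled."""
    spans = []
    n = len(seed)
    for i in range(n):
        for j in range(i + 1, n + 1):
            mid = seed[i:j]
            removed = seed[:i] + seed[j:]
            doubled = seed[:i] + mid + mid + seed[j:]
            # Only the oracle's accept/reject answer is used, never its internals.
            if oracle(removed) and oracle(doubled):
                spans.append((i, j))
    return spans


if __name__ == "__main__":
    print(repeatable_substrings("(())", is_valid))
```

For the seed `(())`, the candidate spans are the whole string and the inner `()`, which matches the intuition that a balanced group can be repeated. A real synthesizer would need many more checks to avoid overgeneralizing from so few queries.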
Fault-based regression testing in a reactive environment
Regression testing is the process of retesting software after modification. Regression testing is a major factor contributing to the high cost of software maintenance. To control this cost, regression testing must be accomplished efficiently through effective reuse of test cases and judicious generation of new test cases. Fault-based testing focuses on the detection of particular classes of faults. RELAY is a fault-based testing technique that guarantees the detection of errors caused by any fault in a chosen fault classification. RELAY can be used as a regression testing technique to generate the test cases required to demonstrate that a modification is properly made. In addition, the information related to a test case chosen to detect a potential fault guides the choice of previously selected test cases that should be reused for a given modification. This paper presents the concepts behind RELAY and discusses how RELAY could be used as a regression testing technique. It also describes a testing environment, based on integrating the RELAY model with other testing techniques, that supports reactive regression testing as well as testing throughout the development lifecycle.
Stateful Testing: Finding More Errors in Code and Contracts
Automated random testing has been shown to be an effective approach to finding
faults but still faces a major unsolved issue: how to generate test inputs
diverse enough to find many faults and find them quickly. Stateful testing, the
automated testing technique introduced in this article, generates new test
cases that improve an existing test suite. The generated test cases are
designed to violate the dynamically inferred contracts (invariants)
characterizing the existing test suite. As a consequence, they are in a good
position to detect new errors, and also to improve the accuracy of the inferred
contracts by discovering those that are unsound. Experiments on 13 data
structure classes totalling over 28,000 lines of code demonstrate the
effectiveness of stateful testing in improving over the results of long
sessions of random testing: stateful testing found 68.4% new errors and
improved the accuracy of automatically inferred contracts to over 99%, with
just a 7% time overhead.
Comment: 11 pages, 3 figures
Language Modeling by Clustering with Word Embeddings for Text Readability Assessment
We present a clustering-based language model using word embeddings for text
readability prediction. Presumably, a Euclidean semantic space hypothesis
holds true for word embeddings whose training is done by observing word
co-occurrences. We argue that clustering with word embeddings in the metric
space should yield feature representations in a higher semantic space
appropriate for text regression. Also, by representing features in terms of
histograms, our approach can naturally address documents of varying lengths. An
empirical evaluation using the Common Core Standards corpus reveals that the
features formed on our clustering-based language model significantly improve
the previously known results for the same corpus in readability prediction. We
also evaluate the task of sentence matching based on semantic relatedness using
the Wiki-SimpleWiki corpus and find that our features lead to superior matching
performance.
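
The histogram representation mentioned above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the two-dimensional toy embeddings in `EMBEDDINGS`, the fixed cluster centers in `CENTROIDS` (standing in for the output of a clustering step such as k-means), and the function names are all assumptions made for the example. Each word is assigned to its nearest centroid by Euclidean distance, and a document becomes a normalized histogram over clusters, so documents of different lengths map to vectors of the same size.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, float]

# Toy embeddings and cluster centers, assumed for illustration only.
EMBEDDINGS: Dict[str, Vector] = {
    "cat": (0.0, 0.1),
    "dog": (0.1, 0.0),
    "tax": (1.0, 0.9),
    "law": (0.9, 1.0),
}
CENTROIDS: List[Vector] = [(0.0, 0.0), (1.0, 1.0)]


def nearest_centroid(vec: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to `vec` in Euclidean distance."""
    return min(range(len(centroids)), key=lambda k: math.dist(vec, centroids[k]))


def histogram_features(
    tokens: Sequence[str],
    embeddings: Dict[str, Vector],
    centroids: Sequence[Vector],
) -> List[float]:
    """Normalized histogram of cluster assignments for one document.
    Tokens without an embedding are skipped."""
    counts = [0] * len(centroids)
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is not None:
            counts[nearest_centroid(vec, centroids)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


if __name__ == "__main__":
    short_doc = ["cat", "dog", "law"]
    long_doc = ["cat", "cat", "dog", "dog", "tax", "law", "unknown"]
    print(histogram_features(short_doc, EMBEDDINGS, CENTROIDS))
    print(histogram_features(long_doc, EMBEDDINGS, CENTROIDS))
```

Both documents yield a feature vector of length two regardless of their token counts, which is the property that lets a downstream regressor handle texts of varying length.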