108 research outputs found
Hashing Fuzzing: Introducing Input Diversity to Improve Crash Detection
The utility of a test set of program inputs is strongly influenced by its diversity and its size. Syntax coverage has become a standard proxy for diversity. Although more sophisticated measures exist, such as proximity of a sample to a uniform distribution, methods to use them tend to be type dependent. We use r-wise hash functions to create a novel, semantics-preserving testability transformation for C programs that we call HashFuzz. Use of HashFuzz improves the diversity of test sets produced by instrumentation-based fuzzers. We evaluate the effect of the HashFuzz transformation on eight programs from the Google Fuzzer Test Suite using four state-of-the-art fuzzers that have been widely used in previous research. We demonstrate pronounced improvements in the performance of the test sets for the transformed programs across all the fuzzers that we used. These include strong improvements in diversity in every case, maintenance or small improvement in branch coverage (up to 4.8% in the best case), and significant improvement in unique crash detection numbers (increases of between 28% and 97% compared to test sets for untransformed programs).
Seeding Contradiction: a fast method for generating full-coverage test suites
The regression test suite, a key resource for managing program evolution,
needs to achieve 100% coverage, or very close, to be useful. Devising a test
suite manually is unacceptably tedious, but existing automated methods are
often inefficient. The method described in this article, "Seeding
Contradiction", inserts incorrect instructions into every basic block of the
program, enabling an SMT-based Hoare-style prover to generate a counterexample
for every branch of the program and, from the collection of all such
counterexamples, a test suite. The method is static, works fast, and achieves
excellent coverage.
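The idea in this abstract can be illustrated with a toy sketch. It is not the authors' tool: the program `classify`, the block labels, and the exception type are hypothetical, and an exhaustive search over a small input domain stands in for the SMT-based Hoare-style prover. Each basic block is seeded in turn with an instruction that always fails, and any input that triggers the failure is a counterexample that doubles as a test reaching that block.

```python
from itertools import product


class SeededFailure(Exception):
    """Raised by the deliberately incorrect instruction seeded into a block."""


def classify(x, y, seeded_block):
    # Hypothetical program under test with three basic blocks: A, B, C.
    # Only the block named by seeded_block contains the failing instruction.
    if x > y:
        if seeded_block == "A":
            raise SeededFailure("A")
        return "greater"
    elif x == y:
        if seeded_block == "B":
            raise SeededFailure("B")
        return "equal"
    else:
        if seeded_block == "C":
            raise SeededFailure("C")
        return "less"


def find_counterexample(seeded_block, domain=range(-2, 3)):
    # Brute-force stand-in for the prover (an assumption of this sketch):
    # return the first input that reaches the seeded failure, if any.
    for x, y in product(domain, repeat=2):
        try:
            classify(x, y, seeded_block)
        except SeededFailure:
            return (x, y)
    return None


# One counterexample per block gives one test input per block.
suite = {block: find_counterexample(block) for block in "ABC"}
```

A real prover derives the counterexamples statically from the program text, which is what makes the method described in the abstract fast; the enumeration here only shows why a counterexample for each seeded block yields a covering test suite.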
Regression testing experiments
Software maintenance is an expensive part of the software lifecycle: estimates put its cost at up to two-thirds of the entire cost of software. Regression testing, which tests software after it has been modified to help assess and increase its reliability, is responsible for a large part of this cost. Thus, making regression testing more efficient and effective is worthwhile. This thesis performs two experiments with regression testing techniques. The first experiment involves two regression test selection techniques, Dejavu and Pythia. These techniques select a subset of tests from the original test suite to be rerun, instead of the entire original test suite, in an attempt to save valuable testing time. The experiment investigates the cost and benefit tradeoffs between these techniques. The data indicate that Dejavu can occasionally select smaller test suites than Pythia, while Pythia is often more efficient than Dejavu at determining which test cases to select. The second experiment involves the investigation of program spectra as a tool to enhance regression testing. Program spectra characterize a program's behavior. The experiment investigates the applicability of program spectra to the detection of faults in modified software. The data indicate that certain types of spectra identify faults on a consistent basis. The data also reveal cost-benefit tradeoffs among spectra types.
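As a rough illustration of what regression test selection means, the sketch below reruns only the tests whose recorded coverage touches a changed entity. This is a generic coverage-overlap rule, not the Dejavu or Pythia algorithm, and the test names, function names, and coverage map are invented for the example.

```python
def select_tests(coverage, changed):
    """Return the tests whose covered entities overlap the changed ones.

    coverage: dict mapping test name -> set of entities the test exercises
    changed:  iterable of entities modified since the last run
    """
    changed = set(changed)
    return sorted(
        test for test, covered in coverage.items() if changed & set(covered)
    )


# Hypothetical coverage data recorded on the previous version of the program.
coverage = {
    "test_parse": {"parse", "lex"},
    "test_eval": {"eval"},
    "test_print": {"print", "lex"},
}

# Only "lex" was modified, so "test_eval" does not need to be rerun.
selected = select_tests(coverage, {"lex"})
```

The tradeoff examined in the thesis shows up even here: a finer-grained coverage map can select fewer tests, but it costs more to record and to analyse.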
Automatically Generating Test Cases for Safety-Critical Software via Symbolic Execution
Automated test generation based on symbolic execution can be beneficial for
systematically testing safety-critical software, helping test engineers meet
the strict testing requirements mandated by the certification
standards while keeping the costs of the testing process under control.
At the same time, the development of safety-critical software is often
constrained by programming languages or coding conventions that ban
linguistic features which are believed to downgrade the safety of the programs,
e.g., they do not allow dynamic memory allocation and variable-length arrays,
limit the way in which loops are used, forbid recursion, and bound the
complexity of control conditions. As a matter of fact, these linguistic
features are also the main efficiency-blockers for the test generation
approaches based on symbolic execution at the state of the art. This paper
contributes new evidence of the effectiveness of generating test cases with
symbolic execution for a significant class of industrial
safety-critical systems. We specifically focus on Scade, a widely adopted model-based
development language for safety-critical embedded software, and we report on a
case study in which we exploited symbolic execution to automatically generate
test cases for a set of safety-critical programs developed in Scade. To this
end, we introduce a novel test generator that we developed in a recent
industrial project on testing safety-critical railway software written in
Scade, and we report on our experience of using this test generator for testing
a set of Scade programs that belong to the development of an on-board signaling
unit for high-speed rail. The results provide empirical evidence that
symbolic execution is indeed a viable approach for generating high-quality test
suites for the safety-critical programs considered in our case study.
Detecting Trivial Mutant Equivalences via Compiler Optimisations
Mutation testing realises the idea of fault-based testing, i.e., using artificial defects to guide the testing process. It is used to evaluate the adequacy of test suites and to guide test case generation. It is a potentially powerful form of testing, but it is well-known that its effectiveness is inhibited by the presence of equivalent mutants. We recently studied Trivial Compiler Equivalence (TCE) as a simple, fast and readily applicable technique for identifying equivalent mutants for C programs. In the present work, we augment our findings with further results for the Java programming language. TCE can remove a large portion of all mutants because they are determined to be either equivalent or duplicates of other mutants. In particular, TCE equivalent mutants account for 7.4% and 5.7% of all C and Java mutants respectively, while duplicated mutants account for a further 21% of all C mutants and 5.4% of all Java mutants, on average. With respect to a benchmark ground truth suite (of known equivalent mutants), approximately 30% (for C) and 54% (for Java) are TCE equivalent. It is unsurprising that results differ between languages, since mutation characteristics are language-dependent. In the case of Java, our new results suggest that TCE may be particularly effective, finding almost half of all equivalent mutants.
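The comparison step behind TCE can be sketched as follows, assuming the original program and each mutant have already been compiled with optimisations and their outputs read in as bytes. The function name, the mutant names, and the byte strings are hypothetical placeholders; the sketch only shows the bookkeeping, in which a mutant whose compiled output matches the original is flagged as equivalent and one that matches an earlier mutant is flagged as a duplicate.

```python
import hashlib


def classify_mutants(original_binary, mutant_binaries):
    """Split mutants into equivalent, duplicate, and kept by compiled output.

    original_binary: bytes of the compiled original program
    mutant_binaries: dict mapping mutant name -> bytes of its compiled output
    """
    def digest(blob):
        return hashlib.sha256(blob).hexdigest()

    # Maps a digest to the first mutant that produced it (None = original).
    seen = {digest(original_binary): None}
    equivalent, duplicate, kept = [], {}, []

    for name, binary in mutant_binaries.items():
        d = digest(binary)
        if d not in seen:
            seen[d] = name
            kept.append(name)            # distinct behaviour at binary level
        elif seen[d] is None:
            equivalent.append(name)      # identical to the original program
        else:
            duplicate[name] = seen[d]    # identical to an earlier mutant

    return equivalent, duplicate, kept


# Placeholder byte strings standing in for real compiler output.
equivalent, duplicate, kept = classify_mutants(
    b"\x01\x02",
    {"m1": b"\x01\x02", "m2": b"\x09", "m3": b"\x09", "m4": b"\x07"},
)
```

Identical compiled output implies identical behaviour, so every mutant flagged this way can be discarded safely. Mutants that are equivalent but compile to different output are missed, which is consistent with the abstract's finding that TCE detects only part of the known equivalent mutants.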