7,627 research outputs found
Shadow symbolic execution for better testing of evolving software
In this idea paper, we propose a novel way for improving the testing of program changes via symbolic execution. At a high-level, our technique runs two different program versions in the same symbolic execution instance, with the old version effectively shadowing the new one. In this way, the technique can exploit precise dynamic value information to effectively drive execution toward the behaviour that has changed from one version to the next. We discuss the main challenges and opportunities of this approach in terms of pruning and prioritising path exploration, mapping elements across versions, and sharing common symbolic state between versions. Copyright © 2014 ACM
Shadow of a Doubt: Testing for Divergences Between Software Versions
© 2016 ACM.While developers are aware of the importance of comprehensively testing patches, the large effort involved in coming up with relevant test cases means that such testing rarely happens in practice. Furthermore, even when test cases are written to cover the patch, they often exercise the same behaviour in the old and the new version of the code. In this paper, we present a symbolic execution-based technique that is designed to generate test inputs that cover the new program behaviours introduced by a patch. The technique works by executing both the old and the new version in the same symbolic execution instance, with the old version shadowing the new one. During this combined shadow execution, whenever a branch point is reached where the old and the new version diverge, we generate a test case exercising the divergence and comprehensively test the new behaviours of the new version. We evaluate our technique on the Coreutils patches from the CoREBench suite of regression bugs, and show that it is able to generate test inputs that exercise newly added behaviours and expose some of the regression bugs
Shadow symbolic execution for testing software patches
While developers are aware of the importance of comprehensively testing patches, the large effort involved in coming up with relevant test cases means that such testing rarely happens in practice. Furthermore, even when test cases are written to cover the patch, they often exercise the same behaviour in the old and the new version of the code. In this article, we present a symbolic execution-based technique that is designed to generate test inputs that cover the new program behaviours introduced by a patch. The technique works by executing both the old and the new version in the same symbolic execution instance, with the old version shadowing the new one. During this combined shadow execution, whenever a branch point is reached where the old and the new version diverge, we generate a test case exercising the divergence and comprehensively test the new behaviours of the new version. We evaluate our technique on the Coreutils patches from the CoREBench suite of regression bugs, and show that it is able to generate test inputs that exercise newly added behaviours and expose some of the regression bugs
FairFuzz: Targeting Rare Branches to Rapidly Increase Greybox Fuzz Testing Coverage
In recent years, fuzz testing has proven itself to be one of the most
effective techniques for finding correctness bugs and security vulnerabilities
in practice. One particular fuzz testing tool, American Fuzzy Lop or AFL, has
become popular thanks to its ease-of-use and bug-finding power. However, AFL
remains limited in the depth of program coverage it achieves, in particular
because it does not consider which parts of program inputs should not be
mutated in order to maintain deep program coverage. We propose an approach,
FairFuzz, that helps alleviate this limitation in two key steps. First,
FairFuzz automatically prioritizes inputs exercising rare parts of the program
under test. Second, it automatically adjusts the mutation of inputs so that the
mutated inputs are more likely to exercise these same rare parts of the
program. We conduct evaluation on real-world programs against state-of-the-art
versions of AFL, thoroughly repeating experiments to get good measures of
variability. We find that on certain benchmarks FairFuzz shows significant
coverage increases after 24 hours compared to state-of-the-art versions of AFL,
while on others it achieves high program coverage at a significantly faster
rate
Sydr-Fuzz: Continuous Hybrid Fuzzing and Dynamic Analysis for Security Development Lifecycle
Nowadays automated dynamic analysis frameworks for continuous testing are in
high demand to ensure software safety and satisfy the security development
lifecycle~(SDL) requirements. The security bug hunting efficiency of
cutting-edge hybrid fuzzing techniques outperforms widely utilized
coverage-guided fuzzing. We propose an enhanced dynamic analysis pipeline to
leverage productivity of automated bug detection based on hybrid fuzzing. We
implement the proposed pipeline in the continuous fuzzing toolset Sydr-Fuzz
which is powered by hybrid fuzzing orchestrator, integrating our DSE tool Sydr
with libFuzzer and AFL++. Sydr-Fuzz also incorporates security predicate
checkers, crash triaging tool Casr, and utilities for corpus minimization and
coverage gathering. The benchmarking of our hybrid fuzzer against alternative
state-of-the-art solutions demonstrates its superiority over coverage-guided
fuzzers while remaining on the same level with advanced hybrid fuzzers.
Furthermore, we approve the relevance of our approach by discovering 85 new
real-world software flaws within the OSS-Sydr-Fuzz project. Finally, we open
Casr source code to the community to facilitate examination of the existing
crashes
Hybrid Differential Software Testing
Differentielles Testen ist ein wichtiger Bestandteil der Qualitätssicherung von Software, mit dem Ziel Testeingaben zu generieren, die Unterschiede im Verhalten der Software deutlich machen. Solche Unterschiede können zwischen zwei Ausführungspfaden (1) in unterschiedlichen Programmversionen, aber auch (2) im selben Programm auftreten. In dem ersten Fall werden unterschiedliche Programmversionen mit der gleichen Eingabe untersucht, während bei dem zweiten Fall das gleiche Programm mit unterschiedlichen Eingaben analysiert wird. Die Regressionsanalyse, die Side-Channel Analyse, das Maximieren der Ausführungskosten eines Programms und die Robustheitsanalyse von Neuralen Netzwerken sind typische Beispiele für differentielle Softwareanalysen.
Eine besondere Herausforderung liegt in der effizienten Analyse von mehreren Programmpfaden (auch über mehrere Programmvarianten hinweg). Die existierenden Ansätze sind dabei meist nicht (spezifisch) dafür konstruiert, unterschiedliches Verhalten präzise hervorzurufen oder sind auf einen Teil des Suchraums limitiert.
Diese Arbeit führt das Konzept des hybriden differentiellen Software Testens (HyDiff) ein: eine hybride Analysetechnik für die Generierung von Eingaben zur Erkennung von semantischen Unterschieden in Software. HyDiff besteht aus zwei parallel laufenden Komponenten: (1) einem such-basierten Ansatz, der effizient Eingaben generiert und (2) einer systematischen Analyse, die auch komplexes Programmverhalten erreichen kann. Die such-basierte Komponente verwendet Fuzzing geleitet durch differentielle Heuristiken. Die systematische Analyse basiert auf Dynamic Symbolic Execution, das konkrete Eingaben bei der Analyse integrieren kann.
HyDiff wird anhand mehrerer Experimente evaluiert, die in spezifischen Anwendungen im Bereich des differentiellen Testens ausgeführt werden. Die Resultate zeigen eine effektive Generierung von Testeingaben durch HyDiff, wobei es sich signifikant besser als die einzelnen Komponenten verhält.Differential software testing is important for software quality assurance as it aims to automatically generate test inputs that reveal behavioral differences in software. The concrete analysis procedure depends on the targeted result: differential testing can reveal divergences between two execution paths (1) of different program versions or (2) within the same program. The first analysis type would execute different program versions with the same input, while the second type would execute the same program with different inputs. Therefore, detecting regression bugs in software evolution, analyzing side-channels in programs, maximizing the execution cost of a program over multiple executions, and evaluating the robustness of neural networks are instances of differential software analysis with the goal to generate diverging executions of program paths.
The key challenge of differential software testing is to simultaneously reason about multiple program paths, often across program variants, in an efficient way. Existing work in differential testing is often not (specifically) directed to reveal a different behavior or is limited to a subset of the search space.
This PhD thesis proposes the concept of Hybrid Differential Software Testing (HyDiff) as a hybrid analysis technique to generate difference revealing inputs. HyDiff consists of two components that operate in a parallel setup: (1) a search-based technique that inexpensively generates inputs and (2) a systematic exploration technique to also exercise deeper program behaviors. HyDiff’s search-based component uses differential fuzzing directed by differential heuristics. HyDiff’s systematic exploration component is based on differential dynamic symbolic execution that allows to incorporate concrete inputs in its analysis.
HyDiff is evaluated experimentally with applications specific for differential testing. The results show that HyDiff is effective in all considered categories and outperforms its components in isolation
MuDelta: Delta-Oriented Mutation Testing at Commit Time
To effectively test program changes using mutation testing, one needs to use mutants that are relevant to the altered program behaviours. In view of this, we introduce MuDelta, an approach that identifies commit-relevant mutants; mutants that affect and are affected by the changed program behaviours. Our approach uses machine learning applied on a combined scheme of graph and vector-based representations of static code features. Our results, from 50 commits in 21 Coreutils programs, demonstrate a strong prediction ability of our approach; yielding 0.80 (ROC) and 0.50 (PR Curve) AUC values with 0.63 and 0.32 precision and recall values. These predictions are significantly higher than random guesses, 0.20 (PR-Curve) AUC, 0.21 and 0.21 precision and recall, and subsequently lead to strong relevant tests that kill 45%more relevant mutants than randomly sampled mutants (either sampled from those residing on the changed component(s) or from the changed lines). Our results also show that MuDelta selects mutants with 27% higher fault revealing ability in fault introducing commits. Taken together, our results corroborate the conclusion that commit-based mutation testing is suitable and promising for evolving software
Development Context Driven Change Awareness and Analysis Framework
Recent work on workspace monitoring allows conflict prediction early in the development process, however, these approaches mostly use syntactic differencing techniques to compare different program versions. In contrast, traditional change-impact analysis techniques analyze related versions of the program only after the code has been checked into the master repository. We propose a novel approach, De- CAF (Development Context Analysis Framework), that leverages the development context to scope a change impact analysis technique. The goal is to characterize the impact of each developer on other developers in the team. There are various client applications such as task prioritization, early conflict detection, and providing advice on testing that can benefit from such a characterization. The DeCAF framework leverages information from the development context to bound the iDiSE change impact analysis technique to analyze only the parts of the code base that are of interest. Bounding the analysis can enable DeCAF to efficiently compute the impact of changes using a combination of program dependence and symbolic execution based approaches
Automated Unit Testing of Evolving Software
As software programs evolve, developers need to ensure that new changes do
not affect the originally intended functionality of the program. To increase their
confidence, developers commonly write unit tests along with the program, and
execute them after a change is made. However, manually writing these unit-tests
is difficult and time-consuming, and as their number increases, so does the cost
of executing and maintaining them.
Automated test generation techniques have been proposed in the literature
to assist developers in the endeavour of writing these tests. However, it remains
an open question how well these tools can help with fault finding in practice,
and maintaining these automatically generated tests may require extra effort
compared to human written ones.
This thesis evaluates the effectiveness of a number of existing automatic
unit test generation techniques at detecting real faults, and explores how these
techniques can be improved. In particular, we present a novel multi-objective
search-based approach for generating tests that reveal changes across two versions
of a program. We then investigate whether these tests can be used such that no
maintenance effort is necessary.
Our results show that overall, state-of-the-art test generation tools can indeed
be effective at detecting real faults: collectively, the tools revealed more than half
of the bugs we studied. We also show that our proposed alternative technique
that is better suited to the problem of revealing changes, can detect more faults,
and does so more frequently. However, we also find that for a majority of
object-oriented programs, even a random search can achieve good results. Finally, we
show that such change-revealing tests can be generated on demand in practice,
without requiring them to be maintained over time
Fuzzing Binaries for Memory Safety Errors with QASan
Fuzz testing techniques are becoming pervasive for their ever-improving ability to generate crashing trial cases for programs. Memory safety violations however can lead to silent corruptions and errors, and a fuzzer may recognize them only in the presence of sanitization machinery. For closed-source software combining sanitization with fuzzing incurs practical obstacles that we try to tackle with an architecture-independent proposal called QASan for detecting heap memory violations. In our tests QASan is competitive with standalone sanitizers and adds a moderate 1.61x average slowdown to the AFL++ fuzzer while enabling it to reveal more heap-related bugs
- …