Test Case Purification for Improving Fault Localization
Finding and fixing bugs are time-consuming activities in software
development. Spectrum-based fault localization aims to identify the faulty
position in source code based on the execution trace of test cases. Failing
test cases and their assertions form test oracles for the failing behavior of
the system under analysis. In this paper, we propose a novel concept of
spectrum driven test case purification for improving fault localization. The
goal of test case purification is to separate existing test cases into small
fractions (called purified test cases) and to enhance the test oracles to
further localize faults. When combined with an existing fault localization
technique (e.g., Tarantula), test case purification produces a better ranking of
program statements. Our experiments on 1800 faults in six open-source Java
programs show that test case purification can effectively improve existing
fault localization techniques.
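The ranking step that purification feeds into can be sketched as follows. This is a minimal Python illustration of the standard Tarantula suspiciousness formula, assuming coverage is given as a hypothetical mapping from test names to the statement ids each test executes; it is not the authors' tooling. Splitting one failing test into several purified tests would simply add rows to `coverage` and `outcomes`.

```python
from typing import Dict, List, Set, Tuple


def tarantula(coverage: Dict[str, Set[str]],
              outcomes: Dict[str, bool]) -> List[Tuple[str, float]]:
    """Rank statements by Tarantula suspiciousness, most suspicious first.

    coverage: test name -> ids of the statements that test executed
    outcomes: test name -> True if the test passed, False if it failed
    """
    total_pass = sum(1 for ok in outcomes.values() if ok)
    total_fail = len(outcomes) - total_pass
    statements = set().union(*coverage.values())
    scores: Dict[str, float] = {}
    for stmt in statements:
        passed = sum(1 for t, cov in coverage.items() if stmt in cov and outcomes[t])
        failed = sum(1 for t, cov in coverage.items() if stmt in cov and not outcomes[t])
        fail_ratio = failed / total_fail if total_fail else 0.0
        pass_ratio = passed / total_pass if total_pass else 0.0
        denom = fail_ratio + pass_ratio
        # susp(s) = fail_ratio / (fail_ratio + pass_ratio), guarding a zero denominator
        scores[stmt] = fail_ratio / denom if denom else 0.0
    # Ties are broken by statement id only to make the output deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


coverage = {"t1": {"s1", "s2"}, "t2": {"s1", "s3"}, "t3": {"s1", "s2", "s3"}}
outcomes = {"t1": True, "t2": True, "t3": False}
print(tarantula(coverage, outcomes))
```

Here `s1` is executed by every test, so it scores lower than `s2` and `s3`, which are each shared by the failing test and only one passing test.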
Dynamic Analysis can be Improved with Automatic Test Suite Refactoring
Context: Developers design test suites to automatically verify that software
meets its expected behaviors. Many dynamic analysis techniques rely on
execution traces from test cases. However, in practice, each manually written
test case yields only a single execution trace.
Objective: In this paper, we propose a new technique of test suite
refactoring, called B-Refactoring. The idea behind B-Refactoring is to split a
test case into small test fragments, which cover a simpler part of the control
flow to provide better support for dynamic analysis.
Method: For a given dynamic analysis technique, our test suite refactoring
approach monitors the execution of test cases and identifies small test cases
without loss of testing ability. We apply B-Refactoring to assist two existing
analysis tasks: automatic repair of if-statement bugs and automatic analysis
of exception contracts.
Results: Experimental results show that test suite refactoring can
effectively simplify the execution traces of the test suite. Three real-world
bugs that could previously not be fixed with the original test suite are fixed
after applying B-Refactoring; meanwhile, exception contracts are better
verified by applying B-Refactoring to the original test suites.
Conclusions: We conclude that applying B-Refactoring can effectively improve
the purity of test cases. Existing dynamic analysis tasks can be enhanced by
test suite refactoring.
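A rough sketch of the splitting idea, under a hypothetical trace model that is not B-Refactoring's actual implementation: a test is a shared setup prefix followed by checks, each tagged with the value the monitored if-condition took while that check ran. Consecutive checks with the same value become one fragment, so each fragment exercises only one side of the branch. The `Calc` class and its condition are made up for illustration, and replaying the setup in every fragment assumes the checks do not depend on each other's side effects.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical model: each check is (Java statement, value the monitored
# if-condition took while the check ran).
Check = Tuple[str, bool]


def split_test(setup: List[str], checks: List[Check]) -> List[List[str]]:
    """Split one test into fragments whose checks all see the same branch value.

    Consecutive checks that observe the same condition value are grouped;
    each fragment replays the shared setup so it can run on its own.
    """
    fragments = []
    for _, group in groupby(checks, key=lambda check: check[1]):
        fragments.append(setup + [stmt for stmt, _ in group])
    return fragments


setup = ["Calc c = new Calc();"]
checks = [
    ("assertEquals(1, c.abs(1));", True),    # hypothetical condition `x >= 0` was true
    ("assertEquals(2, c.abs(2));", True),
    ("assertEquals(3, c.abs(-3));", False),  # condition was false
]
for fragment in split_test(setup, checks):
    print(fragment)
```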
You Cannot Fix What You Cannot Find! An Investigation of Fault Localization Bias in Benchmarking Automated Program Repair Systems
Properly benchmarking Automated Program Repair (APR) systems should
contribute to the development and adoption of the research outputs by
practitioners. To that end, the research community must ensure that it reaches
significant milestones by reliably comparing state-of-the-art tools for a
better understanding of their strengths and weaknesses. In this work, we
identify and investigate a practical bias caused by the fault localization (FL)
step in a repair pipeline. We propose to highlight the different fault
localization configurations used in the literature, and their impact on APR
systems when applied to the Defects4J benchmark. Then, we explore the
performance variations that can be achieved by 'tweaking' the FL step.
Ultimately, we expect to create new momentum for (1) full disclosure of APR
experimental procedures with respect to FL, (2) realistic expectations of
repairing bugs in Defects4J, as well as (3) reliable performance comparison
among the state-of-the-art APR systems, and against the baseline performance
results of our thoroughly assessed kPAR repair tool. Our main findings include:
(a) only a subset of Defects4J bugs can be currently localized by commonly-used
FL techniques; (b) current practice of comparing state-of-the-art APR systems
(i.e., counting the number of fixed bugs) is potentially misleading due to the
bias of FL configurations; and (c) APR authors do not properly qualify their
performance achievement with respect to the different tuning parameters
implemented in APR systems. Comment: Accepted by ICST 201
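One concrete way an FL configuration choice can bias "localized bug" counts is how ties in suspiciousness scores are broken. The sketch below uses hypothetical helpers, not kPAR's code: it computes the rank of the first faulty statement under a best-case, worst-case, or expected (average) tie policy, then counts how many bugs fall within the top N. The same scores can yield different counts depending only on the policy.

```python
from typing import Dict, Iterable, List, Optional, Union

Rank = Union[int, float]


def rank_of(scores: Dict[str, float], faulty: Iterable[str],
            ties: str = "worst") -> Optional[Rank]:
    """1-based rank of the first faulty statement under a tie-breaking policy.

    Returns None when no faulty statement appears in the ranking at all.
    """
    faulty_ranked = {s for s in faulty if s in scores}
    if not faulty_ranked:
        return None
    best = max(scores[s] for s in faulty_ranked)
    higher = sum(1 for v in scores.values() if v > best)    # strictly above
    tied = sum(1 for v in scores.values() if v == best)      # includes faulty ones
    k = sum(1 for s in faulty_ranked if scores[s] == best)   # faulty among the tie
    if ties == "best":
        return higher + 1
    if ties == "worst":
        return higher + tied - k + 1
    if ties == "average":
        # Expected position of the first of k faulty items in a random order of `tied` items
        return higher + (tied + 1) / (k + 1)
    raise ValueError(f"unknown tie policy: {ties}")


def localized_within(ranks: List[Optional[Rank]], n: int) -> int:
    """Count bugs whose faulty statement is ranked within the top n."""
    return sum(1 for r in ranks if r is not None and r <= n)


scores = {"a": 0.9, "b": 0.9, "c": 0.9, "d": 0.5}
for policy in ("best", "average", "worst"):
    print(policy, rank_of(scores, {"c"}, policy))
```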
Amortising the Cost of Mutation Based Fault Localisation using Statistical Inference
Mutation analysis can effectively capture the dependency between source code
and test results. This has been exploited by Mutation Based Fault Localisation
(MBFL) techniques. However, MBFL techniques must pay the high cost of mutation
analysis after failures are observed, which may hinder their practical
adoption. We introduce SIMFL (Statistical
Inference for Mutation-based Fault Localisation), an MBFL technique that allows
users to perform the mutation analysis in advance against an earlier version of
the system. SIMFL uses mutants as artificial faults and aims to learn the
failure patterns among test cases against different locations of mutations.
Once a failure is observed, SIMFL requires little or no additional analysis
cost, depending on the inference model used. An
empirical evaluation of SIMFL using 355 faults in Defects4J shows that SIMFL
can successfully localise up to 103 faults at the top, and 152 faults within
the top five, on par with state-of-the-art alternatives. The cost of mutation
analysis can be further reduced by mutation sampling: SIMFL retains over 80% of
its localisation accuracy at the top rank when using only 10% of generated
mutants, compared to results obtained without sampling.
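The idea of matching an observed failure against failure patterns recorded in advance can be illustrated with a deliberately simplified stand-in. SIMFL's actual inference models are not reproduced here; this sketch instead scores each location by the best Jaccard similarity between the observed failing tests and the failing tests of any mutant at that location. The kill-matrix layout, the location names, and the per-location sampling policy are all assumptions.

```python
import random
from typing import Dict, FrozenSet, List, Tuple

# Assumed layout, computed before any failure is observed:
# location -> one set of failing tests per mutant generated at that location.
KillMatrix = Dict[str, List[FrozenSet[str]]]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sample_mutants(kills: KillMatrix, fraction: float, seed: int = 0) -> KillMatrix:
    """Keep a random fraction of each location's mutants (at least one if any exist)."""
    rng = random.Random(seed)
    sampled: KillMatrix = {}
    for loc, patterns in kills.items():
        k = max(1, round(len(patterns) * fraction))
        sampled[loc] = rng.sample(patterns, k) if patterns else []
    return sampled


def localise(kills: KillMatrix, observed: FrozenSet[str]) -> List[Tuple[str, float]]:
    """Rank locations by the best match between any of their mutants'
    failing-test sets and the failing tests observed for the real fault."""
    scores = {
        loc: max((jaccard(p, observed) for p in patterns), default=0.0)
        for loc, patterns in kills.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


kills = {
    "Foo.java:10": [frozenset({"t1"}), frozenset({"t1", "t2"})],
    "Foo.java:20": [frozenset({"t3"})],
    "Foo.java:30": [],
}
print(localise(kills, frozenset({"t1", "t2"})))
```

Because the kill matrix is built ahead of time, the work left after a failure is only the scoring loop in `localise`.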
Automatic Repair of Real Bugs: An Experience Report on the Defects4J Dataset
Defects4J is a large, peer-reviewed, structured dataset of real-world Java
bugs. Each bug in Defects4J is provided with a test suite and at least one
failing test case that triggers the bug. In this paper, we report on an
experiment to explore the effectiveness of automatic repair on Defects4J. The
result of our experiment shows that 47 bugs of the Defects4J dataset can be
automatically repaired by state-of-the-art repair techniques. This sets a baseline for
future research on automatic repair for Java. We have manually analyzed 84
different patches to assess their real correctness. In total, 9 real Java bugs
can be correctly fixed with test-suite based repair. This analysis shows that
test-suite based repair suffers from under-specified bugs, for which trivial
and incorrect patches still pass the test suite. With respect to practical
applicability, it takes on average 14.8 minutes to find a patch. The experiment
was done on a scientific grid, totaling 17.6 days of computation time. All
systems and experimental results are publicly available on GitHub in
order to facilitate future research on automatic repair.
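The under-specification problem can be seen in a toy generate-and-validate loop. In this hypothetical example, plain Python functions stand in for Java patches: the test suite contains one failing test, and a trivial special-case patch passes it before the intended fix is even tried. It is plausible, but a held-out input shows it is incorrect.

```python
from typing import Callable, List, Optional, Tuple

Test = Tuple[int, int]            # (input, expected output)
Patch = Callable[[int], int]


def buggy_abs(x: int) -> int:
    return x  # bug: negative inputs are returned unchanged


def passes(patch: Patch, tests: List[Test]) -> bool:
    return all(patch(x) == expected for x, expected in tests)


def first_plausible(candidates: List[Tuple[str, Patch]],
                    tests: List[Test]) -> Optional[str]:
    """Generate-and-validate: return the first candidate that passes every test."""
    for name, patch in candidates:
        if passes(patch, tests):
            return name
    return None


suite = [(3, 3), (-2, 2)]  # (-2, 2) is the failing test that exposes the bug
candidates = [
    ("special-case", lambda x: 2 if x == -2 else x),  # overfits the failing test
    ("negate", lambda x: -x if x < 0 else x),         # the intended fix
]
print(first_plausible(candidates, suite))
```

Distinguishing the two candidates requires a check beyond the original suite, which is why the manual analysis of patch correctness is needed.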
TBar: Revisiting Template-based Automated Program Repair
We revisit the performance of template-based APR to build comprehensive
knowledge about the effectiveness of fix patterns, and to highlight the
importance of complementary steps such as fault localization or donor code
retrieval. To that end, we first investigate the literature to collect,
summarize and label recurrently-used fix patterns. Based on the investigation,
we build TBar, a straightforward APR tool that systematically attempts to apply
these fix patterns to program bugs. We thoroughly evaluate TBar on the
Defects4J benchmark. In particular, we assess the actual qualitative and
quantitative diversity of fix patterns, as well as their effectiveness in
yielding plausible or correct patches. Finally, we find that, assuming a
perfect fault localization, TBar correctly/plausibly fixes 74/101 bugs.
Replicating a standard and practical pipeline of APR assessment, we demonstrate
that TBar correctly fixes 43 bugs from Defects4J, an unprecedented performance
in the literature (including all approaches, i.e., template-based, stochastic
mutation-based or synthesis-based APR). Comment: Accepted by ISSTA 201
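A template-based repair loop can be sketched as fix patterns applied to suspicious statements in fault-localization order. The two patterns below (relational-operator mutation and null-check insertion) are hypothetical, string-based simplifications of common fix-pattern families; TBar's real patterns are richer than these text rewrites. Each yielded candidate would then be compiled and validated against the test suite.

```python
import re
from typing import Callable, Iterator, List, Tuple

FixPattern = Callable[[str], List[str]]

RELATIONAL_SWAPS = {"<": "<=", "<=": "<", ">": ">=", ">=": ">"}


def mutate_relational_operator(stmt: str) -> List[str]:
    """One candidate per relational operator, each with that operator swapped.

    Naive text matching: it would also hit generics such as List<String>.
    """
    candidates = []
    for m in re.finditer(r"<=|>=|<|>", stmt):
        swapped = RELATIONAL_SWAPS[m.group()]
        candidates.append(stmt[:m.start()] + swapped + stmt[m.end():])
    return candidates


def insert_null_check(stmt: str) -> List[str]:
    """Guard the statement with a null check on the first `receiver.method(` found."""
    m = re.search(r"\b([A-Za-z_]\w*)\.\w+\(", stmt)
    if m is None:
        return []
    return [f"if ({m.group(1)} != null) {{ {stmt} }}"]


def generate(ranked: List[str], patterns: List[FixPattern]) -> Iterator[Tuple[str, str]]:
    """Yield (original, patched) pairs, most suspicious statement first."""
    for stmt in ranked:
        for pattern in patterns:
            for patched in pattern(stmt):
                yield stmt, patched


ranked = ["if (i < n) { sum += a[i]; }", "name.trim();"]
for original, patched in generate(ranked, [mutate_relational_operator, insert_null_check]):
    print(original, "=>", patched)
```

Iterating statements in FL order is what makes the pipeline sensitive to fault localization: a perfect ranking puts the buggy statement first.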
Automatically Repairing Programs Using Both Tests and Bug Reports
The success of automated program repair (APR) depends significantly on its
ability to localize the defects it is repairing. For fault localization (FL),
APR tools typically use either spectrum-based (SBFL) techniques that use test
executions or information-retrieval-based (IRFL) techniques that use bug
reports. These two approaches often complement each other, patching different
defects. No existing repair tool uses both SBFL and IRFL. We develop RAFL
(Rank-Aggregation-Based Fault Localization), a novel FL approach that combines
multiple FL techniques. We also develop Blues, a new unsupervised IRFL technique
that uses bug reports to localize defects. On a dataset of
818 real-world defects, SBIR (combined SBFL and Blues) consistently localizes
more bugs and ranks buggy statements higher than the two underlying techniques.
For example, SBIR correctly identifies a buggy statement as the most suspicious
for 18.1% of the defects, while SBFL does so for 10.9% and Blues for 3.1%. We
extend SimFix, a state-of-the-art APR tool, to use SBIR, SBFL, and Blues.
SimFix using SBIR patches 112 out of the 818 defects; 110 when using SBFL, and
55 when using Blues. The 112 patched defects include 55 defects patched
exclusively using SBFL, 7 patched exclusively using IRFL, 47 patched using both
SBFL and IRFL and 3 new defects. SimFix using Blues significantly outperforms
iFixR, the state-of-the-art IRFL-based APR tool. Overall, SimFix using our FL
techniques patches ten defects no prior tools could patch. By evaluating on a
benchmark of 818 defects, 442 previously unused in APR evaluations, we find
that prior evaluations on the overused Defects4J benchmark have led to overly
generous findings. Our paper is the first to (1) use combined FL for APR, (2)
apply a more rigorous methodology for measuring patch correctness, and (3)
evaluate on the new, substantially larger version of Defects4J. Comment: working paper
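Combining an SBFL ranking with an IRFL ranking can be illustrated with a simple rank-aggregation rule. This sketch swaps in a plain Borda count and is not RAFL's own aggregation algorithm; the statement ids and the `depth` cutoff are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def borda_aggregate(rankings: List[List[str]], depth: int) -> List[str]:
    """Merge ranked lists with a Borda count.

    Within each list, the first `depth` entries receive depth, depth-1, ..., 1
    points; entries below the cutoff or absent from a list receive nothing.
    Ties in total points are broken by statement id.
    """
    points: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        for position, stmt in enumerate(ranking[:depth]):
            points[stmt] += depth - position
    return sorted(points, key=lambda s: (-points[s], s))


sbfl = ["s1", "s2", "s3"]   # hypothetical spectrum-based ranking
irfl = ["s2", "s4", "s1"]   # hypothetical bug-report-based ranking
print(borda_aggregate([sbfl, irfl], depth=3))
```

A statement that both techniques rank highly (`s2` here) rises to the top even though neither list alone places it there consistently, which mirrors the complementarity the abstract reports.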