Automatic Repair of Real Bugs: An Experience Report on the Defects4J Dataset
Defects4J is a large, peer-reviewed, structured dataset of real-world Java
bugs. Each bug in Defects4J is provided with a test suite and at least one
failing test case that triggers the bug. In this paper, we report on an
experiment to explore the effectiveness of automatic repair on Defects4J. The
results of our experiment show that 47 bugs of the Defects4J dataset can be
automatically repaired by state-of-the-art repair systems. This sets a baseline for
future research on automatic repair for Java. We have manually analyzed 84
different patches to assess their real correctness. In total, 9 real Java bugs
can be correctly fixed with test-suite based repair. This analysis shows that
test-suite based repair suffers from under-specified bugs, for which trivial
and incorrect patches still pass the test suite. With respect to practical
applicability, it takes 14.8 minutes on average to find a patch. The experiment
was done on a scientific grid, totaling 17.6 days of computation time. All
our systems and experimental results are publicly available on GitHub in
order to facilitate future research on automatic repair.
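To make "under-specified bug" concrete, here is a toy, hypothetical Python example (not taken from Defects4J). The test suite holds one passing test and one bug-triggering failing test. Both a correct patch and a patch that merely special-cases the failing input pass it. Only a held-out test, which the repair tool never sees, separates them. The function names and test values are illustrative assumptions.

```python
from typing import Callable, List, Tuple

# A test case is ((a, b), expected_result).
TestCase = Tuple[Tuple[int, int], int]


def buggy_max(a: int, b: int) -> int:
    # Hypothetical bug: always returns the first argument.
    return a


def correct_patch(a: int, b: int) -> int:
    return a if a >= b else b


def overfitting_patch(a: int, b: int) -> int:
    # Trivial patch: special-cases the single failing input; wrong in general.
    return b if (a, b) == (1, 2) else a


def passes_all(fn: Callable[[int, int], int], tests: List[TestCase]) -> bool:
    """Test-suite based validation: accept a patch if every test passes."""
    return all(fn(*args) == expected for args, expected in tests)


# Under-specified suite: one passing test and one failing (bug-triggering) test.
SUITE: List[TestCase] = [((3, 1), 3), ((1, 2), 2)]
# A held-out test that the repair tool never sees.
HELD_OUT: List[TestCase] = [((2, 5), 5)]

if __name__ == "__main__":
    candidates = [("buggy", buggy_max), ("correct", correct_patch),
                  ("overfitting", overfitting_patch)]
    for name, fn in candidates:
        print(name, passes_all(fn, SUITE), passes_all(fn, HELD_OUT))
```

Running it prints `buggy False False`, `correct True True`, and `overfitting True False`. Test-suite based validation alone cannot tell the last two patches apart.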
Identifying Patch Correctness in Test-Based Program Repair
Test-based automatic program repair has attracted a lot of attention in
recent years. However, the test suites in practice are often too weak to
guarantee correctness and existing approaches often generate a large number of
incorrect patches.
To reduce the number of incorrect patches generated, we propose a novel
approach that heuristically determines the correctness of the generated
patches. The core idea is to exploit the behavior similarity of test case
executions. Passing tests are likely to behave similarly on the original and
patched programs, while failing tests are likely to behave differently. Also,
if two tests exhibit similar runtime
behavior, the two tests are likely to have the same test results. Based on
these observations, we generate new test inputs to enhance the test suites and
use their behavior similarity to determine patch correctness.
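The two observations above can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes that an execution trace is the ordered list of executed statement ids, that similarity is a longest-common-subsequence ratio, and that a patch is accepted when passing-test behavior changes less, on average, than failing-test behavior. The trace format, similarity measure, and decision rule are all assumptions; the paper's actual choices may differ.

```python
from typing import List, Sequence

# Hypothetical trace format: the ordered ids of executed statements.
Trace = Sequence[int]


def lcs_length(a: Trace, b: Trace) -> int:
    """Length of the longest common subsequence (row-by-row dynamic program)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def similarity(a: Trace, b: Trace) -> float:
    """1.0 for identical traces, 0.0 for traces with no statement in common."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def mean_change(before: List[Trace], after: List[Trace]) -> float:
    """Average behavioral change of the same tests on two program versions."""
    changes = [1.0 - similarity(x, y) for x, y in zip(before, after)]
    return sum(changes) / len(changes) if changes else 0.0


def patch_looks_correct(orig_pass: List[Trace], patched_pass: List[Trace],
                        orig_fail: List[Trace], patched_fail: List[Trace]) -> bool:
    """Assumed rule: passing tests should change less than failing tests."""
    return mean_change(orig_pass, patched_pass) < mean_change(orig_fail, patched_fail)


def classify_test(trace: Trace, passing: List[Trace], failing: List[Trace]) -> bool:
    """Label a generated test 'passing' if it behaves more like a passing test.

    Both reference lists must be non-empty.
    """
    best_pass = max(similarity(trace, t) for t in passing)
    best_fail = max(similarity(trace, t) for t in failing)
    return best_pass >= best_fail


if __name__ == "__main__":
    orig_pass, orig_fail = [[1, 2, 3, 4]], [[1, 2, 5, 6]]
    # Patch A leaves the passing test untouched and changes the failing one.
    print(patch_looks_correct(orig_pass, [[1, 2, 3, 4]], orig_fail, [[1, 2, 7, 8]]))
    # Patch B disturbs the passing test and leaves the failing one unchanged.
    print(patch_looks_correct(orig_pass, [[1, 9, 9, 9]], orig_fail, [[1, 2, 5, 6]]))
    # A generated test whose trace resembles the passing test.
    print(classify_test([1, 2, 3], orig_pass, orig_fail))
```

Running it prints `True`, `False`, `True`. Tests labeled by `classify_test` would enlarge the passing and failing sets before patches are checked; that wiring is omitted here.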
Our approach is evaluated on a dataset consisting of 139 patches generated
from existing program repair systems including jGenProg, Nopol, jKali, ACS and
HDRepair. Our approach successfully prevented 56.3% of the incorrect patches
from being generated, without blocking any correct patches.
Comment: ICSE 201
An Analysis of the Search Spaces for Generate and Validate Patch Generation Systems
We present the first systematic analysis of the characteristics of patch
search spaces for automatic patch generation systems. We analyze the search
spaces of two current state-of-the-art systems, SPR and Prophet, with 16
different search space configurations. Our results are derived from an analysis
of 1104 different search spaces and 768 patch generation executions. Together
these experiments consumed over 9000 hours of CPU time on Amazon EC2.
The analysis shows that 1) correct patches are sparse in the search spaces
(typically at most one correct patch per search space per defect), 2) incorrect
patches that nevertheless pass all of the test cases in the validation test
suite are typically orders of magnitude more abundant, and 3) leveraging
information other than the test suite is therefore critical for enabling the
system to successfully isolate correct patches.
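A toy generate-and-validate loop reproduces the first two findings in miniature. It assumes a hypothetical defect: a `max` function whose branch condition must be synthesized. It also assumes a search space of 21 candidate conditions, a two-test validation suite, and an input-grid oracle that stands in for the developer's notion of correctness. None of this models SPR or Prophet.

```python
from itertools import product
from typing import Callable, List, Tuple

Cond = Callable[[int, int], bool]
# Validation suite for a hypothetical buggy max(a, b): ((a, b), expected).
SUITE = [((3, 1), 3), ((1, 2), 2)]


def search_space(max_const: int) -> List[Tuple[str, Cond]]:
    """Candidate patches: the body becomes `return a if COND else b`."""
    space: List[Tuple[str, Cond]] = [("a > b", lambda a, b: a > b)]
    for k in range(max_const):
        # The default argument binds the current k to each lambda.
        space.append((f"a > {k}", lambda a, b, k=k: a > k))
        space.append((f"b < {k}", lambda a, b, k=k: b < k))
    return space


def run(cond: Cond, a: int, b: int) -> int:
    return a if cond(a, b) else b


def plausible(cond: Cond) -> bool:
    """Passes every test in the validation suite."""
    return all(run(cond, a, b) == want for (a, b), want in SUITE)


def correct(cond: Cond) -> bool:
    """Stand-in oracle: agrees with max() on a small grid of inputs."""
    return all(run(cond, a, b) == max(a, b)
               for a, b in product(range(-3, 4), repeat=2))


if __name__ == "__main__":
    space = search_space(10)
    plausible_names = [name for name, c in space if plausible(c)]
    correct_names = [name for name, c in space if plausible(c) and correct(c)]
    print(len(space), plausible_names, correct_names)
```

Of the 21 candidates, four pass the validation suite (`a > b`, `a > 1`, `a > 2`, `b < 2`), but only `a > b` is correct. The other three are plausible but incorrect.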
We also characterize a key tradeoff in the structure of the search spaces.
Larger and richer search spaces that contain correct patches for more defects
can actually cause systems to find fewer, not more, correct patches. We
identify two reasons for this phenomenon: 1) increased validation times because
of the presence of more candidate patches and 2) more incorrect patches that
pass the test suite and block the discovery of correct patches. These
fundamental properties, which are all characterized for the first time in this
paper, help explain why past systems often fail to generate correct patches and
help identify challenges, opportunities, and productive future directions for
the field.
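The tradeoff can be illustrated with an abstract simulation. It assumes a tool that validates candidates in a fixed order at one unit of cost each and stops at the first plausible patch. The candidate names and the budget are hypothetical, and real systems order and time candidate validation differently.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    plausible: bool  # passes the validation test suite
    correct: bool    # actually fixes the defect (unknown to the tool)


def first_plausible(space: List[Candidate], budget: int) -> Optional[Candidate]:
    """Validate candidates in order at unit cost; stop at the first plausible one."""
    for cost, cand in enumerate(space, start=1):
        if cost > budget:
            return None  # validation budget exhausted
        if cand.plausible:
            return cand
    return None


FIX = Candidate("fix", plausible=True, correct=True)
small = [Candidate("s1", False, False), Candidate("s2", False, False), FIX]
# Richer space 1: more implausible candidates to validate before the fix.
slower = [Candidate(f"x{i}", False, False) for i in range(5)] + small
# Richer space 2: a plausible but incorrect patch is validated before the fix.
blocked = [Candidate("overfit", True, False)] + small

if __name__ == "__main__":
    for label, space in [("small", small), ("slower", slower), ("blocked", blocked)]:
        print(label, first_plausible(space, budget=5))
```

With a budget of 5, the small space yields the correct fix. Adding five implausible candidates in front of it exhausts the budget (reason 1), and adding one plausible but incorrect candidate in front of it blocks the fix (reason 2), even though both richer spaces still contain the correct patch.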