386 research outputs found
Automatic Repair of Real Bugs: An Experience Report on the Defects4J Dataset
Defects4J is a large, peer-reviewed, structured dataset of real-world Java
bugs. Each bug in Defects4J is provided with a test suite and at least one
failing test case that triggers the bug. In this paper, we report on an
experiment to explore the effectiveness of automatic repair on Defects4J. The
result of our experiment shows that 47 bugs of the Defects4J dataset can be
automatically repaired by state-of- the-art repair. This sets a baseline for
future research on automatic repair for Java. We have manually analyzed 84
different patches to assess their real correctness. In total, 9 real Java bugs
can be correctly fixed with test-suite based repair. This analysis shows that
test-suite based repair suffers from under-specified bugs, for which trivial
and incorrect patches still pass the test suite. With respect to practical
applicability, it takes in average 14.8 minutes to find a patch. The experiment
was done on a scientific grid, totaling 17.6 days of computation time. All
their systems and experimental results are publicly available on Github in
order to facilitate future research on automatic repair
A Critical Review of "Automatic Patch Generation Learned from Human-Written Patches": Essay on the Problem Statement and the Evaluation of Automatic Software Repair
At ICSE'2013, there was the first session ever dedicated to automatic program
repair. In this session, Kim et al. presented PAR, a novel template-based
approach for fixing Java bugs. We strongly disagree with key points of this
paper. Our critical review has two goals. First, we aim at explaining why we
disagree with Kim and colleagues and why the reasons behind this disagreement
are important for research on automatic software repair in general. Second, we
aim at contributing to the field with a clarification of the essential ideas
behind automatic software repair. In particular we discuss the main evaluation
criteria of automatic software repair: understandability, correctness and
completeness. We show that depending on how one sets up the repair scenario,
the evaluation goals may be contradictory. Eventually, we discuss the nature of
fix acceptability and its relation to the notion of software correctness.Comment: ICSE 2014, India (2014
Automatic Software Repair: a Bibliography
This article presents a survey on automatic software repair. Automatic
software repair consists of automatically finding a solution to software bugs
without human intervention. This article considers all kinds of repairs. First,
it discusses behavioral repair where test suites, contracts, models, and
crashing inputs are taken as oracle. Second, it discusses state repair, also
known as runtime repair or runtime recovery, with techniques such as checkpoint
and restart, reconfiguration, and invariant restoration. The uniqueness of this
article is that it spans the research communities that contribute to this body
of knowledge: software engineering, dependability, operating systems,
programming languages, and security. It provides a novel and structured
overview of the diversity of bug oracles and repair operators used in the
literature
Dynamic Analysis can be Improved with Automatic Test Suite Refactoring
Context: Developers design test suites to automatically verify that software
meets its expected behaviors. Many dynamic analysis techniques are performed on
the exploitation of execution traces from test cases. However, in practice,
there is only one trace that results from the execution of one manually-written
test case.
Objective: In this paper, we propose a new technique of test suite
refactoring, called B-Refactoring. The idea behind B-Refactoring is to split a
test case into small test fragments, which cover a simpler part of the control
flow to provide better support for dynamic analysis.
Method: For a given dynamic analysis technique, our test suite refactoring
approach monitors the execution of test cases and identifies small test cases
without loss of the test ability. We apply B-Refactoring to assist two existing
analysis tasks: automatic repair of if-statements bugs and automatic analysis
of exception contracts.
Results: Experimental results show that test suite refactoring can
effectively simplify the execution traces of the test suite. Three real-world
bugs that could previously not be fixed with the original test suite are fixed
after applying B-Refactoring; meanwhile, exception contracts are better
verified via applying B-Refactoring to original test suites.
Conclusions: We conclude that applying B-Refactoring can effectively improve
the purity of test cases. Existing dynamic analysis tasks can be enhanced by
test suite refactoring
An Analysis of the Search Spaces for Generate and Validate Patch Generation Systems
We present the first systematic analysis of the characteristics of patch
search spaces for automatic patch generation systems. We analyze the search
spaces of two current state-of-the-art systems, SPR and Prophet, with 16
different search space configurations. Our results are derived from an analysis
of 1104 different search spaces and 768 patch generation executions. Together
these experiments consumed over 9000 hours of CPU time on Amazon EC2.
The analysis shows that 1) correct patches are sparse in the search spaces
(typically at most one correct patch per search space per defect), 2) incorrect
patches that nevertheless pass all of the test cases in the validation test
suite are typically orders of magnitude more abundant, and 3) leveraging
information other than the test suite is therefore critical for enabling the
system to successfully isolate correct patches.
We also characterize a key tradeoff in the structure of the search spaces.
Larger and richer search spaces that contain correct patches for more defects
can actually cause systems to find fewer, not more, correct patches. We
identify two reasons for this phenomenon: 1) increased validation times because
of the presence of more candidate patches and 2) more incorrect patches that
pass the test suite and block the discovery of correct patches. These
fundamental properties, which are all characterized for the first time in this
paper, help explain why past systems often fail to generate correct patches and
help identify challenges, opportunities, and productive future directions for
the field
- …