131 research outputs found
You Cannot Fix What You Cannot Find! An Investigation of Fault Localization Bias in Benchmarking Automated Program Repair Systems
Properly benchmarking Automated Program Repair (APR) systems should
contribute to the development and adoption of the research outputs by
practitioners. To that end, the research community must ensure that it reaches
significant milestones by reliably comparing state-of-the-art tools for a
better understanding of their strengths and weaknesses. In this work, we
identify and investigate a practical bias caused by the fault localization (FL)
step in a repair pipeline. We propose to highlight the different fault
localization configurations used in the literature, and their impact on APR
systems when applied to the Defects4J benchmark. Then, we explore the
performance variations that can be achieved by `tweaking' the FL step.
Eventually, we expect to create a new momentum for (1) full disclosure of APR
experimental procedures with respect to FL, (2) realistic expectations of
repairing bugs in Defects4J, as well as (3) reliable performance comparison
among the state-of-the-art APR systems, and against the baseline performance
results of our thoroughly assessed kPAR repair tool. Our main findings include:
(a) only a subset of Defects4J bugs can be currently localized by commonly-used
FL techniques; (b) current practice of comparing state-of-the-art APR systems
(i.e., counting the number of fixed bugs) is potentially misleading due to the
bias of FL configurations; and (c) APR authors do not properly qualify their
performance achievement with respect to the different tuning parameters
implemented in APR systems.Comment: Accepted by ICST 201
Spectrum-Based Fault Localization in Model Transformations
Model transformations play a cornerstone role in Model-Driven Engineering (MDE), as they provide the essential
mechanisms for manipulating and transforming models. The correctness of software built using MDE
techniques greatly relies on the correctness of model transformations. However, it is challenging and error
prone to debug them, and the situation gets more critical as the size and complexity of model transformations
grow, where manual debugging is no longer possible.
Spectrum-Based Fault Localization (SBFL) uses the results of test cases and their corresponding code coverage
information to estimate the likelihood of each program component (e.g., statements) of being faulty.
In this article we present an approach to apply SBFL for locating the faulty rules in model transformations.
We evaluate the feasibility and accuracy of the approach by comparing the effectiveness of 18 different stateof-
the-art SBFL techniques at locating faults in model transformations. Evaluation results revealed that the
best techniques, namely Kulcynski2, Mountford, Ochiai, and Zoltar, lead the debugger to inspect a maximum
of three rules to locate the bug in around 74% of the cases. Furthermore, we compare our approach with a
static approach for fault localization in model transformations, observing a clear superiority of the proposed
SBFL-based method.Comisión Interministerial de Ciencia y Tecnología TIN2015-70560-RJunta de Andalucía P12-TIC-186
Test Case Purification for Improving Fault Localization
Finding and fixing bugs are time-consuming activities in software
development. Spectrum-based fault localization aims to identify the faulty
position in source code based on the execution trace of test cases. Failing
test cases and their assertions form test oracles for the failing behavior of
the system under analysis. In this paper, we propose a novel concept of
spectrum driven test case purification for improving fault localization. The
goal of test case purification is to separate existing test cases into small
fractions (called purified test cases) and to enhance the test oracles to
further localize faults. Combining with an original fault localization
technique (e.g., Tarantula), test case purification results in better ranking
the program statements. Our experiments on 1800 faults in six open-source Java
programs show that test case purification can effectively improve existing
fault localization techniques
LNCS
Static program analyzers are increasingly effective in checking correctness properties of programs and reporting any errors found, often in the form of error traces. However, developers still spend a significant amount of time on debugging. This involves processing long error traces in an effort to localize a bug to a relatively small part of the program and to identify its cause. In this paper, we present a technique for automated fault localization that, given a program and an error trace, efficiently narrows down the cause of the error to a few statements. These statements are then ranked in terms of their suspiciousness. Our technique relies only on the semantics of the given program and does not require any test cases or user guidance. In experiments on a set of C benchmarks, we show that our technique is effective in quickly isolating the cause of error while out-performing other state-of-the-art fault-localization techniques
Automatically Repairing Programs Using Both Tests and Bug Reports
The success of automated program repair (APR) depends significantly on its
ability to localize the defects it is repairing. For fault localization (FL),
APR tools typically use either spectrum-based (SBFL) techniques that use test
executions or information-retrieval-based (IRFL) techniques that use bug
reports. These two approaches often complement each other, patching different
defects. No existing repair tool uses both SBFL and IRFL. We develop RAFL
(Rank-Aggregation-Based Fault Localization), a novel FL approach that combines
multiple FL techniques. We also develop Blues, a new IRFL technique that uses
bug reports, and an unsupervised approach to localize defects. On a dataset of
818 real-world defects, SBIR (combined SBFL and Blues) consistently localizes
more bugs and ranks buggy statements higher than the two underlying techniques.
For example, SBIR correctly identifies a buggy statement as the most suspicious
for 18.1% of the defects, while SBFL does so for 10.9% and Blues for 3.1%. We
extend SimFix, a state-of-the-art APR tool, to use SBIR, SBFL, and Blues.
SimFix using SBIR patches 112 out of the 818 defects; 110 when using SBFL, and
55 when using Blues. The 112 patched defects include 55 defects patched
exclusively using SBFL, 7 patched exclusively using IRFL, 47 patched using both
SBFL and IRFL and 3 new defects. SimFix using Blues significantly outperforms
iFixR, the state-of-the-art IRFL-based APR tool. Overall, SimFix using our FL
techniques patches ten defects no prior tools could patch. By evaluating on a
benchmark of 818 defects, 442 previously unused in APR evaluations, we find
that prior evaluations on the overused Defects4J benchmark have led to overly
generous findings. Our paper is the first to (1) use combined FL for APR, (2)
apply a more rigorous methodology for measuring patch correctness, and (3)
evaluate on the new, substantially larger version of Defects4J.Comment: working pape
- …