A Comprehensive Empirical Investigation on Failure Clustering in Parallel Debugging
The clustering technique has attracted much attention as a promising
strategy for parallel debugging in multi-fault scenarios. This heuristic
approach (also known as failure indexing or fault isolation) enables developers
to perform multiple debugging tasks simultaneously by dividing failed test
cases into several disjoint groups. When using a statement ranking representation
to model failures for better clustering, several factors influence clustering
effectiveness, including the risk evaluation formula (REF), the number of
faults (NOF), the fault type (FT), and the number of successful test cases
paired with one individual failed test case (NSP1F). In this paper, we present
the first comprehensive empirical study of how these four factors influence
clustering effectiveness. We conduct extensive controlled experiments on 1060
faulty versions of 228 simulated faults and 141 real faults, and the results
reveal that: 1) GP19 is highly competitive across all REFs, 2) clustering
effectiveness decreases as NOF increases, 3) higher clustering effectiveness is
easier to achieve when a program contains only predicate faults, and 4)
clustering effectiveness is preserved even when the scale of NSP1F is reduced
to 20%.
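The failure-indexing idea described above can be sketched in a few lines: model each failed test case as a statement ranking vector produced by a risk evaluation formula, then group failures whose rankings agree. This is an illustrative sketch only, not the paper's implementation; it uses the well-known Ochiai formula in place of GP19, and the coverage data, variable names, and the crude "same top-ranked statement" grouping rule are all assumptions made for the example.

```python
import math

# Hypothetical program spectra: each test is the set of statement ids it
# executes. All data here is illustrative, not from the paper.
passed_cov = [{1, 2, 3}, {1, 2, 4}, {2, 4}]
failed_cov = [{1, 3, 5}, {2, 3, 5}, {1, 4, 6}]
statements = sorted(set().union(*passed_cov, *failed_cov))

def ochiai(ef, ep, nf):
    # Ochiai risk evaluation formula, standing in for any REF (e.g., GP19).
    denom = math.sqrt(nf * (ef + ep))
    return ef / denom if denom else 0.0

def ranking_vector(fail_set):
    # Pair one failed test case with the passed runs (the NSP1F idea:
    # shrinking the pool of paired passed runs shrinks this computation).
    scores = []
    for s in statements:
        ef = 1 if s in fail_set else 0
        ep = sum(1 for p in passed_cov if s in p)
        scores.append(ochiai(ef, ep, 1))
    return scores

# Crude proxy for clustering: failed tests whose most suspicious
# statement coincides are assumed to stem from the same fault, giving
# disjoint groups that can be debugged in parallel.
clusters = {}
for i, fail_set in enumerate(failed_cov):
    vec = ranking_vector(fail_set)
    top_stmt = statements[vec.index(max(vec))]
    clusters.setdefault(top_stmt, []).append(i)
print(clusters)
```

A real failure-indexing pipeline would cluster the full ranking vectors with a distance-based algorithm rather than comparing only the top-ranked statement.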
Test Case Purification for Improving Fault Localization
Finding and fixing bugs are time-consuming activities in software
development. Spectrum-based fault localization aims to identify the faulty
position in source code based on the execution trace of test cases. Failing
test cases and their assertions form test oracles for the failing behavior of
the system under analysis. In this paper, we propose a novel concept of
spectrum driven test case purification for improving fault localization. The
goal of test case purification is to separate existing test cases into small
fractions (called purified test cases) and to enhance the test oracles to
further localize faults. Combined with an existing fault localization
technique (e.g., Tarantula), test case purification yields a better ranking
of the program statements. Our experiments on 1800 faults in six open-source Java
programs show that test case purification can effectively improve existing
fault localization techniques.
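Since the abstract names Tarantula as the base technique, a minimal sketch of Tarantula-style suspiciousness scoring over program spectra may help; the purification step itself (splitting tests into smaller "purified" fractions with enhanced oracles) is not reproduced here, and the spectrum data below is invented for illustration.

```python
def tarantula(ef, ep, total_f, total_p):
    # Tarantula: susp(s) = (ef/F) / (ef/F + ep/P).
    # Statements executed mostly by failing tests score closer to 1.0.
    fail_ratio = ef / total_f if total_f else 0.0
    pass_ratio = ep / total_p if total_p else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0

# spectrum[s] = (times executed by failing tests, times by passing tests);
# F failing and P passing tests in total. Illustrative data only.
spectrum = {"s1": (2, 1), "s2": (0, 3), "s3": (2, 0)}
F, P = 2, 3
ranking = sorted(spectrum, key=lambda s: tarantula(*spectrum[s], F, P),
                 reverse=True)
print(ranking)
```

Purification improves localization by shrinking each failing test to the fraction relevant to its assertion, which sharpens the `ef`/`ep` counts that formulas like this one consume.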
You Cannot Fix What You Cannot Find! An Investigation of Fault Localization Bias in Benchmarking Automated Program Repair Systems
Properly benchmarking Automated Program Repair (APR) systems should
contribute to the development and adoption of the research outputs by
practitioners. To that end, the research community must ensure that it reaches
significant milestones by reliably comparing state-of-the-art tools for a
better understanding of their strengths and weaknesses. In this work, we
identify and investigate a practical bias caused by the fault localization (FL)
step in a repair pipeline. We propose to highlight the different fault
localization configurations used in the literature, and their impact on APR
systems when applied to the Defects4J benchmark. Then, we explore the
performance variations that can be achieved by `tweaking' the FL step.
Ultimately, we aim to create new momentum for (1) full disclosure of APR
experimental procedures with respect to FL, (2) realistic expectations of
repairing bugs in Defects4J, as well as (3) reliable performance comparison
among the state-of-the-art APR systems, and against the baseline performance
results of our thoroughly assessed kPAR repair tool. Our main findings include:
(a) only a subset of Defects4J bugs can be currently localized by commonly-used
FL techniques; (b) current practice of comparing state-of-the-art APR systems
(i.e., counting the number of fixed bugs) is potentially misleading due to the
bias of FL configurations; and (c) APR authors do not properly qualify their
performance achievement with respect to the different tuning parameters
implemented in APR systems.
Comment: Accepted by ICST 201
An Effective Strategy to Build Up a Balanced Test Suite for Spectrum-Based Fault Localization
Over the past decades, many automated software fault diagnosis techniques, including Spectrum-Based Fault Localization (SBFL), have been proposed to improve the efficiency of software debugging. In SBFL, the suspiciousness calculation is closely related to the numbers of failed and passed test cases. Studies have shown that the ratio of failed to passed test cases has a more significant impact on the accuracy of SBFL than the total number of test cases, and that a balanced test suite is more beneficial to SBFL accuracy. Based on theoretical analysis, we propose a PNF (Passed test cases Not executing the Faulty statement) strategy to reduce a test suite and build a more balanced one for SBFL, which can be used in regression testing. We evaluated the strategy in experiments on the Siemens and Space programs. The experiments indicate that our PNF strategy can effectively construct a new test suite. Compared with the original test suite, the new one is smaller (on average, 90% of the test cases were removed in our experiments) and has a more balanced ratio of failed to passed test cases, while retaining the same statement coverage and fault localization accuracy.
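The PNF idea can be sketched as follows: drop passed test cases that do not execute the known faulty statement (the regression-testing setting makes that statement available), rebalancing the failed-to-passed ratio, then greedily restore dropped tests only if the original statement coverage would otherwise be lost. This is a hedged sketch under those assumptions, not the paper's algorithm; all test names and coverage sets are invented for illustration.

```python
# Known faulty statement id (available in the regression-testing setting
# the abstract targets). Illustrative data only.
faulty_stmt = 7
passed = {  # test id -> covered statement ids
    "p1": {1, 2, 7},
    "p2": {1, 2, 3},
    "p3": {3, 4, 7},
    "p4": {2, 4},
}
failed = {"f1": {1, 7, 9}}

# PNF reduction: keep only passed tests that execute the faulty statement.
keep = {t for t, cov in passed.items() if faulty_stmt in cov}
covered = set().union(*failed.values(), *(passed[t] for t in keep))
target = set().union(*failed.values(), *passed.values())

# Greedy repair pass: re-add dropped tests until the reduced suite
# matches the original statement coverage.
for t, cov in passed.items():
    if covered >= target:
        break
    if t not in keep and cov - covered:
        keep.add(t)
        covered |= cov

print(sorted(keep))
```

In this toy run the two retained passed tests already cover every statement the original suite covered, so the repair pass adds nothing; on real subjects the greedy pass is what preserves coverage while the ratio stays balanced.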