8 research outputs found

    You Cannot Fix What You Cannot Find! An Investigation of Fault Localization Bias in Benchmarking Automated Program Repair Systems

    Get PDF
    Properly benchmarking Automated Program Repair (APR) systems should contribute to the development and adoption of the research outputs by practitioners. To that end, the research community must ensure that it reaches significant milestones by reliably comparing state-of-the-art tools for a better understanding of their strengths and weaknesses. In this work, we identify and investigate a practical bias caused by the fault localization (FL) step in a repair pipeline. We propose to highlight the different fault localization configurations used in the literature, and their impact on APR systems when applied to the Defects4J benchmark. Then, we explore the performance variations that can be achieved by `tweaking' the FL step. Eventually, we expect to create a new momentum for (1) full disclosure of APR experimental procedures with respect to FL, (2) realistic expectations of repairing bugs in Defects4J, as well as (3) reliable performance comparison among the state-of-the-art APR systems, and against the baseline performance results of our thoroughly assessed kPAR repair tool. Our main findings include: (a) only a subset of Defects4J bugs can be currently localized by commonly-used FL techniques; (b) current practice of comparing state-of-the-art APR systems (i.e., counting the number of fixed bugs) is potentially misleading due to the bias of FL configurations; and (c) APR authors do not properly qualify their performance achievement with respect to the different tuning parameters implemented in APR systems.Comment: Accepted by ICST 201

    Precise propagation of fault-failure correlations in program flow graphs

    Get PDF
    Statistical fault localization techniques find suspicious faulty program entities in programs by comparing passed and failed executions. Existing studies show that such techniques can be promising in locating program faults. However, coincidental correctness and execution crashes may make program entities indistinguishable in the execution spectra under study, or cause inaccurate counting, thus severely affecting the precision of existing fault localization techniques. In this paper, we propose a BlockRank technique, which calculates, contrasts, and propagates the mean edge profiles between passed and failed executions to alleviate the impact of coincidental correctness. To address the issue of execution crashes, Block-Rank identifies suspicious basic blocks by modeling how each basic block contributes to failures by apportioning their fault relevance to surrounding basic blocks in terms of the rate of successful transition observed from passed and failed executions. BlockRank is empirically shown to be more effective than nine representative techniques on four real-life medium-sized programs. © 2011 IEEE.published_or_final_versionProceedings of the 35th IEEE Annual International Computer Software and Applications Conference (COMPSAC 2011), Munich, Germany, 18-22 July 2011, p. 58-6

    non-parametric statistical fault localization

    No full text
    Fault localization is a major activity in program debugging. To automate this time-consuming task, many existing fault-localization techniques compare passed executions and failed executions, and suggest suspicious program elements, such as predicates or statements, to facilitate the identification of faults. To do that, these techniques propose statistical models and use hypothesis testing methods to test the similarity or dissimilarity of proposed program features between passed and failed executions. Furthermore, when applying their models, these techniques presume that the feature spectra come from populations with specific distributions. The accuracy of using a model to describe feature spectra is related to and may be affected by the underlying distribution of the feature spectra, and the use of a (sound) model on inapplicable circumstances to describe real-life feature spectra may lower the effectiveness of these fault-localization techniques. In this paper, we make use of hypothesis testing methods as the core concept in developing a predicate-based fault-localization framework. We report a controlled experiment to compare, within our framework, the efficacy, scalability, and efficiency of applying three categories of hypothesis testing methods, namely, standard non-parametric hypothesis testing methods, standard parametric hypothesis testing methods, and debugging-specific parametric testing methods. We also conduct a case study to compare the effectiveness of the winner of these three categories with the effectiveness of 33 existing statement-level fault-localization techniques. The experimental results show that the use of non-parametric hypothesis testing methods in our proposed predicate-based fault-localization model is the most promising. © 2011 Elsevier Inc. All rights reserved

    Non-Parametric Statistical Fault Localization

    Get PDF
    Fault localization is a major activity in program debugging. To automate this time-consuming task, many existing fault-localization techniques compare passed executions and failed executions, and suggest suspicious program elements, such as predicates or statements, to facilitate the identification of faults. To do that, these techniques propose statistical models and use hypothesis testing methods to test the similarity or dissimilarity of proposed program features between passed and failed executions. Furthermore, when applying their models, these techniques presume that the feature spectra come from populations with specific distributions. The accuracy of using a model to describe feature spectra is related to and may be affected by the underlying distribution of the feature spectra, and the use of a (sound) model on inapplicable circumstances to describe real-life feature spectra may lower the effectiveness of these fault-localization techniques. In this paper, we make use of hypothesis testing methods as the core concept in developing a predicate-based fault-localization framework. We report a controlled experiment to compare, within our framework, the efficacy, scalability, and efficiency of applying three categories of hypothesis testing methods, namely, standard non-parametric hypothesis testing methods, standard parametric hypothesis testing methods, and debugging-specific parametric testing methods. We also conduct a case study to compare the effectiveness of the winner of these three categories with the effectiveness of 33 existing statement-level fault-localization techniques. The experimental results show that the use of non-parametric hypothesis testing methods in our proposed predicate-based fault-localization model is the most promising
    corecore