1,302 research outputs found

    Variable-Based Fault Localization via Enhanced Decision Tree

    Full text link
    Fault localization, aiming at localizing the root cause of the bug under repair, has been a longstanding research topic. Although many approaches have been proposed in the last decades, most of the existing studies work at coarse-grained statement or method levels with very limited insights about how to repair the bug (granularity problem), but few studies target the finer-grained fault localization. In this paper, we target the granularity problem and propose a novel finer-grained variable-level fault localization technique. Specifically, we design a program-dependency-enhanced decision tree model to boost the identification of fault-relevant variables via discriminating failed and passed test cases based on the variable values. To evaluate the effectiveness of our approach, we have implemented it in a tool called VARDT and conducted an extensive study over the Defects4J benchmark. The results show that VARDT outperforms the state-of-the-art fault localization approaches with at least 247.8% improvements in terms of bugs located at Top-1, and the average improvements are 330.5%. Besides, to investigate whether our finer-grained fault localization result can further improve the effectiveness of downstream APR techniques, we have adapted VARDT to the application of patch filtering, where VARDT outperforms the state-of-the-art PATCH-SIM by filtering 26.0% more incorrect patches. The results demonstrate the effectiveness of our approach and it also provides a new way of thinking for improving automatic program repair techniques

    Mutation-based Fault Localization of Deep Neural Networks

    Full text link
    Deep neural networks (DNNs) are susceptible to bugs, just like other types of software systems. A significant uptick in using DNN, and its applications in wide-ranging areas, including safety-critical systems, warrant extensive research on software engineering tools for improving the reliability of DNN-based systems. One such tool that has gained significant attention in the recent years is DNN fault localization. This paper revisits mutation-based fault localization in the context of DNN models and proposes a novel technique, named deepmufl, applicable to a wide range of DNN models. We have implemented deepmufl and have evaluated its effectiveness using 109 bugs obtained from StackOverflow. Our results show that deepmufl detects 53/109 of the bugs by ranking the buggy layer in top-1 position, outperforming state-of-the-art static and dynamic DNN fault localization systems that are also designed to target the class of bugs supported by deepmufl. Moreover, we observed that we can halve the fault localization time for a pre-trained model using mutation selection, yet losing only 7.55% of the bugs localized in top-1 position.Comment: 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023

    Automatically Repairing Programs Using Both Tests and Bug Reports

    Full text link
    The success of automated program repair (APR) depends significantly on its ability to localize the defects it is repairing. For fault localization (FL), APR tools typically use either spectrum-based (SBFL) techniques that use test executions or information-retrieval-based (IRFL) techniques that use bug reports. These two approaches often complement each other, patching different defects. No existing repair tool uses both SBFL and IRFL. We develop RAFL (Rank-Aggregation-Based Fault Localization), a novel FL approach that combines multiple FL techniques. We also develop Blues, a new IRFL technique that uses bug reports, and an unsupervised approach to localize defects. On a dataset of 818 real-world defects, SBIR (combined SBFL and Blues) consistently localizes more bugs and ranks buggy statements higher than the two underlying techniques. For example, SBIR correctly identifies a buggy statement as the most suspicious for 18.1% of the defects, while SBFL does so for 10.9% and Blues for 3.1%. We extend SimFix, a state-of-the-art APR tool, to use SBIR, SBFL, and Blues. SimFix using SBIR patches 112 out of the 818 defects; 110 when using SBFL, and 55 when using Blues. The 112 patched defects include 55 defects patched exclusively using SBFL, 7 patched exclusively using IRFL, 47 patched using both SBFL and IRFL and 3 new defects. SimFix using Blues significantly outperforms iFixR, the state-of-the-art IRFL-based APR tool. Overall, SimFix using our FL techniques patches ten defects no prior tools could patch. By evaluating on a benchmark of 818 defects, 442 previously unused in APR evaluations, we find that prior evaluations on the overused Defects4J benchmark have led to overly generous findings. Our paper is the first to (1) use combined FL for APR, (2) apply a more rigorous methodology for measuring patch correctness, and (3) evaluate on the new, substantially larger version of Defects4J.Comment: working pape

    Back to the Future! Studying Data Cleanness in Defects4J and its Impact on Fault Localization

    Full text link
    For software testing research, Defects4J stands out as the primary benchmark dataset, offering a controlled environment to study real bugs from prominent open-source systems. However, prior research indicates that Defects4J might include tests added post-bug report, embedding developer knowledge and affecting fault localization efficacy. In this paper, we examine Defects4J's fault-triggering tests, emphasizing the implications of developer knowledge of SBFL techniques. We study the timelines of changes made to these tests concerning bug report creation. Then, we study the effectiveness of SBFL techniques without developer knowledge in the tests. We found that 1) 55% of the fault-triggering tests were newly added to replicate the bug or to test for regression; 2) 22% of the fault-triggering tests were modified after the bug reports were created, containing developer knowledge of the bug; 3) developers often modify the tests to include new assertions or change the test code to reflect the changes in the source code; and 4) the performance of SBFL techniques degrades significantly (up to --415% for Mean First Rank) when evaluated on the bugs without developer knowledge. We provide a dataset of bugs without developer insights, aiding future SBFL evaluations in Defects4J and informing considerations for future bug benchmarks

    Directed Test Program Generation for JIT Compiler Bug Localization

    Full text link
    Bug localization techniques for Just-in-Time (JIT) compilers are based on analyzing the execution behaviors of the target JIT compiler on a set of test programs generated for this purpose; characteristics of these test inputs can significantly impact the accuracy of bug localization. However, current approaches for automatic test program generation do not work well for bug localization in JIT compilers. This paper proposes a novel technique for automatic test program generation for JIT compiler bug localization that is based on two key insights: (1) the generated test programs should contain both passing inputs (which do not trigger the bug) and failing inputs (which trigger the bug); and (2) the passing inputs should be as similar as possible to the initial seed input, while the failing programs should be as different as possible from it. We use a structural analysis of the seed program to determine which parts of the code should be mutated for each of the passing and failing cases. Experiments using a prototype implementation indicate that test inputs generated using our approach result in significantly improved bug localization results than existing approaches

    Large Language Models in Fault Localisation

    Full text link
    Large Language Models (LLMs) have shown promise in multiple software engineering tasks including code generation, code summarisation, test generation and code repair. Fault localisation is essential for facilitating automatic program debugging and repair, and is demonstrated as a highlight at ChatGPT-4's launch event. Nevertheless, there has been little work understanding LLMs' capabilities for fault localisation in large-scale open-source programs. To fill this gap, this paper presents an in-depth investigation into the capability of ChatGPT-3.5 and ChatGPT-4, the two state-of-the-art LLMs, on fault localisation. Using the widely-adopted Defects4J dataset, we compare the two LLMs with the existing fault localisation techniques. We also investigate the stability and explanation of LLMs in fault localisation, as well as how prompt engineering and the length of code context affect the fault localisation effectiveness. Our findings demonstrate that within a limited code context, ChatGPT-4 outperforms all the existing fault localisation methods. Additional error logs can further improve ChatGPT models' localisation accuracy and stability, with an average 46.9% higher accuracy over the state-of-the-art baseline SmartFL in terms of TOP-1 metric. However, performance declines dramatically when the code context expands to the class-level, with ChatGPT models' effectiveness becoming inferior to the existing methods overall. Additionally, we observe that ChatGPT's explainability is unsatisfactory, with an accuracy rate of only approximately 30%. These observations demonstrate that while ChatGPT can achieve effective fault localisation performance under certain conditions, evident limitations exist. Further research is imperative to fully harness the potential of LLMs like ChatGPT for practical fault localisation applications
    • …
    corecore