
Fault Localization Using Textual Similarities

Authors: Zachary P. Fry, Westley Weimer
Publication date: 12/11/2012

Maintenance is a dominant component of software cost, and localizing reported defects is a significant component of maintenance. We propose a scalable approach that leverages the natural language present in both defect reports and source code to identify files that are potentially related to the defect in question. Our technique is language-independent and does not require test cases. The approach represents reports and code as separate structured documents and ranks source files based on a document similarity metric that leverages inter-document relationships. We evaluate the fault-localization accuracy of our method against both lightweight baseline techniques and reported results from state-of-the-art tools. In an empirical evaluation of 5345 historical defects from programs totaling 6.5 million lines of code, our approach reduced the number of files inspected per defect by over 91%. Additionally, we qualitatively and quantitatively examine the utility of the textual and surface features used by our approach.

arXiv.org e-Print Archive
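The abstract describes ranking source files by textual similarity to a defect report. The authors' metric uses structured document forms and inter-document relationships that the abstract does not spell out, so the sketch below is not their method. It is a plain TF-IDF cosine-similarity ranking, the kind of lightweight baseline the abstract mentions, under the assumption that identifiers are split on camelCase and underscores. All function names (`tokenize`, `rank_files`, and so on) and the sample file names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase terms, splitting camelCase and snake_case identifiers (assumed preprocessing)."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", text):  # digits, "_", punctuation act as separators
        # "HTTPServer" -> ["HTTP", "Server"]; "parseHeader" -> ["parse", "Header"]
        terms.extend(p.lower() for p in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word))
    return terms

def tfidf_vectors(docs):
    """One sparse TF-IDF vector (term -> weight) per document in docs (lists of terms)."""
    n = len(docs)
    df = Counter()
    for terms in docs:
        df.update(set(terms))
    # Smoothed IDF, ln((1 + n) / (1 + df)) + 1, keeps every weight positive
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(terms).items()}
        for terms in docs
    ]

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def rank_files(report, files):
    """Rank source files (name -> contents) by similarity to the defect report, best first."""
    names = list(files)
    vecs = tfidf_vectors([tokenize(files[name]) for name in names] + [tokenize(report)])
    query = vecs[-1]  # the report is the last document
    scores = {name: cosine(query, vec) for name, vec in zip(names, vecs)}
    return sorted(names, key=lambda name: scores[name], reverse=True)

files = {
    "net/HttpParser.java": "class HttpParser { void parseHeader(String line) { ... } }",
    "ui/MainWindow.java": "class MainWindow { void drawMenu() { ... } }",
}
print(rank_files("Crash when parsing a malformed HTTP header", files))
# ['net/HttpParser.java', 'ui/MainWindow.java']
```

Here the report shares the terms `http` and `header` with the parser file and nothing with the UI file, so the parser ranks first. A developer would inspect files in this order until reaching one touched by the real fix.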

Fault Localization Using Textual Similarities

Author: Zachary P. Fry

Abstract. Maintenance is a dominant component of software cost, and localizing reported defects so that they can be fixed is a significant component of maintenance. The size and complexity of contemporary systems make such fault localization difficult, however. In addition, defect reports often contain incomplete information provided by users who may be unfamiliar with the code base. We propose a lightweight and scalable approach that leverages the natural language present in both defect reports and source code to identify portions of the program that are potentially related to the bug in question. Our technique is language-independent and does not require test cases. The approach represents defect reports and source files as separate structured document forms and ranks source files of interest based on a document similarity metric that leverages inter-document relationships. We evaluate the fault-localization accuracy of our method against both lightweight baseline techniques and reported results from state-of-the-art tools. Similar tools have been evaluated using a metric that quantifies the reduction of the overall search space when locating faults. Given information from actual bug reports and their real-world fixes, we use a similar metric to gauge the effectiveness of our tool. In an empirical evaluation of 5345 historical defects from three real-world programs totaling 6.5 million lines of code, our approach reduced the number of files inspected per defect by 88%. Additionally, we qualitatively and quantitatively examine the utility of the textual and surface features used by our approach and their implications for conventional defect reporting.

CiteSeerX
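The abstract evaluates tools by how much they shrink the search space, measured against the files changed by each defect's real-world fix. Its exact definition and baseline are not given here, so the following is a minimal sketch under one assumed reading. For each defect, a developer walks the ranking top-down until reaching a file touched by the fix. The reduction for that defect is one minus the fraction of the code base inspected, and the reported figure is the mean over all defects. The helper names `files_inspected` and `search_space_reduction` are hypothetical.

```python
def files_inspected(ranking, fixed_files):
    """Files examined, walking the ranking top-down, until the first file changed by the fix."""
    for position, name in enumerate(ranking, start=1):
        if name in fixed_files:
            return position
    return len(ranking)  # fix never ranked: assume the whole ranking was inspected

def search_space_reduction(results, total_files):
    """Mean fraction of the code base a developer is spared from inspecting.

    results: one (ranking, fixed_files) pair per historical defect (assumed input shape).
    """
    reductions = [1 - files_inspected(ranking, fixed) / total_files for ranking, fixed in results]
    return sum(reductions) / len(reductions)

results = [
    (["a.c", "b.c", "c.c", "d.c"], {"a.c"}),  # fix found at rank 1: 1 - 1/4 = 0.75
    (["b.c", "d.c", "a.c", "c.c"], {"a.c"}),  # fix found at rank 3: 1 - 3/4 = 0.25
]
print(search_space_reduction(results, total_files=4))  # 0.5
```

Averaging per-defect reductions gives each defect equal weight. An alternative reading, comparing total files inspected against some baseline strategy, could yield a different number from the same rankings.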