Search CORE

105 research outputs found

Efficient Information Retrieval for Software Bug Localization

Author: Khatiwada Saket
Publication venue: LSU Digital Commons
Publication date: 12/03/2022
Field of study

Software systems are often shipped with defects. When a bug is reported, developers use the information available in the associated report to locate source code fragments that need to be modified to fix the bug. However, as software systems evolve in size and complexity, bug localization can become a tedious and time-consuming process. Contemporary bug localization tools utilize Information Retrieval (IR) methods for automated support to minimize the manual effort. IR methods exploit the textual content of bug reports to capture and rank relevant buggy source files. However, for an IR-based bug localization tool to be useful, it must achieve adequate retrieval accuracy. Lower precision and recall can leave developers with large amounts of incorrect information to wade through. Motivated by these observations, in this dissertation, we propose a new paradigm of information-theoretic IR methods to support bug localization tasks in software systems. These methods exploit the co-occurrence patterns of code terms in software systems to reveal latent semantic information that other methods often fail to capture. We further investigate the impact of combining various IR methods on the retrieval accuracy of bug localization engines. The main assumption is that different IR methods, targeting different dimensions of similarity between software artifacts, can enhance the confidence in each other\u27s results. Furthermore, we propose a novel approach for enhancing the performance of IR-enabled bug localization methods in the context of Open-Source Software (OSS). The proposed approach exploits knowledge from previously resolved bugs to help localize new bugs. Our analysis uses multiple datasets generated for multiple open-source and closed source projects. Our results show that a) information-theoretic IR methods can significantly outperform classical IR methods in bug localization tasks, b) optimized IR-hybrids can significantly outperform individual IR methods, and near-optimal global configurations can be determined for different combinations of IR methods, and c) information extracted from previously resolved bug reports can significantly enhance the accuracy of IR-enabled bug localization methods in OSS

Louisiana State University

Locating bugs without looking back

Author: CD Manning
D Poshyvanyk
EM Voorhees
G Antoniol
G Salton
J Sillito
M Petrenko
MF Porter
Michel Wermelinger
N Wilde
T Zimmermann
Tezcan Dilshener
Yijun Yu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/10/2017
Field of study

Bug localisation is a core program comprehension task in software maintenance: given the observation of a bug, e.g. via a bug report, where is it located in the source code? Information retrieval (IR) approaches see the bug report as the query, and the source code files as the documents to be retrieved, ranked by relevance. Such approaches have the advantage of not requiring expensive static or dynamic analysis of the code. However, current state-of-the-art IR approaches rely on project history, in particular previously fixed bugs or previous versions of the source code. We present a novel approach that directly scores each current file against the given report, thus not requiring past code and reports. The scoring method is based on heuristics identified through manual inspection of a small sample of bug reports. We compare our approach to eight others, using their own five metrics on their own six open source projects. Out of 30 performance indicators, we improve 27 and equal 2. Over the projects analysed, on average we find one or more affected files in the top 10 ranked files for 76% of the bug reports. These results show the applicability of our approach to software projects without history

Crossref

Open Research Online (The Open University)

Recommended from our members

Improving Information Retrieval Bug Localisation Using Contextual Heuristics

Author: Dilshener Tezcan
Publication venue
Publication date: 06/06/2017
Field of study

Software developers working on unfamiliar systems are challenged to identify where and how high-level concepts are implemented in the source code prior to performing maintenance tasks. Bug localisation is a core program comprehension activity in software maintenance: given the observation of a bug, e.g. via a bug report, where is it located in the source code? Information retrieval (IR) approaches see the bug report as the query, and the source files as the documents to be retrieved, ranked by relevance. Current approaches rely on project history, in particular previously fixed bugs and versions of the source code. Existing IR techniques fall short of providing adequate solutions in finding all the source code files relevant for a bug. Without additional help, bug localisation can become a tedious, time- consuming and error-prone task. My research contributes a novel algorithm that, given a bug report and the application’s source files, uses a combination of lexical and structural information to suggest, in a ranked order, files that may have to be changed to resolve the reported bug without requiring past code and similar reports. I study eight applications for which I had access to the user guide, the source code, and some bug reports. I compare the relative importance and the occurrence of the domain concepts in the project artefacts and measure the effectiveness of using only concept key words to locate files relevant for a bug compared to using all the words of a bug report. Measuring my approach against six others, using their five metrics and eight projects, I position an effected file in the top-1, top-5 and top-10 ranks on average for 44%, 69% and 76% of the bug reports respectively. This is an improvement of 23%, 16% and 11% respectively over the best performing current state-of-the-art tool. Finally, I evaluate my algorithm with a range of industrial applications in user studies, and found that it is superior to simple string search, as often performed by developers. These results show the applicability of my approach to software projects without history and offers a simpler light-weight solution

Open Research Online (The Open University)

Tracelab: Reproducing Empirical Software Engineering Research

Author: Moritz Evan Alexander
Publication venue: W&M ScholarWorks
Publication date: 01/01/2013
Field of study

College of William & Mary: W&M Publish

Automatic Recall of Lessons Learned for Software Project Managers

Author: Mohamed Tamer Mohamed Abdellatif
Publication venue: Scholarship@Western
Publication date: 27/08/2018
Field of study

Lessons learned (LL) records constitute a software organization’s memory of successes and failures. LL are recorded within the organization repository for future reference to optimize planning, gain experience, and elevate market competitiveness. However, manually searching this repository is a daunting task, so it is often overlooked. This can lead to the repetition of previous mistakes and missing potential opportunities, which, in turn, can negatively affect the organization’s profitability and competitiveness. In this thesis, we present a novel solution that provides an automatic process to recall relevant LL and to push them to project managers. This substantially reduces the amount of time and effort required to manually search the unstructured LL repositories, and therefore, it encourages the utilization of LL. In this study, we exploit existing project artifacts to build the LL search queries on-the-fly, in order to bypass the tedious manual search process. While most of the current LL recall studies rely on case-based reasoning, they have some limitations including the need to reformat the LL repository, which is impractical, and the need for tight user involvement. This makes us the first to employ information retrieval (IR) to address the LL recall. An empirical study has been conducted to build the automatic LL recall solution and evaluate its effectiveness. In our study, we employ three of the most popular IR models to construct a solution that considers multiple classifier configurations. In addition, we have extended this study by examining the impact of the hybridization of LL classifiers on the classifiers’ performance. Furthermore, a real-world dataset of 212 LL records from 30 different software projects has been used for validation. Top-k and MAP, well-known accuracy metrics, have been used as well. The study results confirm the effectiveness of the automatic LL recall solution by a discerning accuracy of about 70%, which was increased to 74% in the case of hybridization. This eliminates the effort needed to manually search the LL repository, which positively encourages project managers to reuse the available LL knowledge – which in turn avoids old pitfalls and unleash hidden business opportunities

Scholarship@Western

Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports

Author: LO David
ZHANG Hongyu
ZHOU Jian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

NSF

CiteSeerX

Crossref

Institutional Knowledge at Singapore Management University

On The Relationship Between The Vocabulary Of Bug Reports And Source Code

Author: Bandara Amunugamage Buddhini Wathsala
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2013
Field of study

The use of text retrieval techniques on concept location and bug localization yields remarkable benefits. The artifacts found in source code and bug reports contain important information related to the bug localization process. When locating the bugs, it is a programmer\u27s task to formulate effective queries such that most of the predicted terms in the query appear in the relevant defect code, but not in most of the non-relevant source files. These queries are built based on the textual content found in the bug reports, especially the bug title and the description. A large body of research uses bug descriptions to evaluate bug localization techniques using text retrieval. All these studies are conducted under the implicit assumption that the bug description and the relevant source code files share important terms. This paper presents an empirical study that explores this conjecture. We found that bug reports share more terms with the patched classes than with the other classes in the software system. Moreover, the study revealed that the class names are more likely to share terms with the bug descriptions than other code locations. We also found that more verbose parts of the source code, such as, comments share more words. Furthermore, we discovered that the shared terms may be better predictors for bug localization than some other text retrieval techniques, such as, LSI

Digital Commons@Wayne State University

Query expansion using novel use case scenario relationship for finding feature location

Author: Arwan Achmad
Fatichah Chastine
Rochimah Siti
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/10/2023
Field of study

Feature location is a technique for determining source code that implements specific features in software. It developed to help minimize effort on program comprehension. The main challenge of feature location research is how to bridge the gap between abstract keywords in use cases and detail in source code. The use case scenarios are software requirements artifacts that state the input, logic, rules, actor, and output of a function in the software. The sentence on use case scenario is sometimes described another sentence in other use case scenario. This study contributes to creating expansion queries in feature locations by finding the relationship between use case scenarios. The relationships include inner association, outer association and intratoken association. The research employs latent Dirichlet allocation (LDA) to create model topics on source code. Query expansion using inner, outer and intratoken was tested for finding feature locations on a Java-based open-source project. The best precision rate was 50%. The best recall was 100%, which was found in several use case scenarios implemented in a few files. The best average precision rate was 16.7%, which was found in inner association experiments. The best average recall rate was 68.3%, which was found in all compound association experiments

Institute of Advanced Engineering and Science