6,267 research outputs found

    Learning to Rank Relevant Files for Bug Reports Using Domain Knowledge, Replication and Extension of a Learning-to-Rank Approach

    Bug localization is one of the most important stages of the bug-fixing process. Bad practices can make debugging a tedious task, and investigating bugs can account for a large portion of the total cost of a software project. An automated strategy that provides a ranked list of source code files, ordered by how likely they are to contain the root cause of the problem, would help development teams narrow the search space and increase productivity. In this work, I have replicated the bug localization approach presented in \cite{ye2014learning}, which applies the learning-to-rank technique to rank the relevant files for each bug. The technique incorporates domain knowledge by evaluating the textual similarity between bug reports, source code files, and API specification documents, together with bug-fixing and code-change history. For a given bug report, the ranking function is a linear combination of weighted features, where the weights are trained on previously solved bug reports. In addition to replicating this technique, I have extended the study by evaluating the effect of different text preprocessing techniques, such as stemming and lemmatization, as well as a randomized selection of training folds, on the overall performance of the ranking model. I found that lemmatization and randomized selection of training folds have an adverse effect on the performance of the ranking model, lowering the accuracy and precision of the results.
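
    As a rough illustration of the ranking function described above (a linear combination of weighted features scored per bug-report/source-file pair), the following Python sketch stubs out the feature extraction; the feature names, weights, and helper functions are hypothetical and are not the replicated implementation.

        import numpy as np

        def extract_features(bug_report: str, source_file: str) -> np.ndarray:
            """Placeholder: feature vector for one (report, file) pair, e.g. lexical
            similarity, API-enriched similarity, bug-fixing recency/frequency."""
            rng = np.random.default_rng(abs(hash((bug_report, source_file))) % (2**32))
            return rng.random(4)

        def rank_files(bug_report, files, weights):
            """Rank candidate files by the linear score w . f(report, file)."""
            scores = {f: float(weights @ extract_features(bug_report, f)) for f in files}
            return sorted(files, key=scores.get, reverse=True)

        # In the replicated approach the weights would be learned from previously
        # solved bug reports; fixed toy weights are used here for illustration.
        weights = np.array([0.4, 0.3, 0.2, 0.1])
        print(rank_files("NullPointerException in parser", ["Parser.java", "Lexer.java"], weights))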

    Locating bugs without looking back

    Bug localisation is a core program comprehension task in software maintenance: given the observation of a bug, e.g. via a bug report, where is it located in the source code? Information retrieval (IR) approaches see the bug report as the query, and the source code files as the documents to be retrieved, ranked by relevance. Such approaches have the advantage of not requiring expensive static or dynamic analysis of the code. However, current state-of-the-art IR approaches rely on project history, in particular previously fixed bugs or previous versions of the source code. We present a novel approach that directly scores each current file against the given report, thus not requiring past code and reports. The scoring method is based on heuristics identified through manual inspection of a small sample of bug reports. We compare our approach to eight others, using their own five metrics on their own six open source projects. Out of 30 performance indicators, we improve on 27 and equal 2. Over the projects analysed, on average we find one or more affected files in the top 10 ranked files for 76% of the bug reports. These results show the applicability of our approach to software projects without history.
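
    To make the information-retrieval framing concrete (the bug report as the query, source files as the documents ranked by relevance), here is a minimal Python sketch using plain TF-IDF cosine similarity. It does not reproduce the paper's manually derived heuristics, and the file names and contents are invented for illustration.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        def rank_by_similarity(bug_report, file_texts):
            """Rank files by TF-IDF cosine similarity to the bug report text."""
            names = list(file_texts)
            matrix = TfidfVectorizer(stop_words="english").fit_transform(
                [bug_report] + [file_texts[n] for n in names])
            scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
            return sorted(zip(names, scores), key=lambda p: p[1], reverse=True)

        files = {
            "Parser.java": "public class Parser { Token next; void parse(Token t) { } }",
            "Cache.java": "public class Cache { Object get(String key) { return null; } }",
        }
        print(rank_by_similarity("NullPointerException when parsing an empty token stream", files))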

    Big Data: Learning, Analytics, and Applications

    With the rise of autonomous systems, the automation of fault detection and localization becomes critical to their reliability. An automated strategy that provides a ranked list of faulty modules or files, ordered by how likely they are to contain the root cause of the problem, would help automate bug localization. Learning from the history of previously located bugs in general, and extracting the dependencies between these bugs in particular, helps in building models that accurately localize any potentially detected bugs. In this study, we propose a novel fault localization solution based on a learning-to-rank strategy, using the history of previously localized bugs and their dependencies as features, to rank files in terms of their likelihood of being a root cause of a bug. The evaluation of our approach has shown its efficiency in localizing dependent bugs.
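
    As a hedged sketch of the learning-to-rank step described above, the snippet below fits the weights of a linear scoring function from previously localized bugs using a simple pairwise, hinge-style update: a file known to be buggy should outscore an unrelated file by a margin. The toy feature vectors, where the first component might encode "shares a dependency with a past bug", are assumptions for illustration, not the study's features or code.

        import numpy as np

        def train_pairwise(relevant, irrelevant, epochs=100, lr=0.01):
            """Fit weights so that known buggy files outrank clean files by a margin."""
            w = np.zeros(len(relevant[0]))
            for _ in range(epochs):
                for r in relevant:
                    for n in irrelevant:
                        diff = np.asarray(r) - np.asarray(n)
                        if w @ diff < 1.0:   # margin violated: nudge the weights
                            w += lr * diff
            return w

        # Toy feature vectors: files linked to past bugs (relevant) vs. others.
        relevant = [[1.0, 0.8], [0.9, 0.6]]
        irrelevant = [[0.1, 0.7], [0.0, 0.2]]
        print(train_pairwise(relevant, irrelevant))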