199 research outputs found
Supporting feature-level software maintenance
Software maintenance is the process of modifying a software system to fix defects, improve performance, add new functionality, or adapt the system to a new environment. A maintenance task is often initiated by a bug report or a request for new functionality. Bug reports typically describe problems with incorrect behaviors or functionalities. These behaviors or functionalities are known as features. Even in very well-designed systems, the source code that implements features is often not completely modularized. The delocalized nature of features makes maintaining them challenging. Since maintenance tasks are expressed in terms of features, the goal of this dissertation is to support software maintenance at the feature-level. We focus on two tasks in particular: feature location and impact analysis via feature coupling.;Feature location is the process of identifying the source code that implements a feature, and it is an essential first step to any maintenance task. There are many existing techniques for feature location that incorporate various types of analyses such as static, dynamic, and textual. In this dissertation, we recognize the advantages of leveraging several types of analyses and introduce a new approach to feature location based on combining dynamic analysis, textual analysis, and web mining algorithms applied to software. The use of web mining for feature location is a novel contribution, and we show that our new techniques based on web mining are significantly more effective than the current state of the art.;After using feature location to identify a feature\u27s source code, maintenance can be completed on that feature. Impact analysis should then be performed to revalidate the system and determine which other features may have been affected by the modifications. We define three feature coupling metrics that capture the relationship between features based on structural information, textual information, and their combination. Our novel feature coupling metrics can be used for impact analysis to quantify the strength of coupling between pairs of features. We performed three empirical studies on open-source software systems to assess the feature coupling metrics and established three major results. First, there is a moderate to strong statistically significant correlation between feature coupling and faults. Second, feature coupling can be used to correctly determine about half of the other features that would be affected by a change to a given feature. Finally, we found that the metrics align with developers\u27 opinions about pairs of features that are actually coupled
The TopModL Initiative
International audienceWe believe that there is a very strong need for an environment to support research and experiments on model-driven engineering. Therefore we have started the TopModL project, an open-source initiative, with the goal of building a development community to provide: (1) an executable environment for quick and easy experimentation, (2) a set of source files and a compilation tool chain, (3) a web portal to share artefacts developed by the community. The aim of TopModL is to help the model-engineering research community by providing the quickest path between a research idea and a running prototype. In addition, we also want to identify all the possible contributions, understand how to make it easy to integrate existing components, while maintaining architectural integrity. At the time of writing we have almost completed the bootstrap phase (known as Blackhole), which means that we can model TopModL and generate TopModL with TopModL. Beyond this first phase, it is now of paramount importance to gather the best possible description of the requirements of the community involved in model-driven engineering to further develop TopModL, and also to make sure that we are able to reuse or federate existing efforts or goodwill. This paper is more intended to set up a basis for a constructive discussion than to offer definitive answers and closed solutions
Locating bugs without looking back
Bug localisation is a core program comprehension task in software maintenance: given the observation of a bug, e.g. via a bug report, where is it located in the source code? Information retrieval (IR) approaches see the bug report as the query, and the source code files as the documents to be retrieved, ranked by relevance. Such approaches have the advantage of not requiring expensive static or dynamic analysis of the code. However, current state-of-the-art IR approaches rely on project history, in particular previously fixed bugs or previous versions of the source code. We present a novel approach that directly scores each current file against the given report, thus not requiring past code and reports. The scoring method is based on heuristics identified through manual inspection of a small sample of bug reports. We compare our approach to eight others, using their own five metrics on their own six open source projects. Out of 30 performance indicators, we improve 27 and equal 2. Over the projects analysed, on average we find one or more affected files in the top 10 ranked files for 76% of the bug reports. These results show the applicability of our approach to software projects without history
Performance of IR Models on Duplicate Bug Report Detection: A Comparative Study
Open source projects incorporate bug triagers to help with the task of bug report
assignment to developers. One of the tasks of a triager is to identify whether an incoming
bug report is a duplicate of a pre-existing report. In order to detect duplicate bug reports,
a triager either relies on his memory and experience or on the search capabilties of the bug
repository. Both these approaches can be time consuming for the triager and may also
lead to the misidentication of duplicates. It has also been suggested that duplicate bug
reports are not necessarily harmful, instead they can complement each other to provide
additional information for developers to investigate the defect at hand. This motivates the
need for automated or semi-automated techniques for duplicate bug detection.
In the literature, two main approaches have been proposed to solve this problem. The
first approach is to prevent duplicate reports from reaching developers by automatically
filtering them while the second approach deals with providing the triager a list of top-N
similar bug reports, allowing the triager to compare the incoming bug report with the ones
provided in the list. Previous works have tried to enhance the quality of the suggested
lists, but the approaches either suffered a poor Recall Rate or they incurred additional
runtime overhead, making the deployment of a retrieval system impractical. To the extent
of our knowledge, there has been little work done to do an exhaustive comparison of
the performance of different Information Retrieval Models (especially using more recent
techniques such as topic modeling) on this problem and understanding the effectiveness of
different heuristics across various application domains.
In this thesis, we compare the performance of word based models (derivatives of the
Vector Space Model) such as TF-IDF, Log-Entropy with that of topic based models such as
Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA) and Random Indexing
(RI). We leverage heuristics that incorporate exception stack frames, surface features,
summary and long description from the free-form text in the bug report. We perform
experiments on subsets of bug reports from Eclipse and Firefox and achieve a recall rate of
60% and 58% respectively. We find that word based models, in particular a Log-Entropy
based weighting scheme, outperform topic based ones such as LSI and LDA.
Using historical bug data from Eclipse and NetBeans, we determine the optimal time
frame for a desired level of duplicate bug report coverage. We realize an Online Duplicate
Detection Framework that uses a sliding window of a constant time frame as a first step
towards simulating incoming bug reports and recommending duplicates to the end user
Locating Bugs without Looking Back
Bug localisation is a core program comprehension task in software maintenance: given the observation of a bug, where is it located in the source code files? Information retrieval (IR) approaches see a bug report as the query, and the source code files as the documents to be retrieved, ranked by relevance. Such approaches have the advantage of not requiring expensive static or dynamic analysis of the code. However, most of state-of-the-art IR approaches rely on project history, in particular previously fixed bugs and previous versions of the source code. We present a novel approach that directly scores each current file against the given report, thus not requiring past code and reports. The scoring is based on heuristics identified through manual inspection of a small set of bug reports. We compare our approach to five others, using their own five metrics on their own six open source projects. Out of 30 performance indicators, we improve 28. For example, on average we find one or more affected files in the top 10 ranked files for 77% of the bug reports. These results show the applicability of our approach to software projects without history
Recommended from our members
Improving Information Retrieval Bug Localisation Using Contextual Heuristics
Software developers working on unfamiliar systems are challenged to identify where and how high-level concepts are implemented in the source code prior to performing maintenance tasks. Bug localisation is a core program comprehension activity in software maintenance: given the observation of a bug, e.g. via a bug report, where is it located in the source code?
Information retrieval (IR) approaches see the bug report as the query, and the source files as the documents to be retrieved, ranked by relevance. Current approaches rely on project history, in particular previously fixed bugs and versions of the source code. Existing IR techniques fall short of providing adequate solutions in finding all the source code files relevant for a bug. Without additional help, bug localisation can become a tedious, time- consuming and error-prone task.
My research contributes a novel algorithm that, given a bug report and the application’s source files, uses a combination of lexical and structural information to suggest, in a ranked order, files that may have to be changed to resolve the reported bug without requiring past code and similar reports.
I study eight applications for which I had access to the user guide, the source code, and some bug reports. I compare the relative importance and the occurrence of the domain concepts in the project artefacts and measure the effectiveness of using only concept key words to locate files relevant for a bug compared to using all the words of a bug report.
Measuring my approach against six others, using their five metrics and eight projects, I position an effected file in the top-1, top-5 and top-10 ranks on average for 44%, 69% and 76% of the bug reports respectively. This is an improvement of 23%, 16% and 11% respectively over the best performing current state-of-the-art tool.
Finally, I evaluate my algorithm with a range of industrial applications in user studies, and found that it is superior to simple string search, as often performed by developers. These results show the applicability of my approach to software projects without history and offers a simpler light-weight solution
Semantic Component Retrieval in Software Engineering
In the early days of programming the concept of subroutines, and through this software reuse, was invented to spare limited hardware resources. Since then software systems have become increasingly complex and developing them would not have been possible without reusable software elements such as standard libraries and frameworks. Furthermore, other approaches commonly subsumed under the umbrella of software reuse such as product lines and design patterns have become very successful in recent years. However, there are still no software component markets available that would make buying software components as simple as buying parts in a do-it-yourself hardware store and millions of software fragments are still lying un(re)used in configuration management repositories all over the world. The literature primarily blames this on the immense effort required so far to set up and maintain searchable component repositories and the weak mechanisms available for retrieving components from them, resulting in a severe usability problem. In order to address these issues within this thesis, we developed a proactive component reuse recommendation system, naturally integrated into test-first development approaches, which is able to propose semantically appropriate, reusable components according to the specification a developer is just working on. We have implemented an appropriate system as a plugin for the well-known Eclipse IDE and demonstrated its usefulness by carrying out a case study from a popular agile development book. Furthermore, we present a precision analysis for our approach and examples of how components can be retrieved based on a simplified semantics description in terms of standard test cases
TOOL SUPPORT FOR CAPTURING THE ESSENCE OF A CONCERN IN SOURCE CODE
Software evolves constantly to adapt to changing user needs. As it evolves, it becomes progressively harder to understand due to accumulation of code changes, increasing code size, and the introduction of complex code dependencies. As a result, it becomes harder to maintain, exposing the software to potential bugs and degradation of code quality. High maintenance costs and diminished opportunities for software reusability and portability lead to reduced return on investment, increasing the likelihood of the software product being discarded or replaced. Nevertheless, we believe that there is value in legacy software due to the amount of intellectual efforts that have been invested in it. To extend its value, we utilize the common practice of identifying the pieces of code relevant to a given concern. Identifying relevant code is a manual process and relies on domain and code expertise. This makes it difficult to scale to large and complex code. In this thesis, we propose several automated approaches for capturing the essential code that represents a concern of interest. We utilize dynamic program analysis of execution traces to identify a relevant code subset. Information retrieval techniques are then utilized to improve the accuracy of the capture, refine the process, and verify the results
- …