
    Supporting feature-level software maintenance

    Software maintenance is the process of modifying a software system to fix defects, improve performance, add new functionality, or adapt the system to a new environment. A maintenance task is often initiated by a bug report or a request for new functionality. Bug reports typically describe problems with incorrect behaviors or functionalities. These behaviors or functionalities are known as features. Even in very well-designed systems, the source code that implements features is often not completely modularized. The delocalized nature of features makes maintaining them challenging. Since maintenance tasks are expressed in terms of features, the goal of this dissertation is to support software maintenance at the feature level. We focus on two tasks in particular: feature location and impact analysis via feature coupling.

    Feature location is the process of identifying the source code that implements a feature, and it is an essential first step to any maintenance task. There are many existing techniques for feature location that incorporate various types of analyses such as static, dynamic, and textual. In this dissertation, we recognize the advantages of leveraging several types of analyses and introduce a new approach to feature location based on combining dynamic analysis, textual analysis, and web mining algorithms applied to software. The use of web mining for feature location is a novel contribution, and we show that our new techniques based on web mining are significantly more effective than the current state of the art.

    After using feature location to identify a feature's source code, maintenance can be completed on that feature. Impact analysis should then be performed to revalidate the system and determine which other features may have been affected by the modifications. We define three feature coupling metrics that capture the relationship between features based on structural information, textual information, and their combination. Our novel feature coupling metrics can be used for impact analysis to quantify the strength of coupling between pairs of features. We performed three empirical studies on open-source software systems to assess the feature coupling metrics and established three major results. First, there is a moderate to strong statistically significant correlation between feature coupling and faults. Second, feature coupling can be used to correctly determine about half of the other features that would be affected by a change to a given feature. Finally, we found that the metrics align with developers' opinions about pairs of features that are actually coupled.
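    To make the combination of analyses concrete, here is a minimal, hypothetical sketch: a web mining algorithm (PageRank) is run over a call graph recovered from an execution trace, and its scores are blended with textual similarity between the feature query and each method's identifier text. The toy data, the choice of PageRank, and the 50/50 weighting are illustrative assumptions, not the dissertation's actual implementation.

```python
# Hypothetical sketch: feature location by combining textual analysis
# (TF-IDF cosine similarity) with a web mining algorithm (PageRank)
# applied to a call graph recovered from an execution trace.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus: identifier/comment text extracted per method (assumed input).
methods = {
    "Editor.save": "save file write buffer to disk",
    "Editor.open": "open file read buffer from disk",
    "SpellChecker.check": "check spelling of word against dictionary",
}
# Toy call graph observed while exercising the "save file" feature.
calls = [("Editor.save", "SpellChecker.check")]

graph = nx.DiGraph(calls)
pagerank = nx.pagerank(graph)  # web mining score per executed method

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(methods.values())
query_vec = vectorizer.transform(["save file to disk"])
textual = cosine_similarity(query_vec, doc_matrix).ravel()

# Blend the two evidence sources; the 0.5 weight is an arbitrary choice.
scores = {
    name: 0.5 * textual[i] + 0.5 * pagerank.get(name, 0.0)
    for i, name in enumerate(methods)
}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {name}")
```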

    The TopModL Initiative

    We believe that there is a very strong need for an environment to support research and experiments on model-driven engineering. We have therefore started the TopModL project, an open-source initiative, with the goal of building a development community to provide: (1) an executable environment for quick and easy experimentation, (2) a set of source files and a compilation tool chain, and (3) a web portal to share artefacts developed by the community. The aim of TopModL is to help the model-engineering research community by providing the quickest path between a research idea and a running prototype. In addition, we want to identify all possible contributions and understand how to make it easy to integrate existing components while maintaining architectural integrity. At the time of writing we have almost completed the bootstrap phase (known as Blackhole), which means that we can model TopModL and generate TopModL with TopModL. Beyond this first phase, it is of paramount importance to gather the best possible description of the requirements of the community involved in model-driven engineering in order to further develop TopModL, and to make sure that we are able to reuse or federate existing efforts and goodwill. This paper is intended more to set up a basis for a constructive discussion than to offer definitive answers and closed solutions.

    Locating bugs without looking back

    Bug localisation is a core program comprehension task in software maintenance: given the observation of a bug, e.g., via a bug report, where is it located in the source code? Information retrieval (IR) approaches see the bug report as the query, and the source code files as the documents to be retrieved, ranked by relevance. Such approaches have the advantage of not requiring expensive static or dynamic analysis of the code. However, current state-of-the-art IR approaches rely on project history, in particular previously fixed bugs or previous versions of the source code. We present a novel approach that directly scores each current file against the given report, thus not requiring past code and reports. The scoring method is based on heuristics identified through manual inspection of a small sample of bug reports. We compare our approach to eight others, using their own five metrics on their own six open source projects. Out of 30 performance indicators, we improve 27 and equal 2. Over the projects analysed, on average we find one or more affected files in the top 10 ranked files for 76% of the bug reports. These results show the applicability of our approach to software projects without history.
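    As a rough illustration of history-free scoring, the hypothetical sketch below ranks source files against a bug report using two simple heuristics of the kind the paper derives from manual inspection: lexical overlap between report and file contents, and a boost when a file's name appears in the report. The heuristics, weights, and data here are invented for illustration and are not the authors' actual scoring method.

```python
# Hypothetical sketch of history-free bug localisation: score each current
# file directly against the bug report, with no past fixes or versions.
import re

def tokens(text):
    """Lower-cased word tokens, with camelCase identifiers split apart."""
    words = re.findall(r"[A-Za-z]+", re.sub(r"([a-z])([A-Z])", r"\1 \2", text))
    return {w.lower() for w in words}

def score(report, filename, contents):
    report_toks = tokens(report)
    overlap = len(report_toks & tokens(contents))  # heuristic 1: lexical overlap
    name_hit = 5 if filename.rsplit(".", 1)[0].lower() in report.lower() else 0
    return overlap + name_hit                      # heuristic 2: file name mentioned

files = {
    "BufferSaver.java": "class BufferSaver { void saveBuffer() { flushToDisk(); } }",
    "Parser.java": "class Parser { Node parseExpression() { return null; } }",
}
report = "Crash in BufferSaver when saving a buffer to disk"
ranking = sorted(files, key=lambda f: -score(report, f, files[f]))
print(ranking)  # BufferSaver.java should rank first
```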

    Performance of IR Models on Duplicate Bug Report Detection: A Comparative Study

    Open source projects incorporate bug triagers to help with the task of bug report assignment to developers. One of the tasks of a triager is to identify whether an incoming bug report is a duplicate of a pre-existing report. In order to detect duplicate bug reports, a triager relies either on his memory and experience or on the search capabilities of the bug repository. Both approaches can be time consuming for the triager and may also lead to the misidentification of duplicates. It has also been suggested that duplicate bug reports are not necessarily harmful; instead, they can complement each other to provide additional information for developers to investigate the defect at hand. This motivates the need for automated or semi-automated techniques for duplicate bug detection. In the literature, two main approaches have been proposed to solve this problem. The first approach is to prevent duplicate reports from reaching developers by automatically filtering them, while the second provides the triager with a list of the top-N most similar bug reports, allowing the triager to compare the incoming bug report with the ones in the list. Previous works have tried to enhance the quality of the suggested lists, but the approaches either suffered from a poor recall rate or incurred additional runtime overhead, making the deployment of a retrieval system impractical. To the best of our knowledge, little work has been done to exhaustively compare the performance of different information retrieval models (especially using more recent techniques such as topic modeling) on this problem and to understand the effectiveness of different heuristics across various application domains. In this thesis, we compare the performance of word-based models (derivatives of the Vector Space Model), such as TF-IDF and Log-Entropy, with that of topic-based models such as Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), and Random Indexing (RI). We leverage heuristics that incorporate exception stack frames, surface features, and the summary and long description from the free-form text of the bug report. We perform experiments on subsets of bug reports from Eclipse and Firefox and achieve recall rates of 60% and 58% respectively. We find that word-based models, in particular a Log-Entropy based weighting scheme, outperform topic-based ones such as LSI and LDA. Using historical bug data from Eclipse and NetBeans, we determine the optimal time frame for a desired level of duplicate bug report coverage. We realize an Online Duplicate Detection Framework that uses a sliding window of a constant time frame as a first step towards simulating incoming bug reports and recommending duplicates to the end user.
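    A minimal sketch of the top-N retrieval setup follows, assuming a plain TF-IDF vector space model (one of the word-based models the thesis compares). The reports, the top-N cutoff, and the use of scikit-learn are illustrative assumptions, not the thesis's pipeline.

```python
# Hypothetical sketch: suggest the top-N most similar existing bug reports
# for an incoming report, using a TF-IDF vector space model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

existing = [
    "NullPointerException when opening editor preferences",
    "Editor crashes on startup with stack overflow",
    "Preferences dialog throws NPE on open",
]
incoming = "NPE opening the preferences dialog"

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(existing)
sims = cosine_similarity(vectorizer.transform([incoming]), matrix).ravel()

# Present the triager with the N highest-scoring candidates.
N = 2
for i in sims.argsort()[::-1][:N]:
    print(f"{sims[i]:.3f}  {existing[i]}")
```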

    Locating Bugs without Looking Back

    Bug localisation is a core program comprehension task in software maintenance: given the observation of a bug, where is it located in the source code files? Information retrieval (IR) approaches see a bug report as the query, and the source code files as the documents to be retrieved, ranked by relevance. Such approaches have the advantage of not requiring expensive static or dynamic analysis of the code. However, most state-of-the-art IR approaches rely on project history, in particular previously fixed bugs and previous versions of the source code. We present a novel approach that directly scores each current file against the given report, thus not requiring past code and reports. The scoring is based on heuristics identified through manual inspection of a small set of bug reports. We compare our approach to five others, using their own five metrics on their own six open source projects. Out of 30 performance indicators, we improve 28. For example, on average we find one or more affected files in the top 10 ranked files for 77% of the bug reports. These results show the applicability of our approach to software projects without history.

    Semantic Component Retrieval in Software Engineering

    In the early days of programming, the concept of subroutines, and through it software reuse, was invented to spare limited hardware resources. Since then, software systems have become increasingly complex, and developing them would not have been possible without reusable software elements such as standard libraries and frameworks. Furthermore, other approaches commonly subsumed under the umbrella of software reuse, such as product lines and design patterns, have become very successful in recent years. However, there are still no software component markets available that would make buying software components as simple as buying parts in a do-it-yourself hardware store, and millions of software fragments still lie un(re)used in configuration management repositories all over the world. The literature primarily blames this on the immense effort required so far to set up and maintain searchable component repositories, and on the weak mechanisms available for retrieving components from them, resulting in a severe usability problem. To address these issues, in this thesis we developed a proactive component reuse recommendation system, naturally integrated into test-first development approaches, which is able to propose semantically appropriate, reusable components according to the specification a developer is currently working on. We implemented such a system as a plugin for the well-known Eclipse IDE and demonstrated its usefulness by carrying out a case study from a popular agile development book. Furthermore, we present a precision analysis for our approach and examples of how components can be retrieved based on a simplified semantics description in terms of standard test cases.
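    To make the test-first retrieval idea concrete, here is a hypothetical sketch of interface-based matching: the operations a developer exercises in a unit test are extracted and matched against an index of reusable components. The crude parsing, the index format, and the coverage ranking are simplifications invented for illustration; the actual system plugs into Eclipse and uses a richer, test-case-based semantics.

```python
# Hypothetical sketch: recommend reusable components whose interface covers
# the operations exercised in a developer's test case (test-first reuse).
import re

test_code = """
def test_stack():
    s = Stack()
    s.push(1)
    s.push(2)
    assert s.pop() == 2
    assert s.size() == 1
"""

# Operations the test calls on the object under test (crude extraction).
wanted = set(re.findall(r"\bs\.(\w+)\(", test_code))  # {'push', 'pop', 'size'}

# Toy component index: component name -> operations it provides.
index = {
    "util.ArrayStack": {"push", "pop", "peek", "size"},
    "util.Queue": {"enqueue", "dequeue", "size"},
}

# Rank candidates by how much of the required interface they cover.
for name, ops in sorted(index.items(), key=lambda kv: -len(wanted & kv[1])):
    coverage = len(wanted & ops) / len(wanted)
    print(f"{coverage:.0%}  {name}")
```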

    Tool Support for Capturing the Essence of a Concern in Source Code

    Software evolves constantly to adapt to changing user needs. As it evolves, it becomes progressively harder to understand due to the accumulation of code changes, increasing code size, and the introduction of complex code dependencies. As a result, it becomes harder to maintain, exposing the software to potential bugs and degradation of code quality. High maintenance costs and diminished opportunities for software reusability and portability lead to reduced return on investment, increasing the likelihood of the software product being discarded or replaced. Nevertheless, we believe that there is value in legacy software due to the amount of intellectual effort that has been invested in it. To extend its value, we build on the common practice of identifying the pieces of code relevant to a given concern. Identifying relevant code is a manual process that relies on domain and code expertise, which makes it difficult to scale to large and complex code. In this thesis, we propose several automated approaches for capturing the essential code that represents a concern of interest. We use dynamic program analysis of execution traces to identify a relevant code subset. Information retrieval techniques are then used to improve the accuracy of the capture, refine the process, and verify the results.
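    A hypothetical sketch of the trace-based capture: methods that appear in traces exercising the concern but not in a baseline trace form the candidate set (trace differencing), which is then ranked by textual similarity to a description of the concern. The traces, texts, and ranking step are invented for illustration; the thesis refines this basic idea.

```python
# Hypothetical sketch: capture the code relevant to a concern by
# differencing execution traces, then rank the survivors with IR.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Methods executed while exercising the concern vs. a baseline run.
concern_trace = {"Printer.print", "Printer.spool", "Log.write"}
baseline_trace = {"Log.write", "Editor.open"}
candidates = sorted(concern_trace - baseline_trace)  # trace differencing

# Identifier/comment text per candidate method (assumed input).
texts = {
    "Printer.print": "print document send pages to printer",
    "Printer.spool": "spool job queue print request",
}
description = "printing a document"

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([texts[m] for m in candidates])
sims = cosine_similarity(vectorizer.transform([description]), matrix).ravel()
for method, sim in sorted(zip(candidates, sims), key=lambda x: -x[1]):
    print(f"{sim:.3f}  {method}")
```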