423 research outputs found

    Recovering from a Decade: A Systematic Mapping of Information Retrieval Approaches to Software Traceability

    Get PDF
    Engineers in large-scale software development have to manage large amounts of information, spread across many artifacts. Several researchers have proposed expressing retrieval of trace links among artifacts, i.e. trace recovery, as an Information Retrieval (IR) problem. The objective of this study is to produce a map of work on IR-based trace recovery, with a particular focus on previous evaluations and strength of evidence. We conducted a systematic mapping of IR-based trace recovery. Of the 79 publications classified, a majority applied algebraic IR models. While a set of studies on students indicate that IR-based trace recovery tools support certain work tasks, most previous studies do not go beyond reporting precision and recall of candidate trace links from evaluations using datasets containing less than 500 artifacts. Our review identified a need of industrial case studies. Furthermore, we conclude that the overall quality of reporting should be improved regarding both context and tool details, measures reported, and use of IR terminology. Finally, based on our empirical findings, we present suggestions on how to advance research on IR-based trace recovery

    Toward an Effective Automated Tracing Process

    Get PDF
    Traceability is defined as the ability to establish, record, and maintain dependency relations among various software artifacts in a software system, in both a forwards and backwards direction, throughout the multiple phases of the project’s life cycle. The availability of traceability information has been proven vital to several software engineering activities such as program comprehension, impact analysis, feature location, software reuse, and verification and validation (V&V). The research on automated software traceability has noticeably advanced in the past few years. Various methodologies and tools have been proposed in the literature to provide automatic support for establishing and maintaining traceability information in software systems. This movement is motivated by the increasing attention traceability has been receiving as a critical element of any rigorous software development process. However, despite these major advances, traceability implementation and use is still not pervasive in industry. In particular, traceability tools are still far from achieving performance levels that are adequate for practical applications. Such low levels of accuracy require software engineers working with traceability tools to spend a considerable amount of their time verifying the generated traceability information, a process that is often described as tedious, exhaustive, and error-prone. Motivated by these observations, and building upon a growing body of work in this area, in this dissertation we explore several research directions related to enhancing the performance of automated tracing tools and techniques. In particular, our work addresses several issues related to the various aspects of the IR-based automated tracing process, including trace link retrieval, performance enhancement, and the role of the human in the process. Our main objective is to achieve performance levels, in terms of accuracy, efficiency, and usability, that are adequate for practical applications, and ultimately to accomplish a successful technology transfer from research to industry

    Managing technical debt through software metrics, refactoring and traceability

    Get PDF

    Traceability approach for conflict dissolution in handling requirements crosscutting

    Get PDF
    Requirements crosscutting in software development and maintenance has gradually become an important issue in software engineering. There are growing needs of traceability support to achieve some possible understanding in requirements crosscutting throughout phases in software lifecycle. It is aimed to manage practical process in addressing requirements crosscutting at various phases in order to comply with industrial standard. However, due to its distinct nature, many recent works are focusing on identification, modularization, composition and conflict dissolution of requirements crosscutting which are mostly saturated at requirements level. These works fail to practically specify crosscutting properties for functional and nonfunctional requirements at requirements, analysis and design phases. Therefore, this situation leads to inability to provide sufficient support for software engineers to manage requirements crosscutting across the remaining development phases. This thesis proposes a new approach called the Identification, Modularization, Design Composition Rules and Conflict Dissolutions (IM-DeCRuD) that provides a special traceability to facilitate better understanding and reasoning for engineering tasks towards requirements crosscutting during software development and evolution. This study also promotes a simple but significant way to support pragmatic changes of crosscutting properties at requirements, analysis and design phases for medium sizes of software development and maintenance projects. A tool was developed based on the proposed approach to support four main perspectives namely requirements specification definition, requirements specification modification, requirements prioritization setting and graphics visualizing representation. Software design components are generated using Generic Modeling Environment (GME) with Java language interpreter to incorporate all these features. The proposed IM-DeCRuD was applied to an industrial strength case study of medium-scaled system called myPolicy. The tool was evaluated and the results were verified by some experts for validation and opinion. The feedbacks were then gathered and analyzed using DESMET qualitative method. The outcomes show that the IM-DeCRuD is applicable to address some tedious job of engineering process in handling crosscutting properties at requirements, analysis and design phases for system development and evolution

    Datasets Used in Fifteen Years of Automated Requirements Traceability Research

    Get PDF
    Datasets are crucial to advance automated software traceability research. Acquiring such datasets come in a high cost and require expert knowledge to manually collect and validate them. Obtaining such software development datasets has been one of the most frequently reported barrier for researchers in the software engineering domain in general. This problem is even more acute in field of requirement traceability, which plays crucial role in safety critical and highly regulated systems. Therefore, the main motivation behind this work is to analyze the current state of art of datasets used in the field of software traceability. This work presents a first-of-its-kind literature study to review and assess the datasets that have been used in software traceability research over the last fifteen years. It articulates several attributes related to these datasets such as their characteristics, threats and diversity. Firstly, 202 primary studies (refer Appendix A) were identified for purpose of this study, which were used to derive 73 unique datasets. These 73 datasets were studied in-depth and several attributes (size, type, domain, availability, artifacts) were extracted (refer Appendix B). Based on analysis of the primary studies, a threat to validity reference model, tailored to Software traceability datasets was derived (refer to figure 4.4). Furthermore, to put some light upon the dataset diversity trend in the Software traceability community, a metric called Dataset Diversity Ratio was derived for 38 authors (refer to figure 4.5) who have published more than one publication in field of software traceability

    Information Retrieval based requirement traceability recovery approaches- A systematic literature review

    Get PDF
    Abstract: The term traceability is an important concept regarding software development. It enables software engineers to trace requirements from their origin to fulfillment. Maintaining traceability manually is a time consuming and expensive job. Information retrieval methods provide a mean of automation for requirement traceability. A visible number of IR based traceability techniques have been proposed in the literature, but the adoption of these techniques in the industry is limited. In this paper, we examine the information retrieval-based traceability recovery approaches through systematic literature review. We presented a synthesis of these techniques. We also identified challenges that are potentially limiting the adoption of IR based traceability recovery approaches. We conclude that term mismatch is a major barrier faced by IR based approaches. We also did classify the approaches that are attempting to solve the term mismatch problem

    From Bugs to Decision Support – Leveraging Historical Issue Reports in Software Evolution

    Get PDF
    Software developers in large projects work in complex information landscapes and staying on top of all relevant software artifacts is an acknowledged challenge. As software systems often evolve over many years, a large number of issue reports is typically managed during the lifetime of a system, representing the units of work needed for its improvement, e.g., defects to fix, requested features, or missing documentation. Efficient management of incoming issue reports requires the successful navigation of the information landscape of a project. In this thesis, we address two tasks involved in issue management: Issue Assignment (IA) and Change Impact Analysis (CIA). IA is the early task of allocating an issue report to a development team, and CIA is the subsequent activity of identifying how source code changes affect the existing software artifacts. While IA is fundamental in all large software projects, CIA is particularly important to safety-critical development. Our solution approach, grounded on surveys of industry practice as well as scientific literature, is to support navigation by combining information retrieval and machine learning into Recommendation Systems for Software Engineering (RSSE). While the sheer number of incoming issue reports might challenge the overview of a human developer, our techniques instead benefit from the availability of ever-growing training data. We leverage the volume of issue reports to develop accurate decision support for software evolution. We evaluate our proposals both by deploying an RSSE in two development teams, and by simulation scenarios, i.e., we assess the correctness of the RSSEs' output when replaying the historical inflow of issue reports. In total, more than 60,000 historical issue reports are involved in our studies, originating from the evolution of five proprietary systems for two companies. Our results show that RSSEs for both IA and CIA can help developers navigate large software projects, in terms of locating development teams and software artifacts. Finally, we discuss how to support the transfer of our results to industry, focusing on addressing the context dependency of our tool support by systematically tuning parameters to a specific operational setting

    Efficient Information Retrieval for Software Bug Localization

    Get PDF
    Software systems are often shipped with defects. When a bug is reported, developers use the information available in the associated report to locate source code fragments that need to be modified to fix the bug. However, as software systems evolve in size and complexity, bug localization can become a tedious and time-consuming process. Contemporary bug localization tools utilize Information Retrieval (IR) methods for automated support to minimize the manual effort. IR methods exploit the textual content of bug reports to capture and rank relevant buggy source files. However, for an IR-based bug localization tool to be useful, it must achieve adequate retrieval accuracy. Lower precision and recall can leave developers with large amounts of incorrect information to wade through. Motivated by these observations, in this dissertation, we propose a new paradigm of information-theoretic IR methods to support bug localization tasks in software systems. These methods exploit the co-occurrence patterns of code terms in software systems to reveal latent semantic information that other methods often fail to capture. We further investigate the impact of combining various IR methods on the retrieval accuracy of bug localization engines. The main assumption is that different IR methods, targeting different dimensions of similarity between software artifacts, can enhance the confidence in each other\u27s results. Furthermore, we propose a novel approach for enhancing the performance of IR-enabled bug localization methods in the context of Open-Source Software (OSS). The proposed approach exploits knowledge from previously resolved bugs to help localize new bugs. Our analysis uses multiple datasets generated for multiple open-source and closed source projects. Our results show that a) information-theoretic IR methods can significantly outperform classical IR methods in bug localization tasks, b) optimized IR-hybrids can significantly outperform individual IR methods, and near-optimal global configurations can be determined for different combinations of IR methods, and c) information extracted from previously resolved bug reports can significantly enhance the accuracy of IR-enabled bug localization methods in OSS
    corecore