
    Improving Traceability Link Recovery Using Fine-grained Requirements-to-Code Relations

    Traceability information is a fundamental prerequisite for many essential software maintenance and evolution tasks, such as change impact and software reusability analyses. However, manually generating traceability information is costly and error-prone. Therefore, researchers have developed automated approaches that use textual similarities between artifacts to establish trace links. These approaches tend to achieve low precision at reasonable recall levels, because they cannot bridge the semantic gap between high-level natural language requirements and code. We propose to overcome this limitation by leveraging fine-grained (method-level and sentence-level) similarities between the artifacts for traceability link recovery. Our approach uses word embeddings and a Word Mover's Distance-based similarity to bridge the semantic gap. The fine-grained similarities are aggregated according to the artifacts' structure and participate in a majority vote to retrieve coarse-grained, requirement-to-class trace links. In a comprehensive empirical evaluation, we show that our approach outperforms state-of-the-art unsupervised traceability link recovery approaches. Additionally, we illustrate the benefits of fine-grained structural analyses for word embedding-based trace link generation.
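The majority-vote aggregation described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes the sentence-to-method similarities (for example from a WMD-based measure) are already computed, uses max aggregation per class, and uses a hypothetical fixed threshold; the names `class_votes` and `trace_links` are illustrative only.

```python
from typing import Dict, List, Set

# Hypothetical input format: sims[sentence_id][class_name] holds the similarities
# between one requirement sentence and each method of that class (assumed
# precomputed, values in [0, 1], higher means more similar).
SentenceSims = Dict[str, Dict[str, List[float]]]


def class_votes(sims: SentenceSims, threshold: float) -> Dict[str, int]:
    """Count, per class, how many requirement sentences vote for it.

    Assumption: a sentence votes for a class if its best-matching method in
    that class reaches the threshold (max aggregation over methods).
    """
    votes: Dict[str, int] = {}
    for per_class in sims.values():
        for cls, method_sims in per_class.items():
            if method_sims and max(method_sims) >= threshold:
                votes[cls] = votes.get(cls, 0) + 1
    return votes


def trace_links(sims: SentenceSims, threshold: float = 0.5) -> Set[str]:
    """Link the requirement to every class that a strict majority of its sentences vote for."""
    n_sentences = len(sims)
    votes = class_votes(sims, threshold)
    return {cls for cls, v in votes.items() if v > n_sentences / 2}


if __name__ == "__main__":
    requirement = {
        "s1": {"Login": [0.8, 0.3], "Cart": [0.2]},
        "s2": {"Login": [0.6], "Cart": [0.7, 0.1]},
        "s3": {"Login": [0.4, 0.55], "Cart": [0.3]},
    }
    print(sorted(trace_links(requirement)))  # ['Login']
```

Here "Login" receives votes from all three sentences while "Cart" receives only one, so only the requirement-to-"Login" link survives the vote.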

    Toward an Effective Automated Tracing Process

    Traceability is defined as the ability to establish, record, and maintain dependency relations among various software artifacts in a software system, in both forward and backward directions, throughout the multiple phases of the project's life cycle. The availability of traceability information has been proven vital to several software engineering activities, such as program comprehension, impact analysis, feature location, software reuse, and verification and validation (V&V). Research on automated software traceability has advanced noticeably in the past few years. Various methodologies and tools have been proposed in the literature to provide automatic support for establishing and maintaining traceability information in software systems. This movement is motivated by the increasing attention traceability has been receiving as a critical element of any rigorous software development process. However, despite these major advances, traceability implementation and use are still not pervasive in industry. In particular, traceability tools are still far from achieving performance levels that are adequate for practical applications. Such low levels of accuracy require software engineers working with traceability tools to spend a considerable amount of their time verifying the generated traceability information, a process that is often described as tedious, exhausting, and error-prone. Motivated by these observations, and building upon a growing body of work in this area, in this dissertation we explore several research directions related to enhancing the performance of automated tracing tools and techniques. In particular, our work addresses several issues related to the various aspects of the IR-based automated tracing process, including trace link retrieval, performance enhancement, and the role of the human in the process.
Our main objective is to achieve performance levels, in terms of accuracy, efficiency, and usability, that are adequate for practical applications, and ultimately to accomplish a successful technology transfer from research to industry.
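To make the "IR-based automated tracing process" concrete, here is a minimal sketch of one classic trace link retrieval step: a TF-IDF Vector Space Model that ranks requirement-to-code candidate links by cosine similarity. It is a generic illustration, not the dissertation's method; the identifier-splitting rule, the smoothed IDF formula, and the 0.1 threshold are assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Split camelCase and snake_case identifiers into lower-case words."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Build TF-IDF vectors with a smoothed IDF (assumed variant) over all artifacts."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n = len(docs)
    df = Counter(term for c in counts.values() for term in c)
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return {doc_id: {t: tf * idf[t] for t, tf in c.items()} for doc_id, c in counts.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_links(requirements: Dict[str, str], classes: Dict[str, str],
               threshold: float = 0.1) -> List[Tuple[str, str, float]]:
    """Return (requirement, class, score) candidates above the threshold, best first.

    Artifact IDs are assumed to be unique across requirements and classes.
    """
    vectors = tfidf_vectors({**requirements, **classes})
    links = [(r, c, cosine(vectors[r], vectors[c])) for r in requirements for c in classes]
    return sorted((l for l in links if l[2] >= threshold), key=lambda l: l[2], reverse=True)


if __name__ == "__main__":
    reqs = {"R1": "The user shall log in with a password"}
    code = {"LoginService": "class LoginService { boolean checkPassword(String userPassword) }",
            "CartView": "class CartView { void renderItems() }"}
    for link in rank_links(reqs, code):
        print(link)
```

The ranked list is what an analyst would then vet; the low precision discussed above shows up as many above-threshold candidates that turn out to be false links.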

    Grand Challenges of Traceability: The Next Ten Years

    In 2007, the software and systems traceability community met at the first Natural Bridge symposium on the Grand Challenges of Traceability to establish and address research goals for achieving effective, trustworthy, and ubiquitous traceability. Ten years later, in 2017, the community came together to evaluate a decade of progress towards achieving these goals. These proceedings document some of that progress. They include a series of short position papers representing current work in the community, organized across four process axes of traceability practice. The sessions covered topics ranging from Trace Strategizing, Trace Link Creation and Evolution, and Trace Link Usage to real-world applications of traceability and traceability datasets and benchmarks. Two breakout groups focused on the importance of creating and sharing traceability datasets within the research community, and discussed challenges related to the adoption of tracing techniques in industrial practice. Members of the research community are engaged in many active, ongoing, and impactful research projects. Our hope is that ten years from now we will be able to look back at a productive decade of research and claim that we have achieved the overarching Grand Challenge of Traceability, which envisions traceability that is always present, built into the engineering process, and has "effectively disappeared without a trace". We hope that others will see the potential that traceability has for empowering software and systems engineers to develop higher-quality products at increasing levels of complexity and scale, and that they will join the active community of software and systems traceability researchers as we move forward into the next decade of research.

    Semantically Enhanced Software Traceability Using Deep Learning Techniques

    In most safety-critical domains, the need for traceability is prescribed by certifying bodies. Trace links are generally created among requirements, design, source code, test cases, and other artifacts; however, creating such links manually is time-consuming and error-prone. Automated solutions use information retrieval and machine learning techniques to generate trace links; however, current techniques fail to understand the semantics of software artifacts or to integrate domain knowledge into the tracing process, and therefore tend to deliver imprecise and inaccurate results. In this paper, we present a solution that uses deep learning to incorporate requirements artifact semantics and domain knowledge into the tracing solution. We propose a tracing network architecture that uses Word Embedding and Recurrent Neural Network (RNN) models to generate trace links. Word embedding learns word vectors that represent knowledge of the domain corpus, and the RNN uses these word vectors to learn the sentence semantics of requirements artifacts. We trained 360 different configurations of the tracing network using existing trace links in the Positive Train Control domain and identified the Bidirectional Gated Recurrent Unit (BI-GRU) as the best model for the tracing task. BI-GRU significantly outperformed state-of-the-art tracing methods, including the Vector Space Model and Latent Semantic Indexing. Comment: 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE)
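As a rough illustration of what a BI-GRU contributes, the sketch below encodes a sequence of word vectors by running one GRU forward and another backward and concatenating their final hidden states. It is a pure-Python toy with random, untrained weights and no biases, using one common GRU formulation; it is not the paper's tracing network, which also includes trained embeddings and a layer that compares two artifacts. All names here are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """A single GRU cell with random, untrained weights (biases omitted for brevity)."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # W_* act on the current input x, U_* on the previous hidden state h.
        self.W_z, self.U_z = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.W_r, self.U_r = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.W_h, self.U_h = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)

    def step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W_z, x), matvec(self.U_z, h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W_r, x), matvec(self.U_r, h))]  # reset gate
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W_h, x), matvec(self.U_h, reset_h))]
        # New state interpolates between the old state and the candidate state.
        return [zi * hi + (1 - zi) * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, seq: List[Vector]) -> Vector:
        h = [0.0] * self.hidden_dim
        for x in seq:
            h = self.step(x, h)
        return h


def bigru_encode(seq: List[Vector], fwd: GRUCell, bwd: GRUCell) -> Vector:
    """Concatenate the final forward state with the final state of a reverse pass."""
    return fwd.run(seq) + bwd.run(list(reversed(seq)))


if __name__ == "__main__":
    rng = random.Random(0)
    fwd, bwd = GRUCell(3, 4, rng), GRUCell(3, 4, rng)
    sentence = [[0.1, 0.2, 0.3], [0.0, -0.5, 0.4]]  # hypothetical word vectors
    print(bigru_encode(sentence, fwd, bwd))
```

Reading the sentence in both directions lets each encoding reflect words on either side of a term, which is the usual motivation for the bidirectional variant.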

    Assessing Word Similarity Metrics For Traceability Link Recovery

    The software development process often involves various artifacts, each describing different aspects of a software system. Traceability Link Recovery is a technique that supports this development process by connecting related parts of different artifacts. Artifacts expressed in natural language are difficult for machines to understand and therefore pose a particular challenge for traceability link recovery. For this purpose, word similarity metrics are commonly used to identify different words with the same meaning as synonyms. ArDoCo is a software tool that uses word similarity metrics to recover trace links between textual software architecture documentation and formal architecture models. This thesis examines the influence of different word similarity metrics on ArDoCo. The word similarity metrics are evaluated on several case studies. As part of the evaluation, the precision and recall metrics are presented, along with the particular challenges of each word similarity metric.
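For readers unfamiliar with word similarity metrics, the sketch below shows two simple string-based examples, normalized Levenshtein similarity and character-bigram Jaccard similarity, together with a threshold rule for treating two words as matches. These are generic metrics chosen for illustration; they are not necessarily the metrics or thresholds evaluated in the thesis or used by ArDoCo. Purely string-based metrics cannot detect true synonyms with different spellings (for example "database" and "datastore" only partly overlap), which is why embedding-based or dictionary-based metrics are also used in practice.

```python
from typing import Callable, Set


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic programming scheme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """1.0 for identical words, approaching 0.0 as the edit distance grows."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def bigrams(word: str) -> Set[str]:
    return {word[i:i + 2] for i in range(len(word) - 1)}


def jaccard_bigram_similarity(a: str, b: str) -> float:
    """Overlap of character bigrams divided by their union."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0 if a == b else 0.0
    return len(x & y) / len(x | y)


def are_similar(a: str, b: str, metric: Callable[[str, str], float], threshold: float) -> bool:
    """Treat two words as matching if the (case-insensitive) similarity reaches the threshold."""
    return metric(a.lower(), b.lower()) >= threshold


if __name__ == "__main__":
    print(are_similar("Server", "servers", levenshtein_similarity, 0.8))   # True
    print(are_similar("night", "nacht", jaccard_bigram_similarity, 0.5))   # False
```

The threshold is the main tuning knob: lowering it raises recall of matched words at the cost of precision, which is exactly the trade-off such an evaluation measures.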

    Efficient Information Retrieval for Software Bug Localization

    Software systems are often shipped with defects. When a bug is reported, developers use the information available in the associated report to locate the source code fragments that need to be modified to fix the bug. However, as software systems grow in size and complexity, bug localization can become a tedious and time-consuming process. Contemporary bug localization tools use Information Retrieval (IR) methods to provide automated support and minimize manual effort. IR methods exploit the textual content of bug reports to retrieve and rank source files that are likely to contain the bug. However, for an IR-based bug localization tool to be useful, it must achieve adequate retrieval accuracy: lower precision and recall can leave developers with large amounts of incorrect information to wade through. Motivated by these observations, in this dissertation we propose a new paradigm of information-theoretic IR methods to support bug localization tasks in software systems. These methods exploit the co-occurrence patterns of code terms in software systems to reveal latent semantic information that other methods often fail to capture. We further investigate the impact of combining various IR methods on the retrieval accuracy of bug localization engines. The main assumption is that different IR methods, targeting different dimensions of similarity between software artifacts, can enhance the confidence in each other's results. Furthermore, we propose a novel approach for enhancing the performance of IR-enabled bug localization methods in the context of Open-Source Software (OSS). The proposed approach exploits knowledge from previously resolved bugs to help localize new bugs. Our analysis uses multiple datasets generated for multiple open-source and closed-source projects.
Our results show that a) information-theoretic IR methods can significantly outperform classical IR methods in bug localization tasks, b) optimized IR hybrids can significantly outperform individual IR methods, and near-optimal global configurations can be determined for different combinations of IR methods, and c) information extracted from previously resolved bug reports can significantly enhance the accuracy of IR-enabled bug localization methods in OSS.
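The idea of an "IR hybrid" with a tuned global configuration can be sketched as score fusion: normalize the scores two IR methods assign to each source file, combine them with a weight, and choose the weight that maximizes mean reciprocal rank (MRR) over bug reports with known fixed files. This is a generic illustration under stated assumptions (min-max normalization, linear fusion, a grid of weights), not the dissertation's exact hybridization scheme. In a real study the weight should be tuned on data held out from the evaluation set.

```python
from typing import Dict, List, Set, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; constant scores carry no ranking signal, so map them to 0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rank(scores_a: Dict[str, float], scores_b: Dict[str, float], weight: float) -> List[str]:
    """Rank files by weight * normalized A + (1 - weight) * normalized B (ties broken by name)."""
    a, b = min_max(scores_a), min_max(scores_b)
    combined = {f: weight * a.get(f, 0.0) + (1 - weight) * b.get(f, 0.0) for f in set(a) | set(b)}
    return sorted(combined, key=lambda f: (-combined[f], f))


def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    for position, f in enumerate(ranking, start=1):
        if f in relevant:
            return 1.0 / position
    return 0.0


def best_weight(queries: List[Tuple[Dict[str, float], Dict[str, float], Set[str]]],
                steps: int = 10) -> Tuple[float, float]:
    """Grid-search the fusion weight; returns (weight, MRR) of the first best configuration."""
    best_mrr, best_w = -1.0, 0.0
    for k in range(steps + 1):
        w = k / steps
        mrr = sum(reciprocal_rank(hybrid_rank(a, b, w), rel) for a, b, rel in queries) / len(queries)
        if mrr > best_mrr:
            best_mrr, best_w = mrr, w
    return best_w, best_mrr


if __name__ == "__main__":
    a = {"A.java": 0.9, "B.java": 0.5, "C.java": 0.1}  # hypothetical scores from method A
    b = {"A.java": 0.0, "B.java": 1.0, "C.java": 0.2}  # hypothetical scores from method B
    print(hybrid_rank(a, b, 0.5))                  # ['B.java', 'A.java', 'C.java']
    print(best_weight([(a, b, {"A.java"})]))       # (0.7, 1.0)
```

Normalizing before fusing matters because different IR methods produce scores on unrelated scales; without it, one method would silently dominate the combination.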

    Information Retrieval-Based Optimization Approaches for Requirement Traceability Recovery

    Requirements traceability provides support for important software engineering activities. Requirements traceability recovery (RTR) is becoming increasingly important due to its numerous benefits to overall software quality. Improving RTR has become an active research topic, and researchers have proposed a number of approaches for improving and automating RTR between the requirements and the source code of a system. Textual analysis and Information Retrieval (IR) techniques have been applied to RTR for many years; however, most existing IR-based methodologies for RTR are semi-automatic or time-consuming, even though many links are correctly recovered using IR. Thus, there is a need for effective and innovative approaches to automating RTR. In this research, we study IR techniques applied to RTR to determine the best alternative for recovering links between the textual content of requirements and system source code, and we propose innovative methodologies that combine computational intelligence with IR to achieve automation. We approach RTR as an optimization problem, formulated as a multi- or mono-objective search in which we assign one-to-many relationships between each requirement and source code classes based on the similarity of their textual content. The Non-dominated Sorting Genetic Algorithm (NSGA-II) and Artificial Bee Colony (ABC), when combined with IR techniques, appear to provide promising alternatives for finding a complete and accurate list of traceability links. We adapt the NSGA-II and ABC algorithms to solve the RTR problem, develop programming tools for experimentation, and report the results on three open-source projects. The results show precision and recall values above 70%.
NSGA-II and ABC are also analyzed in terms of time complexity using big-O notation; the results indicate that NSGA-II is more time-efficient but less precise than ABC.
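The core step that gives NSGA-II its name is fast non-dominated sorting, which splits candidate solutions into successive Pareto fronts. The sketch below implements that step for maximization objectives. The objective vectors are hypothetical placeholders (for instance, mean textual similarity of a candidate link set and the fraction of requirements it covers); the thesis's actual objectives, encoding, crowding-distance computation, and genetic operators are not shown.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def fast_non_dominated_sort(objs: List[Sequence[float]]) -> List[List[int]]:
    """Return Pareto fronts as lists of solution indices, best front first."""
    n = len(objs)
    dominated_sets: List[List[int]] = [[] for _ in range(n)]  # solutions that p dominates
    domination_counts = [0] * n                               # how many solutions dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated_sets[p].append(q)
            elif dominates(objs[q], objs[p]):
                domination_counts[p] += 1
        if domination_counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated_sets[p]:
                domination_counts[q] -= 1
                if domination_counts[q] == 0:  # q is dominated only by earlier fronts
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


if __name__ == "__main__":
    # Hypothetical (mean similarity, coverage) scores for five candidate link sets.
    candidates = [(0.9, 0.5), (0.6, 0.8), (0.5, 0.4), (0.3, 0.3), (0.9, 0.5)]
    print(fast_non_dominated_sort(candidates))  # [[0, 1, 4], [2], [3]]
```

The pairwise comparison loop makes this step O(M·N²) for M objectives and N solutions, which is one contributor to the algorithm's overall time complexity.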