580 research outputs found

    Semantically Enhanced Software Traceability Using Deep Learning Techniques

    Full text link
    In most safety-critical domains the need for traceability is prescribed by certifying bodies. Trace links are generally created among requirements, design, source code, test cases and other artifacts, however, creating such links manually is time consuming and error prone. Automated solutions use information retrieval and machine learning techniques to generate trace links, however, current techniques fail to understand semantics of the software artifacts or to integrate domain knowledge into the tracing process and therefore tend to deliver imprecise and inaccurate results. In this paper, we present a solution that uses deep learning to incorporate requirements artifact semantics and domain knowledge into the tracing solution. We propose a tracing network architecture that utilizes Word Embedding and Recurrent Neural Network (RNN) models to generate trace links. Word embedding learns word vectors that represent knowledge of the domain corpus and RNN uses these word vectors to learn the sentence semantics of requirements artifacts. We trained 360 different configurations of the tracing network using existing trace links in the Positive Train Control domain and identified the Bidirectional Gated Recurrent Unit (BI-GRU) as the best model for the tracing task. BI-GRU significantly out-performed state-of-the-art tracing methods including the Vector Space Model and Latent Semantic Indexing.Comment: 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE

    Extracting and Comparing Concepts Emerging from Software Code, Documentation and Tests

    Get PDF
    Traceability in software engineering is the ability to connect different artifacts that have been built or designed at various points in time. Given the variety of tasks, tools and formats in the software lifecycle, an outstanding challenge for traceability studies is to deal with the heterogeneity of the artifacts, the links between them and the means to extract each. Using a unified approach for extracting keywords from textual information, this paper aims to compare the concepts extracted from three software artifacts: source code, documentation and tests from the same system. The objectives are to detect similarities in the concepts emerged, and to show the degree of alignment and synchronisation the artifacts possess. Using the components of three projects from the Apache Software Foundation, this paper extracts the concepts from 'base' source code, documentation, and tests (separated from the source code). The extraction is done based on the keywords present in each artifact: we then run multiple comparisons (through calculating cosine similarities on features extracted by word embeddings) in order to detect how the sets of concepts are similar or overlap. For similarities between code and tests, we discovered that using pre-trained language models (with increasing dimension and corpus size) correlates to the increase in magnitude, with higher averages and smaller ranges. FastText pre-trained embeddings scored the highest average of 97.33% with the lowest range of 21.8 across all projects. Also, our approach was able to quickly detect outliers, possibly indicating drifts in traceability within modules. For similarities involving documentation, there was a considerable drop in similarity score compared to between code and tests per module - down to below 5%.</p

    EALink: An Efficient and Accurate Pre-trained Framework for Issue-Commit Link Recovery

    Full text link
    Issue-commit links, as a type of software traceability links, play a vital role in various software development and maintenance tasks. However, they are typically deficient, as developers often forget or fail to create tags when making commits. Existing studies have deployed deep learning techniques, including pretrained models, to improve automatic issue-commit link recovery.Despite their promising performance, we argue that previous approaches have four main problems, hindering them from recovering links in large software projects. To overcome these problems, we propose an efficient and accurate pre-trained framework called EALink for issue-commit link recovery. EALink requires much fewer model parameters than existing pre-trained methods, bringing efficient training and recovery. Moreover, we design various techniques to improve the recovery accuracy of EALink. We construct a large-scale dataset and conduct extensive experiments to demonstrate the power of EALink. Results show that EALink outperforms the state-of-the-art methods by a large margin (15.23%-408.65%) on various evaluation metrics. Meanwhile, its training and inference overhead is orders of magnitude lower than existing methods.Comment: 13 pages, 6 figures, published to AS

    What have we learnt from the challenges of (semi-) automated requirements traceability? A discussion on blockchain applicability.

    Get PDF
    Over the last 3 decades, researchers have attempted to shed light into the requirements traceability problem by introducing tracing tools, techniques, and methods with the vision of achieving ubiquitous traceability. Despite the technological advances, requirements traceability remains problematic for researchers and practitioners. This study aims to identify and investigate the main challenges in implementing (semi-)automated requirements traceability, as reported in the recent literature. A systematic literature review was carried out based on the guidelines for systematic literature reviews in software engineering, proposed by Kitchenham. We retrieved 4530 studies by searching five major bibliographic databases and selected 70 primary studies. These studies were analysed and classified according to the challenges they present and/or address. Twenty-one challenges were identified and were classified into five categories. Findings reveal that the most frequent challenges are technological challenges, in particular, low accuracy of traceability recovery methods. Findings also suggest that future research efforts should be devoted to the human facet of tracing, to explore traceability practices in organisational settings, and to develop traceability approaches that support agile and DevOps practices. Finally, it is recommended that researchers leverage blockchain technology as a suitable technical solution to ensure the trustworthiness of traceability information in interorganisational software projects.publishedVersio
    • …
    corecore