
    Datasets Used in Fifteen Years of Automated Requirements Traceability Research

    Get PDF
    Datasets are crucial to advancing automated software traceability research. Acquiring such datasets comes at a high cost and requires expert knowledge to collect and validate them manually. Obtaining such software development datasets has been one of the most frequently reported barriers for researchers in the software engineering domain in general. The problem is even more acute in the field of requirements traceability, which plays a crucial role in safety-critical and highly regulated systems. The main motivation behind this work is therefore to analyze the current state of the art of datasets used in the field of software traceability. This work presents a first-of-its-kind literature study to review and assess the datasets that have been used in software traceability research over the last fifteen years. It articulates several attributes related to these datasets, such as their characteristics, threats, and diversity. First, 202 primary studies (see Appendix A) were identified for the purpose of this study, from which 73 unique datasets were derived. These 73 datasets were studied in depth, and several attributes (size, type, domain, availability, artifacts) were extracted (see Appendix B). Based on the analysis of the primary studies, a threat-to-validity reference model tailored to software traceability datasets was derived (see Figure 4.4). Furthermore, to shed light on the dataset diversity trend in the software traceability community, a metric called the Dataset Diversity Ratio was derived for the 38 authors (see Figure 4.5) who have published more than one publication in the field of software traceability.
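    Since the abstract does not give the formula, the following minimal sketch assumes the Dataset Diversity Ratio is the number of distinct datasets an author has used divided by that author's publication count; the author and dataset names are invented.

```python
# Hypothetical Dataset Diversity Ratio: distinct datasets used by an author
# divided by that author's publication count (assumed definition; the study's
# exact formula is not given in this abstract).
from collections import defaultdict

def dataset_diversity_ratio(records):
    """records: (author, datasets_used) pairs, one per publication."""
    datasets = defaultdict(set)
    pubs = defaultdict(int)
    for author, used in records:
        pubs[author] += 1
        datasets[author].update(used)
    # The study reports the ratio only for authors with more than one publication.
    return {a: len(datasets[a]) / n for a, n in pubs.items() if n > 1}

print(dataset_diversity_ratio([
    ("A. Author", {"EasyClinic", "eTour"}),
    ("A. Author", {"EasyClinic"}),
]))  # {'A. Author': 1.0} -> 2 distinct datasets across 2 publications
```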

    Managing technical debt through software metrics, refactoring and traceability

    Get PDF

    Information Retrieval based requirement traceability recovery approaches: A systematic literature review

    Get PDF
    Abstract: Traceability is an important concept in software development. It enables software engineers to trace requirements from their origin to their fulfillment. Maintaining traceability manually is a time-consuming and expensive job. Information retrieval (IR) methods provide a means of automating requirements traceability. A considerable number of IR-based traceability techniques have been proposed in the literature, but the adoption of these techniques in industry is limited. In this paper, we examine IR-based traceability recovery approaches through a systematic literature review. We present a synthesis of these techniques and identify challenges that potentially limit the adoption of IR-based traceability recovery approaches. We conclude that term mismatch is a major barrier faced by IR-based approaches, and we classify the approaches that attempt to solve the term mismatch problem.
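    For readers unfamiliar with the family of techniques under review, the sketch below shows the core of a vector-space IR trace recovery pipeline (TF-IDF plus cosine similarity). The requirement and code documents are invented, and the approaches the review covers vary in model (VSM, LSI, LDA) and preprocessing; this is only the common skeleton.

```python
# Minimal IR-based trace recovery sketch: vectorize artifacts with TF-IDF and
# rank candidate trace links by cosine similarity to the requirement.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

requirements = ["The system shall encrypt user credentials before storage."]
code_docs = {
    "CredentialStore.java": "class CredentialStore encrypt password hash store user",
    "ReportPrinter.java": "class ReportPrinter print format page report",
}

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(requirements + list(code_docs.values()))
scores = cosine_similarity(matrix[:1], matrix[1:]).ravel()

# Term mismatch shows up here as a low score for a true link whose artifact
# uses different vocabulary than the requirement.
for name, score in sorted(zip(code_docs, scores), key=lambda p: -p[1]):
    print(f"{name}: {score:.2f}")
```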

    A fine-grained requirement traceability evolutionary algorithm: Kromaia, a commercial video game case study

    Full text link
    Context: Commercial video games usually feature extensive source code, with requirements that relate to code lines from multiple methods. Traceability is vital for maintenance and content updates, so it is necessary to explore such search spaces properly. Objective: This work presents and evaluates CODFREL (Code Fragment-based Requirement Location), our approach to fine-grained requirement traceability, which relies on an evolutionary algorithm and includes encoding and genetic operators to manipulate code fragments built from source code lines. We compare it with a baseline approach (Regular-LSI) by configuring both approaches with different granularities (code lines / complete methods). Method: We evaluated our approach and Regular-LSI on the Kromaia video game case study, a commercial video game released on PC and PlayStation 4. The approaches are configured with method and code-line granularity and work on 20 requirements provided by the development company. Our approach and Regular-LSI calculate similarities between requirements and code fragments or methods to propose possible solutions and, in the case of CODFREL, to guide the evolutionary algorithm. Results: The results, which compare code-line and method granularity configurations of CODFREL with different granularity configurations of Regular-LSI, show that our approach outperforms Regular-LSI in precision and recall, with values that are 26 and 8 times better, respectively, even though it does not achieve the optimal solutions. We make an open-source implementation of CODFREL available. Conclusions: Since our approach takes into consideration key issues such as the source code size in commercial video games and requirement dispersion, it provides better starting points than Regular-LSI in the search for solution candidates for the requirements. However, the results, and the influence of domain-specific language on them, show that more explicit knowledge is required to improve those results. This work has been partially supported by the Ministry of Economy and Competitiveness (MINECO) through the Spanish National R+D+i Plan and ERDF funds under the Project ALPS (RTI2018-096411-B-I00). Blasco, D.; Cetina, C.; Pastor López, O. (2020). A fine-grained requirement traceability evolutionary algorithm: Kromaia, a commercial video game case study. Information and Software Technology, 119:1-12. https://doi.org/10.1016/j.infsof.2019.106235
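    As a rough illustration of the kind of search CODFREL performs (not the paper's actual implementation), the following sketch evolves code fragments encoded as sets of source line indices toward a requirement, using simple Jaccard token overlap as a stand-in for the paper's similarity-based fitness and far simpler genetic operators than the real encoding.

```python
# Hypothetical evolutionary fragment search: individuals are sets of source
# line indices; fitness is token overlap between fragment text and requirement.
import random

def tokens(text):
    return set(text.lower().split())

def fitness(fragment, source_lines, requirement):
    frag = set().union(*(tokens(source_lines[i]) for i in fragment))
    req = tokens(requirement)
    return len(frag & req) / len(frag | req)

def mutate(fragment, n_lines):
    child = set(fragment)
    line = random.randrange(n_lines)
    child.symmetric_difference_update({line})  # toggle one line in or out
    return child or {line}                     # keep individuals non-empty

def evolve(source_lines, requirement, pop_size=30, generations=100):
    pop = [{random.randrange(len(source_lines))} for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda f: fitness(f, source_lines, requirement), reverse=True)
        survivors = pop[: pop_size // 2]   # elitist selection
        pop = survivors + [mutate(random.choice(survivors), len(source_lines))
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=lambda f: fitness(f, source_lines, requirement))

lines = ["void firePlayerWeapon() {", "ammo -= 1;", "renderHud();", "}"]
print(evolve(lines, "The player weapon consumes ammo when fired"))
```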

    Automated recommendation, reuse, and generation of unit tests for software systems

    Get PDF
    This thesis presents a body of work relating to the automated discovery, reuse, and generation of unit tests for software systems, with the goal of improving the efficiency of the software engineering process and the quality of the produced software. We start with a novel approach to test-to-code traceability link establishment, called TCTracer, which utilises multilevel information and an ensemble of static and dynamic techniques to achieve state-of-the-art accuracy when establishing links between tests and tested functions and between test classes and tested classes. This approach is utilised to provide test-to-code traceability links which facilitate multiple other parts of the work. We then move on to test reuse, where we first define an abstract framework, called Rashid, for using connections between artefacts to identify new artefacts for reuse, and utilise this framework in Relatest, an approach for producing test recommendations for new functions. Relatest instantiates Rashid by using TCTracer to establish connections between tests and functions and code similarity measures to establish connections between similar functions. This information is used to create lists of recommendations for new functions. We then present an investigation into the automated transplantation of tests, which attempts to remove the manual effort required to transform Relatest recommendations and insert them into another project. Finally, we move on to test generation, where we utilise neural networks to generate unit test code by learning from existing function-to-test pairs. The first approach, TestNMT, investigates using recurrent neural networks to generate whole JUnit tests, and the second approach, ReAssert, utilises a transformer-based architecture to generate JUnit asserts. In total, this thesis addresses the problem by developing approaches for the discovery, reuse, and utilisation of existing functions and tests, including the establishment of relationships between these artefacts, developing mechanisms to aid automated test reuse, and learning from existing tests to generate new tests.
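    As one illustration of the static signals an ensemble like TCTracer can combine, the hypothetical sketch below scores candidate test-to-code links by lexical similarity between a test name and function names. The names are invented, and the actual approach also draws on dynamic (execution-based) information.

```python
# One static test-to-code signal: lexical similarity between a test's name
# (minus the conventional "test" prefix) and each candidate function's name.
from difflib import SequenceMatcher

def name_similarity(test_name, function_name):
    stripped = test_name.lower().removeprefix("test").lstrip("_")
    return SequenceMatcher(None, stripped, function_name.lower()).ratio()

candidates = ["parse_config", "load_defaults", "write_report"]
for fn in sorted(candidates, key=lambda f: -name_similarity("test_parse_config", f)):
    print(fn, round(name_similarity("test_parse_config", fn), 2))
```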

    Traceability Links Recovery among Requirements and BPMN models

    Full text link
    Thesis by compendium. Throughout the pages of this document, I present the results of the research that was carried out in the context of my PhD studies. During this research, I studied the process of Traceability Links Recovery between natural language requirements and industrial software models. More precisely, due to their popularity and extensive usage, I studied the process of Traceability Links Recovery between natural language requirements and Business Process Models, also known as BPMN models. To carry out the research, I focused my work on two main objectives: (1) the development of Traceability Links Recovery techniques between natural language requirements and BPMN models, and (2) the validation and analysis of the results obtained by the developed techniques in industrial domain case studies. The results of the research have been written up and published in forums, conferences, and journals specialized in the topics and context of the research. This thesis document introduces the topics, context, and objectives of the research, presents the academic publications that resulted from the work, and then discusses the outcomes of the investigation. Lapeña Martí, R. (2020). Traceability Links Recovery among Requirements and BPMN models [unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/149391
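    To make the recovery setting concrete, here is a minimal, illustrative sketch (not the thesis's technique) that extracts task labels from a BPMN model and ranks them against a natural language requirement by token overlap; the BPMN snippet and scoring are invented.

```python
# Rank BPMN task labels against a requirement by Jaccard token overlap.
import xml.etree.ElementTree as ET

BPMN = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="claims">
    <task id="t1" name="Register insurance claim"/>
    <task id="t2" name="Notify customer by email"/>
  </process>
</definitions>"""

NS = {"bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL"}

def overlap(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

requirement = "The clerk shall register each incoming insurance claim"
tasks = ET.fromstring(BPMN).findall(".//bpmn:task", NS)
for task in sorted(tasks, key=lambda t: -overlap(requirement, t.get("name"))):
    print(task.get("id"), task.get("name"),
          round(overlap(requirement, task.get("name")), 2))
```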

    A Modelling Approach to Multi-Domain Traceability

    Get PDF
    Traceability is an important concern in projects that span different engineering domains. Traceability can also be mandated, exploited, and managed across the engineering lifecycle, and may involve defining connections between heterogeneous models. As a result, traceability can be considered to be multi-domain. This thesis introduces the concept and challenges of multi-domain traceability and explains how it can be used to support typical traceability scenarios. It proposes a model-based approach to developing a traceability solution that operates effectively across multiple engineering domains. The approach introduces a collection of tasks and structures which address the identified challenges for a traceability solution in multi-domain projects. The proposed approach demonstrates that modelling principles and MDE techniques can help to address current challenges and consequently improve the effectiveness of a multi-domain traceability solution. A prototype of the required tooling to support the approach is implemented with EMF and atop Epsilon; it consists of an implementation of the proposed structures (models) and model management operations to support traceability. Moreover, the approach is illustrated in the context of two safety-critical projects where multi-domain traceability is required to underpin certification arguments.
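    The thesis prototype realises its structures as EMF models with Epsilon-based model management; purely as an illustration of the shape of such a multi-domain trace model, the following sketch uses Python dataclasses for typed artifacts and links, with a transitive query that walks links across domains. The structure and names are assumptions, not the thesis's metamodel.

```python
# Illustrative multi-domain trace model: typed artifacts from different
# engineering domains joined by trace links, plus a transitive impact query.
from dataclasses import dataclass

@dataclass(frozen=True)
class Artifact:
    domain: str   # e.g. "requirements", "software", "hardware"
    ident: str

@dataclass(frozen=True)
class TraceLink:
    source: Artifact
    target: Artifact
    kind: str     # e.g. "satisfies", "deployedOn"

links = [
    TraceLink(Artifact("requirements", "REQ-12"), Artifact("software", "BrakeCtrl"), "satisfies"),
    TraceLink(Artifact("software", "BrakeCtrl"), Artifact("hardware", "ECU-3"), "deployedOn"),
]

def trace_from(artifact_id, links):
    """Follow links transitively from one artifact across domains."""
    frontier, seen = {artifact_id}, set()
    while frontier:
        ident = frontier.pop()
        seen.add(ident)
        frontier |= {l.target.ident for l in links
                     if l.source.ident == ident and l.target.ident not in seen}
    return seen - {artifact_id}

print(trace_from("REQ-12", links))  # {'BrakeCtrl', 'ECU-3'}
```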

    Enhancing legacy software system analysis by combining behavioural and semantic information sources

    Get PDF
    Computer software is, by its very nature, highly complex and invisible, yet subject to near-continual pressure to change. Over time the development process has become more mature and less risky, in large part due to the concept of software traceability: the ability to relate software components back to their initial requirements and to each other. Such traceability aids tasks such as maintenance by facilitating the prediction of “ripple effects” that may result from a change, and by aiding comprehension of software structures in general. Many organisations, however, have large amounts of software for which little or no documentation exists; the original developers are no longer available, and yet this software still underpins critical functions. Such “legacy software” can therefore represent a high risk when changes are required. Consequently, large amounts of effort go into attempting to comprehend and understand legacy software. The most common way to accomplish this, given that reading the code directly is hugely time-consuming and near-impossible, is to reverse engineer the code, usually to a form of representative projection such as a UML class diagram. Although a wide number of tools and approaches exist, there was no empirical way to compare them or validate new developments. Consequently, there was an identified need to define and create the Reverse Engineering to Design Benchmark (RED-BM), which was then applied to a number of industrial tools. The measured performance of these tools varies from 8.8% to 100%, demonstrating both the effectiveness of the benchmark and the questionable performance of several tools. In addition to the structural relationships detectable through static reverse engineering, other sources of information are available with the potential to reveal other types of relationships, such as semantic links. One such source is the mining of source code repositories, which can be analysed to find components within a software system that have historically been changed together during the evolution of the system; from the strength of that co-change, a semantic link can be inferred. An approach was implemented to mine such semantic relationships from repositories, and relationships were found beyond those expressed by static reverse engineering, including groups of relationships potentially suitable for clustering. To allow for the general use of multiple information sources to build traceability links between software components, a uniform approach was defined and illustrated, including rules and formulas that allow sources to be combined. The uniform approach was implemented in the field of predictive change impact analysis, using reverse engineering and repository mining as information sources. This implementation, the Java Code Relationship Analysis (jcRA) package, was then evaluated against an industry-standard tool, JRipples. Depending on the target, the combined approach is able to outperform JRipples in detecting potential impacts, at the risk of over-matching (a high number of false positives and high overall class coverage on some targets).
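    Below is a minimal sketch of the repository-mining signal described above: files that repeatedly change in the same commits are candidate endpoints for a semantic link. The commit data is invented; in practice it would come from the version control log (e.g. `git log --name-only`), and the support/confidence scoring is only one plausible strength measure, not necessarily the thesis's formula.

```python
# Co-change mining: count how often file pairs appear in the same commit and
# score the link strength as support (co-change count) and confidence.
from collections import Counter
from itertools import combinations

commits = [
    {"Order.java", "OrderValidator.java"},
    {"Order.java", "OrderValidator.java", "Invoice.java"},
    {"Invoice.java"},
]

pair_counts = Counter()
file_counts = Counter()
for files in commits:
    file_counts.update(files)
    pair_counts.update(frozenset(p) for p in combinations(sorted(files), 2))

for pair, together in pair_counts.items():
    a, b = sorted(pair)
    # Confidence: how often a change to `a` is accompanied by a change to `b`.
    print(f"{a} <-> {b}: support={together}, "
          f"confidence={together / file_counts[a]:.2f}")
```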

    Towards an Intelligent System for Software Traceability Datasets Generation

    Get PDF
    Software datasets and artifacts play a crucial role in advancing automated software traceability research. They can be used by researchers in different ways to develop or validate new automated approaches. Software artifacts, other than source code and issue tracking entities, can also provide a great deal of insight into a software system and facilitate knowledge sharing and information reuse. The diversity and quality of the datasets and artifacts within a research community have a significant impact on the accuracy, generalizability, and reproducibility of the results, and consequently on the usefulness and practicality of the techniques under study. Collecting and assessing the quality of such datasets are not trivial tasks and have been reported as an obstacle by many researchers in the domain of software engineering. In this dissertation, we report our empirical work that aims to automatically generate and assess the quality of such datasets. Our goal is to introduce an intelligent system that can help researchers in the domain of software traceability obtain high-quality “training sets”, “testing sets”, or appropriate “case studies” from open source repositories based on their needs. First, we present a first-of-its-kind study to review and assess the datasets that have been used in software traceability research over the last fifteen years. It presents and articulates the current status of these datasets, their characteristics, and their threats to validity. Second, this dissertation introduces a Traceability-Dataset Quality Assessment (T-DQA) framework to categorize software traceability datasets and assist researchers in selecting appropriate datasets for their research based on different characteristics of the datasets and the context in which those datasets will be used. Third, we present the results of an empirical study with limited scope to generate datasets using three baseline approaches for the creation of training data. These approaches are (i) Expert-Based; (ii) Automated Web-Mining, which generates training sets by automatically mining tactics' APIs from technical programming websites; and (iii) Automated Big-Data Analysis, which mines ultra-large-scale code repositories to generate training sets. We compare the trace-link creation accuracy achieved using each of these three baseline approaches and discuss the costs and benefits associated with them. Additionally, in a separate study, we investigate the impact of training set size on the accuracy of recovering trace links. Fourth, we conduct a large-scale study to identify which types of software artifacts are produced by a wide variety of open-source projects at different levels of granularity, and we propose an automated approach based on machine learning techniques to identify various types of software artifacts. Through a set of experiments, we report and compare the performance of these algorithms when applied to software artifacts. Finally, we conduct a study to understand how software traceability experts and practitioners evaluate the quality of their datasets, gathering experts' opinions on all quality attributes and metrics proposed by T-DQA.
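    As an illustration of the artifact-type identification step, the hypothetical sketch below trains a standard text-classification pipeline over artifact contents. The labels, examples, and choice of classifier are invented; the dissertation compares several machine learning algorithms rather than prescribing this one.

```python
# Artifact-type classification as standard text classification:
# TF-IDF features over artifact text, fed to a linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "as a user I want to reset my password so that I can regain access",
    "NullPointerException thrown in SessionManager when token expires",
    "public class SessionManager { void refreshToken() { } }",
    "the operator shall be able to export audit logs as CSV",
]
train_labels = ["requirement", "issue", "source_code", "requirement"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)
print(model.predict(["IndexOutOfBoundsException in ReportWriter on empty input"]))
```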