224 research outputs found

    Exploiting Parts-of-Speech for Effective Automated Requirements Traceability

    Get PDF
    Context: Requirement traceability (RT) is defined as the ability to describe and follow the life of a requirement. RT helps developers ensure that relevant requirements are implemented and that the source code is consistent with its requirement with respect to a set of traceability links called trace links. Previous work leverages Parts Of Speech (POS) tagging of software artifacts to recover trace links among them. These studies work on the premise that discarding one or more POS tags results in an improved accuracy of Information Retrieval (IR) techniques. Objective: First, we show empirically that excluding one or more POS tags could negatively impact the accuracy of existing IR-based traceability approaches, namely the Vector Space Model (VSM) and the Jensen Shannon Model (JSM). Second, we propose a method that improves the accuracy of IR-based traceability approaches. Method: We developed an approach, called ConPOS, to recover trace links using constraint-based pruning. ConPOS uses major POS categories and applies constraints to the recovered trace links for pruning as a filtering process to significantly improve the effectiveness of IR-based techniques. We conducted an experiment to provide evidence that removing POSs does not improve the accuracy of IR techniques. Furthermore, we conducted two empirical studies to evaluate the effectiveness of ConPOS in recovering trace links compared to existing peer RT approaches. Results: The results of the first empirical study show that removing one or more POS negatively impacts the accuracy of VSM and JSM. Furthermore, the results from the other empirical studies show that ConPOS provides 11%-107%, 8%-64%, and 15%-170% higher precision, recall, and mean average precision (MAP) than VSM and JSM. Conclusion: We showed that ConPosout performs existing IR-based RT approaches that discard some POS tags from the input documents

    Toward an Effective Automated Tracing Process

    Get PDF
    Traceability is defined as the ability to establish, record, and maintain dependency relations among various software artifacts in a software system, in both a forwards and backwards direction, throughout the multiple phases of the project’s life cycle. The availability of traceability information has been proven vital to several software engineering activities such as program comprehension, impact analysis, feature location, software reuse, and verification and validation (V&V). The research on automated software traceability has noticeably advanced in the past few years. Various methodologies and tools have been proposed in the literature to provide automatic support for establishing and maintaining traceability information in software systems. This movement is motivated by the increasing attention traceability has been receiving as a critical element of any rigorous software development process. However, despite these major advances, traceability implementation and use is still not pervasive in industry. In particular, traceability tools are still far from achieving performance levels that are adequate for practical applications. Such low levels of accuracy require software engineers working with traceability tools to spend a considerable amount of their time verifying the generated traceability information, a process that is often described as tedious, exhaustive, and error-prone. Motivated by these observations, and building upon a growing body of work in this area, in this dissertation we explore several research directions related to enhancing the performance of automated tracing tools and techniques. In particular, our work addresses several issues related to the various aspects of the IR-based automated tracing process, including trace link retrieval, performance enhancement, and the role of the human in the process. Our main objective is to achieve performance levels, in terms of accuracy, efficiency, and usability, that are adequate for practical applications, and ultimately to accomplish a successful technology transfer from research to industry

    Traceability Links Recovery among Requirements and BPMN models

    Full text link
    Tesis por compendio[EN] Throughout the pages of this document, I present the results of the research that was carried out in the context of my PhD studies. During the aforementioned research, I studied the process of Traceability Links Recovery between natural language requirements and industrial software models. More precisely, due to their popularity and extensive usage, I studied the process of Traceability Links Recovery between natural language requirements and Business Process Models, also known as BPMN models. In order to carry out the research, I focused my work on two main objectives: (1) the development of the Traceability Links Recovery techniques between natural language requirements and BPMN models, and (2) the validation and analysis of the results obtained by the developed techniques in industrial domain case studies. The results of the research have been redacted and published in forums, conferences, and journals specialized in the topics and context of the research. This thesis document introduces the topics, context, and objectives of the research, presents the academic publications that have been published as a result of the work, and then discusses the outcomes of the investigation.[ES] A través de las páginas de este documento, presento los resultados de la investigación realizada en el contexto de mis estudios de doctorado. Durante la investigación, he estudiado el proceso de Recuperación de Enlaces de Trazabilidad entre requisitos especificados en lenguaje natural y modelos de software industriales. Más concretamente, debido a su popularidad y uso extensivo, he estudiado el proceso de Recuperación de Enlaces de Trazabilidad entre requisitos especificados en lenguaje natural y Modelos de Procesos de Negocio, también conocidos como modelos BPMN. Para llevar a cabo esta investigación, mi trabajo se ha centrado en dos objetivos principales: (1) desarrollo de técnicas de Recuperación de Enlaces de Trazabilidad entre requisitos especificados en lenguaje natural y modelos BPMN, y (2) validación y análisis de los resultados obtenidos por las técnicas desarrolladas en casos de estudio de dominios industriales. Los resultados de la investigación han sido redactados y publicados en foros, conferencias y revistas especializadas en los temas y contexto de la investigación. Esta tesis introduce los temas, contexto y objetivos de la investigación, presenta las publicaciones académicas que han sido publicadas como resultado del trabajo, y expone los resultados de la investigación.[CA] A través de les pàgines d'aquest document, presente els resultats de la investigació realitzada en el context dels meus estudis de doctorat. Durant la investigació, he estudiat el procés de Recuperació d'Enllaços de Traçabilitat entre requisits especificats en llenguatge natural i models de programari industrials. Més concretament, a causa de la seua popularitat i ús extensiu, he estudiat el procés de Recuperació d'Enllaços de Traçabilitat entre requisits especificats en llenguatge natural i Models de Processos de Negoci, també coneguts com a models BPMN. Per a dur a terme aquesta investigació, el meu treball s'ha centrat en dos objectius principals: (1) desenvolupament de tècniques de Recuperació d'Enllaços de Traçabilitat entre requisits especificats en llenguatge natural i models BPMN, i (2) validació i anàlisi dels resultats obtinguts per les tècniques desenvolupades en casos d'estudi de dominis industrials. Els resultats de la investigació han sigut redactats i publicats en fòrums, conferències i revistes especialitzades en els temes i context de la investigació. Aquesta tesi introdueix els temes, context i objectius de la investigació, presenta les publicacions acadèmiques que han sigut publicades com a resultat del treball, i exposa els resultats de la investigació.Lapeña Martí, R. (2020). Traceability Links Recovery among Requirements and BPMN models [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/149391TESISCompendi

    Recovering from a Decade: A Systematic Mapping of Information Retrieval Approaches to Software Traceability

    Get PDF
    Engineers in large-scale software development have to manage large amounts of information, spread across many artifacts. Several researchers have proposed expressing retrieval of trace links among artifacts, i.e. trace recovery, as an Information Retrieval (IR) problem. The objective of this study is to produce a map of work on IR-based trace recovery, with a particular focus on previous evaluations and strength of evidence. We conducted a systematic mapping of IR-based trace recovery. Of the 79 publications classified, a majority applied algebraic IR models. While a set of studies on students indicate that IR-based trace recovery tools support certain work tasks, most previous studies do not go beyond reporting precision and recall of candidate trace links from evaluations using datasets containing less than 500 artifacts. Our review identified a need of industrial case studies. Furthermore, we conclude that the overall quality of reporting should be improved regarding both context and tool details, measures reported, and use of IR terminology. Finally, based on our empirical findings, we present suggestions on how to advance research on IR-based trace recovery

    IR-based traceability recovery as a plugin - an industrial case study

    Get PDF
    Large-scale software development is a complex undertaking and generates an ever-increasing amount of information. To be able to work efficiently under such circumstances, navigation in all available data needs support. Maintaining traceability links between software artefacts is one approach to structure the information space and support this challenge. Several researchers have proposed traceability recovery by applying IR methods, based on textual similarities between artefacts. Early studies have shown promising results, but no large-scale in vivo evaluations have been made. Currently, there is a trend among our industrial partners to collect artefacts in a specific new software engineering tool. Our goal is to develop an IR-based traceability recovery plugin to this tool. From this position, in the environment of possible future users, the usefulness of supported findability in a software engineering context could be explored with an industrial validity

    Datasets Used in Fifteen Years of Automated Requirements Traceability Research

    Get PDF
    Datasets are crucial to advance automated software traceability research. Acquiring such datasets come in a high cost and require expert knowledge to manually collect and validate them. Obtaining such software development datasets has been one of the most frequently reported barrier for researchers in the software engineering domain in general. This problem is even more acute in field of requirement traceability, which plays crucial role in safety critical and highly regulated systems. Therefore, the main motivation behind this work is to analyze the current state of art of datasets used in the field of software traceability. This work presents a first-of-its-kind literature study to review and assess the datasets that have been used in software traceability research over the last fifteen years. It articulates several attributes related to these datasets such as their characteristics, threats and diversity. Firstly, 202 primary studies (refer Appendix A) were identified for purpose of this study, which were used to derive 73 unique datasets. These 73 datasets were studied in-depth and several attributes (size, type, domain, availability, artifacts) were extracted (refer Appendix B). Based on analysis of the primary studies, a threat to validity reference model, tailored to Software traceability datasets was derived (refer to figure 4.4). Furthermore, to put some light upon the dataset diversity trend in the Software traceability community, a metric called Dataset Diversity Ratio was derived for 38 authors (refer to figure 4.5) who have published more than one publication in field of software traceability

    Information Retrieval based requirement traceability recovery approaches- A systematic literature review

    Get PDF
    Abstract: The term traceability is an important concept regarding software development. It enables software engineers to trace requirements from their origin to fulfillment. Maintaining traceability manually is a time consuming and expensive job. Information retrieval methods provide a mean of automation for requirement traceability. A visible number of IR based traceability techniques have been proposed in the literature, but the adoption of these techniques in the industry is limited. In this paper, we examine the information retrieval-based traceability recovery approaches through systematic literature review. We presented a synthesis of these techniques. We also identified challenges that are potentially limiting the adoption of IR based traceability recovery approaches. We conclude that term mismatch is a major barrier faced by IR based approaches. We also did classify the approaches that are attempting to solve the term mismatch problem

    A fine-grained requirement traceability evolutionary algorithm: Kromaia, a commercial video game case study

    Full text link
    [EN] Context:Commercial video games usually feature an extensive source code and requirements that are related to code lines from multiple methods. Traceability is vital in terms of maintenance and content update, so it is necessary to explore such search spaces properly. Objective:This work presents and evaluates CODFREL (Code Fragment-based Requirement Location), our approach to fine-grained requirement traceability, which lies in an evolutionary algorithm and includes encoding and genetic operators to manipulate code fragments that are built from source code lines. We compare it with a baseline approach (Regular-LSI) by configuring both approaches with different granularities (code lines / complete methods). Method:We evaluated our approach and Regular-LSI in the Kromaia video game case study, which is a commercial video game released on PC and PlayStation 4. The approaches are configured with method and code line granularity and work on 20 requirements that are provided by the development company. Our approach and Regular-LSI calculate similarities between requirements and code fragments or methods to propose possible solutions and, in the case of CODFREL, to guide the evolutionary algorithm. Results:The results, which compare code line and method granularity configurations of CODFREL with different granularity configurations of Regular-LSI, show that our approach outperforms Regular-LSI in precision and recall, with values that are 26 and 8 times better, respectively, even though it does not achieve the optimal solutions. We make an open-source implementation of CODFREL available. Conclusions:Since our approach takes into consideration key issues like the source code size in commercial video games and the requirement dispersion, it provides better starting points than Regular-LSI in the search for solution candidates for the requirements. However, the results and the influence of domain-specific language on them show that more explicit knowledge is required to improve such results.This work has been partially supported by the Ministry of Economy and Competitiveness (MINECO) through the Spanish National R + D + i Plan and ERDF funds under the Project ALPS (RTI2018-096411-B-I00).Blasco, D.; Cetina, C.; Pastor López, O. (2020). A fine-grained requirement traceability evolutionary algorithm: Kromaia, a commercial video game case study. Information and Software Technology. 119:1-12. https://doi.org/10.1016/j.infsof.2019.106235S112119Watkins, R., & Neal, M. (1994). Why and how of requirements tracing. IEEE Software, 11(4), 104-106. doi:10.1109/52.300100Rempel, P., & Mader, P. (2017). Preventing Defects: The Impact of Requirements Traceability Completeness on Software Quality. IEEE Transactions on Software Engineering, 43(8), 777-797. doi:10.1109/tse.2016.2622264Borg, M., Runeson, P., & Ardö, A. (2013). Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability. Empirical Software Engineering, 19(6), 1565-1616. doi:10.1007/s10664-013-9255-yLandauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25(2-3), 259-284. doi:10.1080/01638539809545028Poshyvanyk, D., Gueheneuc, Y.-G., Marcus, A., Antoniol, G., & Rajlich, V. (2007). Feature Location Using Probabilistic Ranking of Methods Based on Execution Scenarios and Information Retrieval. IEEE Transactions on Software Engineering, 33(6), 420-432. doi:10.1109/tse.2007.1016Dit, B., Revelle, M., Gethers, M., & Poshyvanyk, D. (2011). Feature location in source code: a taxonomy and survey. Journal of Software: Evolution and Process, 25(1), 53-95. doi:10.1002/smr.567Arcuri, A., & Fraser, G. (2013). Parameter tuning or default values? An empirical investigation in search-based software engineering. Empirical Software Engineering, 18(3), 594-623. doi:10.1007/s10664-013-9249-9Stehman, S. V. (1997). Selecting and interpreting measures of thematic classification accuracy. Remote Sensing of Environment, 62(1), 77-89. doi:10.1016/s0034-4257(97)00083-7Apache opennlp: Toolkit for the processing of natural language text, 2017, (https://opennlp.apache.org/). [Online; accessed 12-November-2017].P. Abeles, Efficient java matrix library, 2017, (http://ejml.org/). [Online; accessed 9-November-2017].IGDA, International Game Developers Association, 2018.Lucia, A. D., Fasano, F., Oliveto, R., & Tortora, G. (2007). Recovering traceability links in software artifact management systems using information retrieval methods. ACM Transactions on Software Engineering and Methodology, 16(4), 13. doi:10.1145/1276933.1276934De Lucia, A., Oliveto, R., & Tortora, G. (2008). Assessing IR-based traceability recovery tools through controlled experiments. Empirical Software Engineering, 14(1), 57-92. doi:10.1007/s10664-008-9090-8Zou, X., Settimi, R., & Cleland-Huang, J. (2009). Improving automated requirements trace retrieval: a study of term-based enhancement methods. Empirical Software Engineering, 15(2), 119-146. doi:10.1007/s10664-009-9114-zUnterkalmsteiner, M., Gorschek, T., Feldt, R., & Lavesson, N. (2015). Large-scale information retrieval in software engineering - an experience report from industrial application. Empirical Software Engineering, 21(6), 2324-2365. doi:10.1007/s10664-015-9410-8Bavota, G., De Lucia, A., Oliveto, R., & Tortora, G. (2014). Enhancing software artefact traceability recovery processes with link count information. Information and Software Technology, 56(2), 163-182. doi:10.1016/j.infsof.2013.08.00
    • …
    corecore