
    Natural Language Requirements Processing: A 4D Vision

    The future evolution of the application of natural language processing technologies in requirements engineering can be viewed along four dimensions: discipline, dynamism, domain knowledge, and datasets.

    Datasets Used in Fifteen Years of Automated Requirements Traceability Research

    Datasets are crucial to advancing automated software traceability research. Acquiring such datasets comes at a high cost and requires expert knowledge to collect and validate them manually. Obtaining such software development datasets has been one of the most frequently reported barriers for researchers in the software engineering domain in general. The problem is even more acute in the field of requirements traceability, which plays a crucial role in safety-critical and highly regulated systems. The main motivation behind this work is therefore to analyze the current state of the art of datasets used in the field of software traceability. This work presents a first-of-its-kind literature study that reviews and assesses the datasets used in software traceability research over the last fifteen years. It articulates several attributes of these datasets, such as their characteristics, threats, and diversity. First, 202 primary studies (see Appendix A) were identified for the purpose of this study, from which 73 unique datasets were derived. These 73 datasets were studied in depth, and several attributes (size, type, domain, availability, artifacts) were extracted (see Appendix B). Based on the analysis of the primary studies, a threat-to-validity reference model tailored to software traceability datasets was derived (see Figure 4.4). Furthermore, to shed light on the dataset diversity trend in the software traceability community, a metric called the Dataset Diversity Ratio was computed for the 38 authors (see Figure 4.5) who have published more than one publication in the field of software traceability.
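The abstract names a Dataset Diversity Ratio but does not define it. A plausible reading, assumed here rather than taken from the study, is the share of distinct datasets among an author's publications:

```python
def dataset_diversity_ratio(publications):
    """For each author, the ratio of distinct datasets used to total
    publications. `publications` maps an author to a list of dataset
    names, one entry per publication (names repeat when a dataset is
    reused). This is an assumed formula, not the study's exact one."""
    ratios = {}
    for author, datasets in publications.items():
        if datasets:
            ratios[author] = len(set(datasets)) / len(datasets)
    return ratios

# Illustrative data: author A reuses one dataset across three papers
# (ratio 1/3), author B uses a different dataset in each of two (ratio 1.0).
pubs = {"A": ["EasyClinic", "EasyClinic", "EasyClinic"],
        "B": ["eTour", "iTrust"]}
print(dataset_diversity_ratio(pubs))
```

A higher ratio would indicate that an author's results rest on a more diverse set of subject systems.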

    Interaction-Based Creation and Maintenance of Continuously Usable Trace Links

    Traceability is a major concern for all software engineering artefacts, and its core is the trace links between those artefacts. Among links between all kinds of artefacts, trace links between requirements and source code are fundamental, since they connect the user's point of view of a requirement with its actual implementation. Trace links are important for many software engineering tasks such as maintenance, program comprehension, and verification. Furthermore, the direct availability of trace links during a project improves the performance of developers. The manual creation of trace links is too time-consuming to be practical, so traceability research has a strong focus on automatic trace link creation. The most common automatic trace link creation methods use information retrieval techniques to measure the textual similarity between artefacts; the results of the textual similarity measurement are then used to decide which links between artefacts to create. Applying such information retrieval techniques yields many wrong link candidates and requires further expert knowledge to make the automatically created links usable, since the link candidates must be vetted manually. This prevents the use of information retrieval techniques to create trace links continuously and to provide them directly to developers during a project. This thesis therefore addresses the problem of continuously providing good-quality trace links to developers during a project and of maintaining these links as the artefacts change. To achieve this, a novel automatic trace link creation approach called Interaction Log Recording-based Trace Link Creation (ILog) has been designed and evaluated. ILog utilizes the interactions of developers with source code while implementing requirements.
    In addition, ILog uses the common development convention of providing an issue's identifier in a commit message to assign recorded interactions to requirements; ILog thus avoids additional manual link-creation effort by the developers. ILog has been implemented in a set of tools that enable the recording of interactions in different integrated development environments and the subsequent creation of trace links. Trace links are created between the source code files touched by interactions and the requirement currently being worked on. The links initially created in this way are further improved by utilizing interaction data such as interaction duration, frequency, and type, as well as source code structure, i.e. references between the source code files involved in the trace links. ILog's link improvement removes potentially wrong links and subsequently adds further correct links. ILog was evaluated in three empirical studies using gold standards created by experts. One study used data from an open source project; the other two used student projects involving a real-world customer. The results showed that ILog can create trace links with perfect precision and good recall, which enables the direct usage of the links. The studies also showed that ILog achieves better precision and recall than other automatic trace link creation approaches, such as information retrieval. To identify trace link maintenance capabilities suitable for integration into ILog, a systematic literature review on trace link maintenance was performed; the approaches found are discussed on the basis of a standardized trace link maintenance process. Furthermore, the extension of ILog with suitable trace link maintenance capabilities from these approaches is illustrated.
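The information-retrieval baseline described above, measuring textual similarity between requirements and source code, can be sketched with plain TF-IDF weighting and cosine similarity. The tokenization, idf smoothing, and example texts below are illustrative assumptions, not ILog's or the thesis's exact setup:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build smoothed TF-IDF vectors for a list of token lists."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency per term
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        # +1 smoothing keeps terms that occur everywhere from zeroing out
        vecs.append({t: tf[t] * (math.log(n / df[t]) + 1.0) for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity of two sparse term->weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Illustrative requirement and two source files as bags of words.
req = "export report as pdf".split()
files = ["class pdf exporter generate report".split(),
         "class login session authenticate user".split()]
vecs = tfidf_vectors([req] + files)
sims = [cosine(vecs[0], v) for v in vecs[1:]]
# The pdf-exporter file scores higher than the login file; candidates
# above some threshold would become link suggestions for an expert to vet.
```

The need for that final vetting step is precisely the limitation the abstract says ILog avoids by using interaction logs instead.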

    Towards an Intelligent System for Software Traceability Datasets Generation

    Software datasets and artifacts play a crucial role in advancing automated software traceability research. They can be used by researchers in different ways to develop or validate new automated approaches. Software artifacts other than source code and issue tracking entities can also provide a great deal of insight into a software system and facilitate knowledge sharing and information reuse. The diversity and quality of the datasets and artifacts within a research community have a significant impact on the accuracy, generalizability, and reproducibility of results, and consequently on the usefulness and practicality of the techniques under study. Collecting such datasets and assessing their quality are not trivial tasks and have been reported as an obstacle by many researchers in the domain of software engineering. In this dissertation, we report our empirical work that aims to automatically generate such datasets and assess their quality. Our goal is to introduce an intelligent system that can help researchers in the domain of software traceability obtain high-quality “training sets”, “testing sets”, or appropriate “case studies” from open source repositories based on their needs. In the first project, we present a first-of-its-kind study to review and assess the datasets that have been used in software traceability research over the last fifteen years; it articulates the current status of these datasets, their characteristics, and their threats to validity. Second, this dissertation introduces a Traceability-Dataset Quality Assessment (T-DQA) framework to categorize software traceability datasets and assist researchers in selecting appropriate datasets for their research, based on different characteristics of the datasets and the context in which those datasets will be used. Third, we present the results of an empirical study with limited scope to generate datasets using three baseline approaches for the creation of training data.
    These approaches are (i) Expert-Based; (ii) Automated Web-Mining, which generates training sets by automatically mining a tactic's APIs from technical programming websites; and (iii) Automated Big-Data Analysis, which mines ultra-large-scale code repositories to generate training sets. We compare the trace-link creation accuracy achieved with each of these three baseline approaches and discuss the costs and benefits associated with them. Additionally, in a separate study, we investigate the impact of training set size on the accuracy of recovering trace links. Fourth, we conduct a large-scale study to identify which types of software artifacts are produced by a wide variety of open-source projects at different levels of granularity, and we propose an automated approach based on machine learning techniques to identify various types of software artifacts; through a set of experiments, we report and compare the performance of these algorithms when applied to software artifacts. Finally, we conduct a study to understand how software traceability experts and practitioners evaluate the quality of their datasets and to gather their opinions on all quality attributes and metrics proposed by T-DQA.
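Comparing trace-link creation accuracy across the three baselines reduces to precision and recall against a gold standard of links. A minimal sketch, with made-up artifact names:

```python
def link_accuracy(candidates, gold):
    """Precision and recall of candidate trace links against a gold
    standard. Links are (source_artifact, target_artifact) pairs."""
    candidates, gold = set(candidates), set(gold)
    tp = len(candidates & gold)  # correctly recovered links
    precision = tp / len(candidates) if candidates else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical gold standard and mined candidate links.
gold = {("REQ-1", "A.java"), ("REQ-1", "B.java"), ("REQ-2", "C.java")}
mined = {("REQ-1", "A.java"), ("REQ-2", "C.java"), ("REQ-2", "D.java")}
p, r = link_accuracy(mined, gold)  # 2 of 3 candidates correct, 2 of 3 found
```

The same measurement applied to each baseline's output is what makes the cost/benefit comparison in the study possible.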

    Requirements engineering: foundation for software quality


    Identification of Software Features in Issue Tracking System Data

    The knowledge of Software Features (SFs) is vital for software developers and requirements specialists during all software engineering phases: to understand and derive software requirements, to plan and prioritize implementation tasks, to update documentation, or to test whether the final product correctly implements the requested SF. In most software projects, SFs are managed in conjunction with other information such as bug reports, programming tasks, or refactoring tasks with the aid of Issue Tracking Systems (ITSs). Hence, ITSs contain a variety of information that is only partly related to SFs. In practice, however, using ITSs to store SFs comes with two major problems: (1) ITSs are neither designed nor used as documentation systems, so the data inside an ITS is often uncategorized and SF descriptions are concealed in rather lengthy issue texts. (2) Although an SF is often requested in a single sentence, related information can be scattered among many issues; for example, implementation tasks related to an SF are often reported in additional issues. Hence, the detection of SFs in ITSs is complicated: a manual search for SFs implies reading, understanding, and exploiting the Natural Language (NL) in many issues in detail. This is cumbersome and labor-intensive, especially if related information is spread over more than one issue. This thesis investigates whether SF detection can be supported automatically. First, the problem is analyzed: (i) an empirical study shows that requests for important SFs reside in ITSs, making ITSs a good target for SF detection; (ii) a second study identifies characteristics of the information and related NL in issues. These characteristics represent opportunities as well as challenges for the automatic detection of SFs. Based on these problem studies, the Issue Tracking Software Feature Detection Method (ITSoFD) is proposed. The method has two main components and includes an approach to preprocess issues.
    Both components address one of the problems associated with storing SFs in ITSs. ITSoFD is validated in three solution studies: (I) An empirical study researches how NL that describes SFs can be detected with techniques from Natural Language Processing (NLP) and Machine Learning; issues are parsed, different characteristics of each issue and its NL are extracted, and these characteristics are used to classify the issue's content and identify SF description candidates, thereby addressing problem (1). (II) An empirical study researches how issues that carry information potentially related to an SF can be detected with techniques from NLP and Information Retrieval; characteristics of the issues' NL are utilized to create a traceability network of related issues, thereby addressing problem (2). (III) An empirical study researches how NL data in issues can be preprocessed using heuristics and hierarchical clustering; code, stack traces, and other technical information are separated from NL. Heuristics identify candidates for technical information, and clustering improves the heuristics' results. This technique can be applied to support components (I) and (II).
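The preprocessing step in solution study (III), separating code and stack traces from natural language, can be illustrated with simple line-level heuristics. The regexes and density threshold below are illustrative assumptions, not the thesis's actual rules, and the clustering refinement is omitted:

```python
import re

# Assumed heuristics: Java-style stack-trace frames, density of
# code-like punctuation, and leading indentation mark a line as technical.
STACK_FRAME = re.compile(r"^\s*at\s+[\w.$]+\(.*\)\s*$")
CODE_CHARS = re.compile(r"[{};=()<>\[\]]")

def split_issue_text(text):
    """Split an issue body into natural-language and technical lines."""
    nl, technical = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        density = len(CODE_CHARS.findall(stripped)) / len(stripped)
        if STACK_FRAME.match(line) or density > 0.15 or line.startswith("    "):
            technical.append(line)
        else:
            nl.append(line)
    return nl, technical

issue = ("The exporter crashes when saving.\n"
         "    at com.app.PdfExporter.save(PdfExporter.java:42)\n"
         "if (doc == null) { throw new Error(); }")
nl, tech = split_issue_text(issue)  # 1 NL line, 2 technical lines
```

In the thesis's pipeline, hierarchical clustering then refines such candidate classifications; only the heuristic pass is sketched here.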

    FIRMa: uma proposta baseada nos instrumentos utilizados pela gestão da informação para auxiliar o processo de gestão de requisitos de software

    Doctoral thesis - Universidade Federal de Santa Catarina, Centro de Ciências da Educação, Graduate Program in Information Science, Florianópolis, 2021. Some of the fundamental activities of the software development process are related to the discipline of Requirements Engineering, an area of Computer Science whose objectives are to discover, analyze, document, verify, and manage the requirements that will be part of the software. Requirements are the characteristics of the system, identified from information provided by users or business experts, and the effective management of this information is essential to ensure that the system meets the needs of those who will use it. According to research, one of the problems that negatively impact the software development process is the way the project's requirements management activity is conducted. To alleviate this problem, this thesis proposes FIRMa (Framework based on Information Management to Requirements Management). The framework aims to support the activities of the requirements management process through the application of instruments used in information management. To elaborate FIRMa, eight instruments were tried out in a requirements management process and evaluated with respect to their complexity, satisfaction, resources involved, and adaptability. After this application and evaluation, the instruments considered applicable in the context of requirements management and selected to compose FIRMa were the identification and classification of information sources, glossaries, subject headings, taxonomies, thesauri, semantic networks, and ontologies.
    This research is characterized, by nature, as applied research aimed at solving real-world problems; by objective, as exploratory and descriptive; and, by approach, as mixed-methods, since it combines qualitative and quantitative forms of research in the same study. As for the methodological procedures, bibliographic research was chosen to understand the concepts related to information management and requirements management, and the Design Science Research method was used to build the framework. After the guidelines for its use were defined, the framework was evaluated by a group of 18 specialists in Requirements Engineering, who gave their opinion on each of the information management instruments selected to be part of its structure, as well as on the defined steps and the guidelines elaborated for its use. The evaluation results showed that FIRMa can contribute to the activities of the requirements management process and consequently increase the chances of project success.
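One of FIRMa's selected instruments, the thesaurus, can be pictured as term normalization over requirement text: non-preferred terms are rewritten to a preferred entry so that requirements use a consistent vocabulary. The mapping and function below are hypothetical, far simpler than a real thesaurus, and not part of FIRMa itself:

```python
# Hypothetical thesaurus: non-preferred term -> preferred term.
THESAURUS = {"login": "authentication",
             "sign-in": "authentication",
             "client": "customer"}

def normalize_requirement(text):
    """Replace non-preferred terms with their preferred thesaurus entry."""
    return " ".join(THESAURUS.get(w, w) for w in text.lower().split())

print(normalize_requirement("The client must complete login first"))
# the customer must complete authentication first
```

Glossaries, taxonomies, and ontologies would play analogous roles: shared controlled vocabularies against which requirements are checked and related.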

    Supporting traceability through affinity mining

    © 2014 IEEE. Traceability among requirements artifacts (and, in certain cases, beyond, all the way to the actual implementation) has long been identified as a critical challenge in industrial practice. Manually establishing and maintaining such traces is a high-skill, labour-intensive job. The ideal person for the job often also has other highly critical tasks to take care of, so offering semi-automated support for the management of traces is an effective way of improving the efficiency of the whole development process. In this paper, we present a technique that exploits the information contained in previously defined traces in order to facilitate the creation and ongoing maintenance of traces as the requirements evolve. A case study on a reference dataset is employed to measure the effectiveness of the technique compared to other proposals from the literature.
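The idea of exploiting previously defined traces can be sketched as a simple co-occurrence affinity: artifacts that past requirements traced together are suggested as companions once a new requirement is linked to one of them. The counting scheme and `min_support` threshold below are assumptions for illustration, not the paper's actual technique:

```python
from collections import Counter
from itertools import combinations

def artifact_affinity(trace_sets):
    """Count how often each artifact pair is traced by the same
    requirement, over a history of existing trace sets."""
    aff = Counter()
    for artifacts in trace_sets:
        for a, b in combinations(sorted(artifacts), 2):
            aff[(a, b)] += 1
    return aff

def suggest(known_artifact, aff, min_support=2):
    """Suggest companion artifacts for one already-traced artifact."""
    out = []
    for (a, b), n in aff.items():
        if n >= min_support:
            if a == known_artifact:
                out.append(b)
            elif b == known_artifact:
                out.append(a)
    return sorted(out)

# Each set holds the artifacts one past requirement traces to.
history = [{"ui.c", "report.c"}, {"ui.c", "report.c"}, {"auth.c"}]
aff = artifact_affinity(history)
print(suggest("ui.c", aff))  # ['report.c']
```

As requirements evolve, such suggestions would be candidate links for the analyst to confirm, which is the semi-automated workflow the abstract argues for.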
