72 research outputs found

    Recovering from a Decade: A Systematic Mapping of Information Retrieval Approaches to Software Traceability

    Engineers in large-scale software development have to manage large amounts of information, spread across many artifacts. Several researchers have proposed expressing retrieval of trace links among artifacts, i.e. trace recovery, as an Information Retrieval (IR) problem. The objective of this study is to produce a map of work on IR-based trace recovery, with a particular focus on previous evaluations and strength of evidence. We conducted a systematic mapping of IR-based trace recovery. Of the 79 publications classified, a majority applied algebraic IR models. While a set of studies on students indicates that IR-based trace recovery tools support certain work tasks, most previous studies do not go beyond reporting precision and recall of candidate trace links from evaluations using datasets containing fewer than 500 artifacts. Our review identified a need for industrial case studies. Furthermore, we conclude that the overall quality of reporting should be improved regarding context and tool details, measures reported, and use of IR terminology. Finally, based on our empirical findings, we present suggestions on how to advance research on IR-based trace recovery.
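
For readers unfamiliar with the terms in the abstract above, the following is a minimal sketch of IR-based trace recovery with an algebraic model (a TF-IDF vector space with cosine similarity), followed by the precision and recall measures the abstract mentions. It is an illustration only and not taken from the mapped studies: the function names, the tokenizer, the IDF smoothing, the threshold value, and the toy artifacts are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def tfidf_vectors(docs):
    """Map each artifact id to a sparse TF-IDF vector (dict of term -> weight)."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many artifacts each term occurs.
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for doc_id, toks in tokens.items():
        term_freq = Counter(toks)
        # Assumed weighting: raw term frequency times (log IDF + 1).
        vectors[doc_id] = {
            term: count * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in term_freq.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_links(sources, targets, threshold):
    """Return (source, target) pairs whose similarity reaches the threshold."""
    vectors = tfidf_vectors({**sources, **targets})
    return {
        (s, t)
        for s in sources
        for t in targets
        if cosine(vectors[s], vectors[t]) >= threshold
    }


def precision_recall(candidates, truth):
    """Precision and recall of candidate links against a known answer set."""
    true_positives = len(candidates & truth)
    precision = true_positives / len(candidates) if candidates else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy artifacts: one requirement and two code units.
requirements = {"R1": "user login with password"}
code_units = {"C1": "check login password", "C2": "print monthly report"}
links = candidate_links(requirements, code_units, threshold=0.2)
print(links, precision_recall(links, {("R1", "C1")}))
```

Lowering the threshold generally raises recall at the expense of precision, which is the trade-off that evaluations reporting both measures make visible.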

    From Bugs to Decision Support – Leveraging Historical Issue Reports in Software Evolution

    Software developers in large projects work in complex information landscapes and staying on top of all relevant software artifacts is an acknowledged challenge. As software systems often evolve over many years, a large number of issue reports is typically managed during the lifetime of a system, representing the units of work needed for its improvement, e.g., defects to fix, requested features, or missing documentation. Efficient management of incoming issue reports requires the successful navigation of the information landscape of a project. In this thesis, we address two tasks involved in issue management: Issue Assignment (IA) and Change Impact Analysis (CIA). IA is the early task of allocating an issue report to a development team, and CIA is the subsequent activity of identifying how source code changes affect the existing software artifacts. While IA is fundamental in all large software projects, CIA is particularly important to safety-critical development. Our solution approach, grounded on surveys of industry practice as well as scientific literature, is to support navigation by combining information retrieval and machine learning into Recommendation Systems for Software Engineering (RSSE). While the sheer number of incoming issue reports might challenge the overview of a human developer, our techniques instead benefit from the availability of ever-growing training data. We leverage the volume of issue reports to develop accurate decision support for software evolution. We evaluate our proposals both by deploying an RSSE in two development teams, and by simulation scenarios, i.e., we assess the correctness of the RSSEs' output when replaying the historical inflow of issue reports. In total, more than 60,000 historical issue reports are involved in our studies, originating from the evolution of five proprietary systems for two companies. 
Our results show that RSSEs for both IA and CIA can help developers navigate large software projects, in terms of locating development teams and software artifacts. Finally, we discuss how to support the transfer of our results to industry, focusing on addressing the context dependency of our tool support by systematically tuning parameters to a specific operational setting.

    Toward an Effective Automated Tracing Process

    Traceability is defined as the ability to establish, record, and maintain dependency relations among various software artifacts in a software system, in both a forwards and backwards direction, throughout the multiple phases of the project’s life cycle. The availability of traceability information has been proven vital to several software engineering activities such as program comprehension, impact analysis, feature location, software reuse, and verification and validation (V&V). The research on automated software traceability has noticeably advanced in the past few years. Various methodologies and tools have been proposed in the literature to provide automatic support for establishing and maintaining traceability information in software systems. This movement is motivated by the increasing attention traceability has been receiving as a critical element of any rigorous software development process. However, despite these major advances, traceability implementation and use is still not pervasive in industry. In particular, traceability tools are still far from achieving performance levels that are adequate for practical applications. Such low levels of accuracy require software engineers working with traceability tools to spend a considerable amount of their time verifying the generated traceability information, a process that is often described as tedious, exhaustive, and error-prone. Motivated by these observations, and building upon a growing body of work in this area, in this dissertation we explore several research directions related to enhancing the performance of automated tracing tools and techniques. In particular, our work addresses several issues related to the various aspects of the IR-based automated tracing process, including trace link retrieval, performance enhancement, and the role of the human in the process. 
Our main objective is to achieve performance levels, in terms of accuracy, efficiency, and usability, that are adequate for practical applications, and ultimately to accomplish a successful technology transfer from research to industry.

    Semantic recovery of traceability links between system artifacts

    This paper introduces a mechanism to recover traceability links between requirements and logical models in the context of critical systems development. Currently, lifecycle processes are covered by a good number of tools that are used to generate different types of artifacts. One of the cornerstone capabilities in the development of critical systems is the ability to automatically recover traceability links between system artifacts generated in different lifecycle stages. To do so, it is necessary to establish to what extent two or more of these work products are similar, dependent or should be explicitly linked together. However, the different types of artifacts and their internal representations pose a major challenge to unifying how system artifacts are represented and, then, linked together. That is why, in this work, a concept-based representation is introduced to provide a semantic and unified description of any system artifact. Furthermore, a traceability function is defined and implemented to exploit this new semantic representation and to support the recovery of traceability links between different types of system artifacts. In order to evaluate the traceability function, a case study in the railway domain is conducted to compare the precision and recall of recovered traceability links between text-based requirements and logical model elements. As the main outcome of this work, the use of a concept-based paradigm to represent system artifacts is demonstrated as a building block to automatically recover traceability links within the development lifecycle of critical systems. The research leading to these results has received funding from the H2020 ECSEL Joint Undertaking (JU) under Grant Agreement No. 826452 "Arrowhead Tools for Engineering of Digitalisation Solutions" and from specific national programs and/or funding authorities.

    Search-Based Software Maintenance and Testing

    2012 - 2013. In software engineering there are many expensive tasks that are performed during development and maintenance activities. Therefore, there has been a lot of effort to automate these tasks in order to significantly reduce the development and maintenance cost of software, since automation would require fewer human resources. One of the most widely used ways to achieve such automation is Search-Based Software Engineering (SBSE), which reformulates traditional software engineering tasks as search problems. In SBSE the set of all candidate solutions to the problem defines the search space, while a fitness function differentiates between candidate solutions, providing guidance to the optimization process. After the reformulation of software engineering tasks as optimization problems, search algorithms are used to solve them. Several search algorithms have been used in the literature, such as genetic algorithms, genetic programming, simulated annealing, hill climbing (gradient descent), greedy algorithms, particle swarm and ant colony. This thesis investigates and proposes the usage of search-based approaches to reduce the effort of software maintenance and software testing, with particular attention to four main activities: (i) program comprehension; (ii) defect prediction; (iii) test data generation and (iv) test suite optimization for regression testing. For program comprehension and defect prediction, this thesis provides their first formulations as optimization problems and then proposes the usage of genetic algorithms to solve them. More precisely, this thesis investigates the peculiarity of source code against textual documents written in natural language and proposes the usage of Genetic Algorithms (GAs) in order to calibrate and assemble IR techniques for different software engineering tasks. 
This thesis also investigates and proposes the usage of Multi-Objective Genetic Algorithms (MOGAs) in order to build multi-objective defect prediction models that identify defect-prone software components by taking into account multiple and practical software engineering criteria. Test data generation and test suite optimization have been extensively investigated as search-based problems in the literature. However, despite the large body of work on search algorithms applied to software testing, both (i) automatic test data generation and (ii) test suite optimization have several limitations and do not always produce satisfying results. The success of evolutionary software testing techniques in general, and GAs in particular, depends on several factors. One of these factors is the level of diversity among the individuals in the population, which directly affects the exploration ability of the search. For example, evolutionary test case generation techniques that employ GAs could be severely affected by genetic drift, i.e., a loss of diversity between solutions, which leads to premature convergence of GAs towards some local optima. For these reasons, this thesis investigates the role played by diversity-preserving mechanisms on the performance of GAs and proposes a novel diversity mechanism based on Singular Value Decomposition and linear algebra. This mechanism has then been integrated within standard GAs and evaluated for evolutionary test data generation. It has also been integrated within MOGAs and empirically evaluated for regression testing. [edited by author] XII n.s.
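
To make the SBSE vocabulary in the abstract above concrete, here is a minimal sketch of a search space, a fitness function, and a search algorithm, applied to a toy test suite selection problem. It uses plain steepest-ascent hill climbing, one of the algorithms the abstract lists, rather than the genetic algorithms the thesis actually proposes. The coverage data, the size penalty of 0.5 per test, and all function names are assumptions made for illustration.

```python
def make_fitness(coverage, penalty):
    """Build a fitness function for test selection.

    coverage[i] is the set of statements exercised by test i (toy data).
    Fitness rewards covered statements and penalizes each selected test.
    """
    def fitness(selection):
        covered = set()
        for chosen, statements in zip(selection, coverage):
            if chosen:
                covered |= statements
        return len(covered) - penalty * sum(selection)
    return fitness


def hill_climb(start, fitness):
    """Steepest-ascent hill climbing over bit vectors.

    The search space is every 0/1 vector of the same length as start.
    A neighbour differs from the current solution in exactly one bit.
    The loop stops at the first solution with no strictly better neighbour,
    which may be a local optimum rather than the global one.
    """
    current = list(start)
    current_score = fitness(current)
    while True:
        best, best_score = None, current_score
        for i in range(len(current)):
            neighbour = current.copy()
            neighbour[i] = 1 - neighbour[i]
            score = fitness(neighbour)
            if score > best_score:
                best, best_score = neighbour, score
        if best is None:
            return current, current_score
        current, current_score = best, best_score


# Hypothetical coverage of six statements by four tests.
coverage = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1}]
selection, score = hill_climb([0, 0, 0, 0], make_fitness(coverage, penalty=0.5))
print(selection, score)
```

Because this search only ever moves to a better neighbour, it can stall in a local optimum. Population-based methods such as GAs explore more broadly, which is where the diversity concerns raised in the abstract come in.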