16,980 research outputs found

    Empirical Study of Training-Set Creation for Software Architecture Traceability Methods

    Get PDF
    Machine-learning algorithms have the potential to support trace retrieval methods making significant reductions in costs and human-involvement required for the creation and maintenance of traceability links between system requirements, system architecture, and the source code. These algorithms can be trained how to detect the relevant architecture and can then be sent to find it on its own. However, the long-term reductions in cost and effort face a significant upfront cost in the initial training of the algorithm. This cost comes in the form of needing to create training sets of code, which train the algorithm how to identify traceability links. These supervised or semi-supervised training methods require the involvement of highly trained, and thus expensive, experts to collect, and format, these data-sets. In this thesis, three baseline methods training datasets creation are presented. These methods are (i) Manual Expert-based, which involves a human-compiled dataset, (ii) Automated Web-Mining, which creates training datasets by collecting and data-mining APIs (specifically from technical-programming websites), and (iii) Automated Big-Data Analysis, which data-mines ultra-large code repositories to generate the training datasets. The trace-link creation accuracy achieved using each of these three methods is compared, and the cost/benefit comparisons between them is discussed. Furthermore, in a related area, potential correlations between training set size and the accuracy of recovering trace links is investigated. The results of this area of study indicate that the automated techniques, capable of creating very large training sets, allow for sufficient reliability in the problem of tracing architectural tactics. This indicates that these automated methods have potential applications in other areas of software traceability

    Towards an Intelligent System for Software Traceability Datasets Generation

    Get PDF
    Software datasets and artifacts play a crucial role in advancing automated software traceability research. They can be used by researchers in different ways to develop or validate new automated approaches. Software artifacts, other than source code and issue tracking entities, can also provide a great deal of insight into a software system and facilitate knowledge sharing and information reuse. The diversity and quality of the datasets and artifacts within a research community have a significant impact on the accuracy, generalizability, and reproducibility of the results and consequently on the usefulness and practicality of the techniques under study. Collecting and assessing the quality of such datasets are not trivial tasks and have been reported as an obstacle by many researchers in the domain of software engineering. In this dissertation, we report our empirical work that aims to automatically generate and assess the quality of such datasets. Our goal is to introduce an intelligent system that can help researchers in the domain of software traceability in obtaining high-quality “training sets”, “testing sets” or appropriate “case studies” from open source repositories based on their needs. In the first project, we present a first-of-its-kind study to review and assess the datasets that have been used in software traceability research over the last fifteen years. It presents and articulates the current status of these datasets, their characteristics, and their threats to validity. Second, this dissertation introduces a Traceability-Dataset Quality Assessment (T-DQA) framework to categorize software traceability datasets and assist researchers to select appropriate datasets for their research based on different characteristics of the datasets and the context in which those datasets will be used. Third, we present the results of an empirical study with limited scope to generate datasets using three baseline approaches for the creation of training data. These approaches are (i) Expert-Based, (ii) Automated Web-Mining, which generates training sets by automatically mining tactic\u27s APIs from technical programming websites, and lastly, (iii) Automated Big-Data Analysis, which mines ultra-large-scale code repositories to generate training sets. We compare the trace-link creation accuracy achieved using each of these three baseline approaches and discuss the costs and benefits associated with them. Additionally, in a separate study, we investigate the impact of training set size on the accuracy of recovering trace links. Finally, we conduct a large-scale study to identify which types of software artifacts are produced by a wide variety of open-source projects at different levels of granularity. Then we propose an automated approach based on Machine Learning techniques to identify various types of software artifacts. Through a set of experiments, we report and compare the performance of these algorithms when applied to software artifacts. Finally, we conducted a study to understand how software traceability experts and practitioners evaluate the quality of their datasets. In addition, we aim at gathering experts’ opinions on all quality attributes and metrics proposed by T-DQA

    Mining requirements links

    Full text link
    [Context & motivation] Obtaining traceability among requirements and between requirements and other artifacts is an extremely important activity in practice, an interesting area for theoretical study, and a major hurdle in common industrial experience. Substantial effort is spent on establishing and updating such links in any large project - even more so when requirements refer to a product family. [Question/problem] While most research is concerned with ways to reduce the effort needed to establish and maintain traceability links, a different question can also be asked: how is it possible to harness the vast amount of implicit (and tacit) knowledge embedded in already-established links? Is there something to be learned about a specific problem or domain, or about the humans who establish traces, by studying such traces? [Principal ideas/results] In this paper, we present preliminary results from a study applying different machine learning techniques to an industrial case study, and test to what degree common hypothesis hold in our case. [Contribution] Reshaping traceability data into knowledge can contribute to more effective automatic tools to suggest candidates for linking, to inform improvements in writing style, and at the same time provide some insight into both the domain of interest and the actual implementation techniques. © 2011 Springer-Verlag Berlin Heidelberg

    Grand Challenges of Traceability: The Next Ten Years

    Full text link
    In 2007, the software and systems traceability community met at the first Natural Bridge symposium on the Grand Challenges of Traceability to establish and address research goals for achieving effective, trustworthy, and ubiquitous traceability. Ten years later, in 2017, the community came together to evaluate a decade of progress towards achieving these goals. These proceedings document some of that progress. They include a series of short position papers, representing current work in the community organized across four process axes of traceability practice. The sessions covered topics from Trace Strategizing, Trace Link Creation and Evolution, Trace Link Usage, real-world applications of Traceability, and Traceability Datasets and benchmarks. Two breakout groups focused on the importance of creating and sharing traceability datasets within the research community, and discussed challenges related to the adoption of tracing techniques in industrial practice. Members of the research community are engaged in many active, ongoing, and impactful research projects. Our hope is that ten years from now we will be able to look back at a productive decade of research and claim that we have achieved the overarching Grand Challenge of Traceability, which seeks for traceability to be always present, built into the engineering process, and for it to have "effectively disappeared without a trace". We hope that others will see the potential that traceability has for empowering software and systems engineers to develop higher-quality products at increasing levels of complexity and scale, and that they will join the active community of Software and Systems traceability researchers as we move forward into the next decade of research

    An Approach for Discovering Traceability Links between Regulatory Documents and Source Code Through User-Interface Labels

    Get PDF
    In application domains that are regulated, software vendors must maintain traceability links between the regulatory items and the code base implementing them. In this paper, we present a traceability approach based on the intuition that the regulatory documents and the user-interface of the corresponding software applications are very close. First, they use the same terminology. Second, most important regulatory pieces of information appear in the graphical user-interface because the end-users in those application domains care about the regulation (by construction). We evaluate our approach in the domain of green building. The evaluation involves a domain expert, lead architect of a commercial product within this area. The evaluation shows that the recovered traceability links are accurate

    Grand Challenges of Traceability: The Next Ten Years

    Full text link
    In 2007, the software and systems traceability community met at the first Natural Bridge symposium on the Grand Challenges of Traceability to establish and address research goals for achieving effective, trustworthy, and ubiquitous traceability. Ten years later, in 2017, the community came together to evaluate a decade of progress towards achieving these goals. These proceedings document some of that progress. They include a series of short position papers, representing current work in the community organized across four process axes of traceability practice. The sessions covered topics from Trace Strategizing, Trace Link Creation and Evolution, Trace Link Usage, real-world applications of Traceability, and Traceability Datasets and benchmarks. Two breakout groups focused on the importance of creating and sharing traceability datasets within the research community, and discussed challenges related to the adoption of tracing techniques in industrial practice. Members of the research community are engaged in many active, ongoing, and impactful research projects. Our hope is that ten years from now we will be able to look back at a productive decade of research and claim that we have achieved the overarching Grand Challenge of Traceability, which seeks for traceability to be always present, built into the engineering process, and for it to have "effectively disappeared without a trace". We hope that others will see the potential that traceability has for empowering software and systems engineers to develop higher-quality products at increasing levels of complexity and scale, and that they will join the active community of Software and Systems traceability researchers as we move forward into the next decade of research

    Can Clustering Improve Requirements Traceability? A Tracelab-enabled Study

    Get PDF
    Software permeates every aspect of our modern lives. In many applications, such in the software for airplane flight controls, or nuclear power control systems software failures can have catastrophic consequences. As we place so much trust in software, how can we know if it is trustworthy? Through software assurance, we can attempt to quantify just that. Building complex, high assurance software is no simple task. The difficult information landscape of a software engineering project can make verification and validation, the process by which the assurance of a software is assessed, very difficult. In order to manage the inevitable information overload of complex software projects, we need software traceability, the ability to describe and follow the life of a requirement, in both forwards and backwards direction. The Center of Excellence for Software Traceability (CoEST) has created a compelling research agenda with the goal of ubiquitous traceability by 2035. As part of this goal, they have developed TraceLab, a visual experimental workbench built to support design, implementation, and execution of traceability experiments. Through our collaboration with CoEST, we have made several contributions to TraceLab and its community. This work contributes to the goals of the traceability research community. The three key contributions are (a) a machine learning component package for TraceLab featuring six (6) classifier algorithms, five (5) clustering algorithms, and a total of over 40 components for creating TraceLab experiments, built upon the WEKA machine learning package, as well as implementing methods outside of WEKA; (b) the design for an automated tracing system that uses clustering to decompose the task of tracing into many smaller tracing subproblems; and (c) an implementation of several key components of this tracing system using TraceLab and its experimental evaluation

    Natural Language Requirements Processing: A 4D Vision

    Get PDF
    The future evolution of the application of natural language processing technologies in requirements engineering can be viewed from four dimensions: discipline, dynamism, domain knowledge, and datasets

    Regulatory Compliance-oriented Impediments and Associated Effort Estimation Metrics in Requirements Engineering for Contractual Systems Engineering Projects

    Get PDF
    Large-scale contractual systems engineering projects often need to comply with a myriad of government regulations and standards as part of contractual fulfillment. A key activity in the requirements engineering (RE) process for such a project is to elicit appropriate requirements from the regulations and standards that apply to the target system. However, there are impediments in achieving compliance due to such factors as: the voluminous contract and its high-level specifications, large number of regulatory documents, and multiple domains of the system. Little empirical research has been conducted on developing a shared understanding of the compliance-oriented complexities involved in such projects, and identifying and developing RE support (such as processes, tools, metrics, and methods) to improve overall performance for compliance projects. Through three studies on an industrial RE project, we investigated a number of issues in RE concerning compliance, leading to the following novel results:(i) a meta-model that captures artefacts-types and their compliance-oriented inter-relationships that exist in RE for contractual systems engineering projects; (ii) discovery of key impediments to requirements-compliance due to: (a) contractual complexities (e.g., regulatory requirements specified non-contiguously with non-regulatory requirements in the contract at the ratio of 1:19), (b) complexities in regulatory documents (e.g., over 300 regulatory documents being relevant to the subject system), and (c) large and complex system (e.g., 40% of the contractual regulatory requirements are cross-cutting); (iii) a method for deriving base metrics for estimating the effort needed to do compliance work during RE and demonstrate how a set of derived metrics can be used to create an effort estimation model for such work; (iv) a framework for structuring diverse regulatory documents and requirements for global product developments. These results lay a foundation in RE research on compliance issues with anticipation for its impact in real-world projects and in RE research
    corecore