Empirical evaluation of bug linking
To collect software bugs found by users, development teams often set up bug trackers using systems such as Bugzilla. Developers then fix some of the bugs and commit the corresponding code changes into version control systems such as svn or git. Unfortunately, the links between bug reports and code changes are missing for many software projects, as the bug tracking and version control systems are often maintained separately. Yet linking bug reports to fix commits is important, as it could shed light on the nature of bug-fixing processes and expose patterns in software management. Bug linking solutions, such as ReLink, have been proposed. Demonstrating their effectiveness, however, faces a number of issues, including the reliability of their ground-truth datasets and the extent of their measurements. In this study we propose a benchmark for evaluating bug linking solutions. The benchmark includes a dataset of about 12,000 bug links from 10 programs. These true links between bug reports and their fixes were provided during the bug-fixing process. We designed a number of research questions to assess both quantitatively and qualitatively the effectiveness of a bug linking tool. Finally, we apply this benchmark to ReLink and report the strengths and limitations of this bug linking tool.
Doctor of Philosophy
Exchanging patient-specific information across heterogeneous information systems is a critical but increasingly complex and expensive challenge. Lacking a universal unique identifier for healthcare, patient records must be linked using combinations of identity attributes such as name, date of birth, and sex. A state's birth certificate registry contains demographic information that is potentially very valuable for identity resolution, but its use for that purpose presents numerous problems. The objectives of this research were to: (1) assess the frequency, extent, reasons, and types of changes on birth certificates; (2) develop and evaluate an ontology describing information used in identity resolution; and (3) use a logical framework to model identity transactions and assess the impact of policy decisions in a cross-jurisdictional master person index. To understand birth certificate changes, we obtained de-identified datasets from the Utah birth certificate registry, including the history of and reasons for changes from 2000 to 2012. We conducted cohort analyses, examining the number, reason, and extent of changes over time, and cross-sectional analyses to assess patterns of changes. We evaluated an ontological approach to overcoming heterogeneity between systems exchanging identity information and demonstrated the use of two existing ontologies, the Simple Event Model (SEM) and the Clinical Element Model (CEM), to capture an individual's identity history. We used Discrete Event Calculus to model identity events across domains and over time. The models were used to develop contextual rules for releasing minimal information from birth certificate registries in sensitive cases such as adoptions. Our findings demonstrate that the mutability of birth certificates makes them a valuable resource for identity resolution, provided that changes can be captured and modeled in a usable form.
An ontology can effectively model identity attributes and the events that cause them to change over time, as well as overcome syntactic and semantic heterogeneity. Finally, we show that dynamic, contextual rules can be used to govern the flow of identity information between systems, allowing entities to link records in the most difficult cases, avoid costly human review, and avoid the threats to privacy that come with such review.
Improving the quality of bug data in software repositories
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Context: Researchers have increasingly recognised the benefit of mining software repositories
to extract information. Thus, integrating a version control tool (VC tool) and bug tracking
tool (BT tool) in mining software repositories as well as synchronising missing bug tracking
data (BT data) and version control log (VC log) becomes of paramount importance, in order
to improve the quality of bug data in software repositories. This enables researchers to conduct
good-quality research that benefits software projects, especially open-source software projects,
where information is limited in distributed development and shared data for tracking project
issues is uncommon. BT data often appears not to be mirrored in what developers logged as
their actions, resulting in reduced traceability of defects in the development logs (VC logs).
Version control system (VC system) data can be enhanced with data from the bug tracking system (BT system), because VC logs report on past software development activities.
When these VC logs and BT data are used together, researchers can have a more complete
picture of a bug’s life cycle, evolution, and maintenance. However, current BT systems and
VC systems provide insufficient support for cross-analysis of both VC logs and BT data for
researchers in empirical software engineering research: prediction of software faults, software
reliability, traceability, software quality, effort and cost estimation, bug prediction, and bug
fixing.
Aims and objectives: The aim of the thesis is to design and implement a tool chain to
support the integration of a VC tool and a BT tool, as well as to synchronise the missing VC
logs and BT data of open-source software projects automatically. The syncing process, using
Bicho (BT tool) and CVSAnalY (VC tool), will be demonstrated and evaluated on a sample
of 344 open source software (OSS) projects.
Method: The tool chain was implemented and its performance evaluated semi-automatically.
The SZZ algorithm approach was used to detect and trace BT data and VC logs. In its formulation, the algorithm looks for the terms "bug" or "fixed" (case-insensitive) along with the ’#’ sign, which indicates the ID of a bug in the VC system and BT system respectively. In addition, the SZZ algorithm was dissected in its formulation, and precision and recall were analysed when the use of “fix”, “bug”, or “# + digit” (e.g., #1234) was detected while tracking possible bug IDs from the VC logs of the sample OSS projects.
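The keyword heuristic described above can be illustrated with a short sketch. This is not the thesis tool chain or the original SZZ implementation; the function name and regular expressions are illustrative assumptions, showing only the kind of case-insensitive keyword matching and "#" + digit extraction the method paragraph describes.

```python
import re

# Illustrative sketch of an SZZ-style keyword heuristic (hypothetical names):
# scan a VC commit message for "bug"/"fix" keywords (case-insensitive)
# and for "#" followed by digits, which may reference a bug tracker ID.

KEYWORD_RE = re.compile(r"\b(bug|fix(?:ed|es)?)\b", re.IGNORECASE)
BUG_ID_RE = re.compile(r"#(\d+)")

def extract_bug_links(commit_message):
    """Return the keywords and candidate bug IDs found in a VC log message."""
    keywords = [m.group(0).lower() for m in KEYWORD_RE.finditer(commit_message)]
    bug_ids = [int(m.group(1)) for m in BUG_ID_RE.finditer(commit_message)]
    return keywords, bug_ids

keywords, ids = extract_bug_links("Fixed bug #1234: null pointer in parser")
# keywords -> ['fixed', 'bug'], ids -> [1234]
```

A message matched only by "bug"/"fix" keywords is a weaker signal than an explicit "#1234" reference, which is consistent with the precision result reported below.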
Results: The results of this analysis indicate that use of “# + digit” (e.g., #1234) is more
precise for bug traceability than the use of the “bug” and “fix” keywords. Such keywords are
indeed present in the VC logs, but they are less useful when trying to connect the development
actions with the bug traces – that is, their recall is high but their precision is low. Overall, the results indicate that
VC log and BT data retrieved and stored by automatic tools can be tracked and recovered
with better accuracy using only a part of the SZZ algorithm. In addition, the results indicate
that 80–95% of all the missing BT data and VC logs for the 344 OSS projects were synchronised
into the Bicho and CVSAnalY databases, respectively.
Conclusion: The presented tool chain eliminates repetitive activities in traceability tasks,
as well as in software maintenance and evolution. This thesis provides a solution towards the
automation and traceability of the BT data of software projects (in particular, OSS projects),
using VC logs to complement and track missing bug data.
Synchronising involves completing the missing data of bug repositories with the logs detailing
the actions of developers. Synchronising benefits various branches of empirical software
engineering research: prediction of software faults, software reliability, traceability, software
quality, effort and cost estimation, bug prediction, and bug fixing.
Tracing Requirements and Source Code During Software Development
Traceability supports the software development process in various ways, among them change management, software maintenance, and the prevention of misunderstandings. Traceability links between requirements and code are vital to support these development activities, e.g., navigating from a requirement to its realization in the code, and vice versa. In practice, however, traceability links between requirements and code are often not created during development because doing so would require additional development effort. This reduces the opportunities for developers to use these links during development.
To address this weakness, this thesis presents an approach that (semi-)automatically captures traceability links between requirements and code during development. We do this by using work items from project management that are typically stored in issue trackers. The presented approach consists of three parts. The first part comprises a traceability information model (TIM) consisting of artifacts from three different areas, namely requirements engineering, project management, and code. The TIM also includes the traceability links between them. The second part presents three processes for capturing traceability links between requirements, work items, and code during development. The third part defines an algorithm that automatically infers traceability links between requirements and code based on the interlinked work items. The traceability approach is implemented as an extension to the model-based CASE tool UNICASE, called the UNICASE Trace Client.
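The inference idea in the third part can be sketched compactly. This is an illustrative reconstruction, not the UNICASE Trace Client algorithm itself; the function name and example identifiers are hypothetical. It shows only the core step: if a work item is linked to a requirement and the same work item is linked to code, a requirement-to-code link can be inferred through that shared work item.

```python
# Illustrative sketch (hypothetical names, not the UNICASE implementation):
# infer requirement -> code links transitively through shared work items.

def infer_links(req_links, code_links):
    """req_links: work item ID -> set of requirement IDs linked to it.
       code_links: work item ID -> set of code element IDs linked to it.
       Returns the set of inferred (requirement, code element) pairs."""
    inferred = set()
    for work_item, reqs in req_links.items():
        # Only work items linked to both a requirement and code contribute.
        for code in code_links.get(work_item, set()):
            for req in reqs:
                inferred.add((req, code))
    return inferred

links = infer_links(
    {"WI-1": {"REQ-7"}, "WI-2": {"REQ-9"}},
    {"WI-1": {"Parser.java"}, "WI-2": {"Parser.java", "Linker.java"}},
)
# links contains ("REQ-7", "Parser.java"), ("REQ-9", "Parser.java"),
# and ("REQ-9", "Linker.java")
```

The precision of such inferred links then depends on how carefully the work-item links were captured during development, which is what the empirical study below evaluates.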
Practitioners and researchers have discussed the practice of using work items to capture links between requirements and code, but there has been no systematic study of this practice. This thesis provides a first empirical study based on the application of the presented approach. The approach and its tool support were applied in three software development projects conducted with undergraduate students, and their feasibility and practicability were evaluated. The feasibility results indicate that the approach creates correct traceability links between all artifacts with high precision and recall during development. At the same time, the practicability results indicate that the subjects found the approach and its tool support easy to use. In a second empirical study, we compare the presented approach with an existing technique for the automatic creation of traceability links between requirements and code. The results indicate that the presented approach outperforms the existing technique in terms of the quality of the created traceability links.