6 research outputs found
Do Bugs Propagate? An Empirical Analysis of Temporal Correlations Among Software Bugs
The occurrences of bugs are not isolated events; rather, they may interact, affect each other, and trigger other latent bugs. Identifying and understanding bug correlations could help developers localize bug origins, predict potential bugs, and design better architectures of software artifacts to prevent bugs from affecting one another. Many studies in the defect prediction and fault localization literature imply dependence and interactions between multiple bugs, but few of them explicitly investigate the correlations of bugs across time steps and how bugs affect each other. In this paper, we perform social network analysis on the temporal correlations between bugs across time steps on software artifact ties, i.e., software graphs. Adapting the correlation analysis methodology used for social networks, we construct software graphs of three artifact ties, such as function calls and type hierarchy, and then perform longitudinal logistic regressions of time-lag bug correlations on these graphs. Our experiments on four open-source projects suggest that bugs can propagate as observed on certain artifact tie graphs. Based on our findings, we propose a hybrid artifact tie graph, a synthesis of a few well-known software graphs, that exhibits a higher degree of bug propagation. Our findings shed light on research for better bug prediction and localization models and help developers to perform maintenance actions to prevent consequential bugs
Studying the lives of software bugs
For as long as people have made software, they have made mistakes in that software. Software bugs are widespread, and the maintenance required to fix them has a major impact on the cost of software and how developers' time is spent. Reducing this maintenance time would lower the cost of software and allow for developers to spend more time on new features, improving the software for end-users. Bugs are hugely diverse and have a complex life cycle. This makes them difficult to study, and research is often carried out on synthetic bugs or toy programs. However, a better understanding of the bug life cycle would greatly aid in developing tools to reduce the time spent on maintenance. This thesis will study the life cycle of bugs, and develop such an understanding. Overall, this thesis examines over 3000 real bugs, from real projects, concentrating on three of the most important points in the life cycle: origin, reporting and fix. Firstly, two existing techniques are compared for discovering the origin of a bug. A number of improvements are evaluated, and the most effective approach is found to be combining the techniques. Furthermore, the behaviour of developers is found to have a major impact on the accuracy of the techniques. Secondly, a large number of bugs are analysed to determine what information is provided when users report bugs. For most bugs, much important information is missing, or inaccurate. Most importantly, there appears to be a considerable gap between what users provide and what developers actually want. Finally, an evaluation is carried out on a number of novel alterations to techniques used to determine the location of bug fixes. Compared to existing techniques, these alterations successfully increase the number of bugs which can be usefully localised, aiding developers in removing the bugs
Improving Information Retrieval Bug Localisation Using Contextual Heuristics
Software developers working on unfamiliar systems are challenged to identify where and how high-level concepts are implemented in the source code prior to performing maintenance tasks. Bug localisation is a core program comprehension activity in software maintenance: given the observation of a bug, e.g. via a bug report, where is it located in the source code?
Information retrieval (IR) approaches see the bug report as the query, and the source files as the documents to be retrieved, ranked by relevance. Current approaches rely on project history, in particular previously fixed bugs and versions of the source code. Existing IR techniques fall short of providing adequate solutions in finding all the source code files relevant for a bug. Without additional help, bug localisation can become a tedious, time-consuming and error-prone task.
My research contributes a novel algorithm that, given a bug report and the application’s source files, uses a combination of lexical and structural information to suggest, in a ranked order, files that may have to be changed to resolve the reported bug without requiring past code and similar reports.
I study eight applications for which I had access to the user guide, the source code, and some bug reports. I compare the relative importance and the occurrence of the domain concepts in the project artefacts and measure the effectiveness of using only concept key words to locate files relevant for a bug compared to using all the words of a bug report.
Measuring my approach against six others, using their five metrics and eight projects, I position an affected file in the top-1, top-5 and top-10 ranks on average for 44%, 69% and 76% of the bug reports respectively. This is an improvement of 23%, 16% and 11% respectively over the best performing current state-of-the-art tool.
Finally, I evaluate my algorithm with a range of industrial applications in user studies, and find that it is superior to simple string search, as often performed by developers. These results show the applicability of my approach to software projects without history and offer a simpler light-weight solution
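The IR framing in the abstract above (bug report as query, source files as ranked documents, evaluated by top-k hits) can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes plain TF-IDF with cosine similarity, and the function names `tokenize`, `rank_files` and `top_k_rate`, as well as the sample files and report, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lower-case, and split on non-alphanumerics."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^A-Za-z0-9]+", text.lower()) if t]


def rank_files(bug_report, files):
    """Rank file names by TF-IDF cosine similarity to the bug report (assumed scoring)."""
    docs = {name: Counter(tokenize(body)) for name, body in files.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for counts in docs.values() for term in counts)
    idf = {t: math.log((n_docs + 1) / (d + 1)) + 1.0 for t, d in doc_freq.items()}

    def weigh(counts):
        # Terms that occur in no source file get weight 0.
        return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = weigh(Counter(tokenize(bug_report)))
    scores = {name: cosine(query, weigh(counts)) for name, counts in docs.items()}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda name: (-scores[name], name))


def top_k_rate(rankings, fixed_files, k):
    """Fraction of bug reports with at least one fixed file in the top k."""
    hits = sum(1 for bug, ranked in rankings.items()
               if set(ranked[:k]) & fixed_files[bug])
    return hits / len(rankings)


# Hypothetical project and bug report, for illustration only.
files = {
    "LoginDialog.java": "class LoginDialog { void showPasswordField() {} void validatePassword() {} }",
    "ReportExporter.java": "class ReportExporter { void exportPdf() {} }",
    "Main.java": "class Main { void start() {} }",
}
report = "Password field is not validated in the login dialog"
ranking = rank_files(report, files)
```

Here `ranking[0]` is the file the sketch would suggest first, and `top_k_rate` corresponds to the top-1, top-5 and top-10 percentages quoted above.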
Combining Fault Localization with Information Retrieval: an Analysis of Accuracy and Performance for Bug Finding
Debugging is a key activity in the software development process. It has been used extensively by developers to attempt to localize faults, while enhancing the quality and performance of software in general. There has been a significant amount of study in developing and enhancing fault localization techniques, which are used in assisting developers to locate faults within a body of code. However, identifying fault locations using individual techniques is not always effective; combining different techniques, which represent distinct forms of analysis, might help to overcome this issue. There has been a very limited amount of research that suggests that combining more than one approach to fault localization may have benefits, principally because information from different sources is included in the localization process. In this thesis, I attempt to more precisely address the question of whether combining different fault localization techniques can more effectively and efficiently find faults in code, when contrasted with a single technique. To answer this, I have carried out experiments that combine the use of three fault localization techniques: Information Retrieval (IR), Spectrum Based Fault Localization (SBFL), and Text Based Search. These techniques are representative of both dynamic and static fault localization. My hypothesis is that a combination of dynamic and static fault localization analysis can assist developers in better fault localization. I have evaluated the various combinations of techniques in identifying faults against real-world programs from Defects4J, where 395 faults and bug reports have been analyzed. The experimental results demonstrate that the combination of three techniques (SBFL, Text Search, and IR) is the most accurate, with 86.84% accuracy for 343 faults located from a total of 395. This finding contributes positively towards concretely recommending techniques for assisting developers in locating faults in code.
Guidelines are provided on which combination of techniques, with maximal accuracy of result, should be applied especially when there is no prior knowledge about the fault
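The abstract above does not say how the three techniques' outputs are merged. As one possible illustration, the sketch below assumes each technique yields a per-file suspiciousness score, min-max normalises each score table, and averages them; `normalise`, `combine_rank` and all scores shown are hypothetical.

```python
def normalise(scores):
    """Min-max normalise one technique's scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def combine_rank(score_tables):
    """Average normalised scores across techniques and rank files, best first.

    A file missing from a technique's table contributes 0 for that technique
    (an assumption of this sketch, not something stated in the abstract).
    """
    tables = [normalise(table) for table in score_tables]
    names = set().union(*tables)
    combined = {name: sum(t.get(name, 0.0) for t in tables) / len(tables)
                for name in names}
    return sorted(combined, key=lambda name: (-combined[name], name))


# Hypothetical scores from SBFL, IR and text search for three files.
sbfl = {"A.java": 0.9, "B.java": 0.1, "C.java": 0.5}
ir = {"A.java": 0.2, "B.java": 0.3, "C.java": 0.8}
text_search = {"A.java": 0.0, "C.java": 1.0}

ranking = combine_rank([sbfl, ir, text_search])
```

In this toy input SBFL alone would put `A.java` first, while the averaged score favours `C.java` because two of the three techniques rank it highest; this is the kind of effect that combining dynamic and static evidence is meant to capture.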
Improving the quality of bug data in software repositories
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Context: Researchers have increasingly recognised the benefit of mining software repositories to extract information. Thus, integrating a version control tool (VC tool) and bug tracking tool (BT tool) in mining software repositories, as well as synchronising missing bug tracking data (BT data) and version control logs (VC logs), becomes of paramount importance in order to improve the quality of bug data in software repositories. In this way, researchers can do good quality research for software project benefit, especially in open source software projects where information is limited in distributed development. Thus, shared data to track the issues of the project are not common. BT data often appears not to be mirrored when considering what developers logged as their actions, resulting in reduced traceability of defects in the development logs (VC logs). Version control system (VC system) data can be enhanced with data from the bug tracking system (BT system), because VC logs report on past software development activities. When these VC logs and BT data are used together, researchers can have a more complete picture of a bug’s life cycle, evolution and maintenance. However, current BT systems and VC systems provide insufficient support for cross-analysis of both VC logs and BT data for researchers in empirical software engineering research: prediction of software faults, software reliability, traceability, software quality, effort and cost estimation, bug prediction, and bug fixing.
Aims and objectives: The aim of the thesis is to design and implement a tool chain to support the integration of a VC tool and a BT tool, as well as to synchronise the missing VC logs and BT data of open-source software projects automatically. The syncing process, using Bicho (BT tool) and CVSAnalY (VC tool), will be demonstrated and evaluated on a sample of 344 open source software (OSS) projects.
Method: The tool chain was implemented and its performance evaluated semi-automatically. The SZZ algorithm approach was used to detect and trace BT data and VC logs. In its formulation, the algorithm looks for the terms “Bugs” or “Fixed” (case-insensitive) along with the “#” sign, which shows the ID of a bug in the VC system and BT system respectively. In addition, the SZZ algorithm was dissected in its formulation, and precision and recall were analysed for the use of “fix”, “bug” or “# + digit” (e.g., #1234) when tracking possible bug IDs from the VC logs of the sample OSS projects.
Results: The results of this analysis indicate that use of “# + digit” (e.g., #1234) is more precise for bug traceability than the use of the “bug” and “fix” keywords. Such keywords are indeed present in the VC logs, but they are less useful when trying to connect the development actions with the bug traces; that is, their recall is high. Overall, the results indicate that VC logs and BT data retrieved and stored by automatic tools can be tracked and recovered with better accuracy using only a part of the SZZ algorithm. In addition, the results indicate that 80-95% of all the missing BT data and VC logs for the 344 OSS projects have been synchronised into the Bicho and CVSAnalY databases respectively.
Conclusion: The presented tool chain will eliminate and avoid repetitive activities in traceability tasks, as well as software maintenance and evolution. This thesis provides a solution towards the automation and traceability of BT data of software projects (in particular, OSS projects) using VC logs to complement and track missing bug data. Synchronising involves completing the missing data of bug repositories with the logs detailing the actions of developers. Synchronising benefits various branches of empirical software engineering research: prediction of software faults, software reliability, traceability, software quality, effort and cost estimation, bug prediction, and bug fixing
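The keyword and “# + digit” matching described in the Method paragraph above can be sketched as follows. The regular expressions are an assumed reading of the abstract, not the thesis's exact patterns or the interfaces of Bicho or CVSAnalY, and the commit messages and the set of truly bug-fixing commits are invented for illustration.

```python
import re

# Assumed patterns: "bug"/"bugs" or "fix"/"fixes"/"fixed" as whole words,
# case-insensitive, and a "#" immediately followed by digits as a bug ID.
KEYWORD_RE = re.compile(r"\b(?:bugs?|fix(?:e[sd])?)\b", re.IGNORECASE)
BUG_ID_RE = re.compile(r"#(\d+)")


def extract_bug_ids(message):
    """Return all '#<digits>' references in a commit message as integers."""
    return [int(number) for number in BUG_ID_RE.findall(message)]


def mentions_keyword(message):
    """True if the commit message contains a bug or fix keyword."""
    return bool(KEYWORD_RE.search(message))


def precision_recall(predicted, actual):
    """Precision and recall of a predicted set of commits against the true set."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical VC log and ground truth.
log = {
    "r1": "Fixed #1234: null pointer in parser",
    "r2": "Refactor build scripts",
    "r3": "fix typo in docs",
    "r4": "Bug #77 and #78 resolved",
}
truly_bug_fixing = {"r1", "r4"}

by_keyword = {rev for rev, msg in log.items() if mentions_keyword(msg)}
by_bug_id = {rev for rev, msg in log.items() if extract_bug_ids(msg)}
```

On this toy log the keyword heuristic also flags `r3`, which touches no tracked bug, so it keeps full recall but loses precision, while the “#” pattern flags only the two bug-fixing commits.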
Bug localisation through diverse sources of information
Many approaches have been proposed to address the problem of bug localisation – taking a bug report and recommending to developers the possible locations of the bug in the project. However, these can often require significant up-front work from developers, and are not widely adopted. Furthermore, those techniques which do not require this up-front investment are often far from accurate, and do not take advantage of all of the information that they could. We propose a technique for combining information from multiple, novel sources of information about a project and a bug, and use this to recommend bug locations to developers. We also identify how this technique could be used to create a low-effort tool for bug localisation, with the aim of increasing developer adoption. We evaluate the technique on 1143 bugs in three open-source projects, and find that it can be used to increase the number of bugs where the first relevant method recommended to developers is the top result from 98 to 132 and in the top-10 from 271 to 322