357 research outputs found
A preliminary evaluation of text-based and dependency-based techniques for determining the origins of bugs
A crucial step in understanding the life cycle of software bugs is identifying their origin. Unfortunately this information is not usually recorded and recovering it at a later date is challenging. Recently two approaches have been developed that attempt to solve this problem: the text approach and the dependency approach. However only limited evaluation has been carried out on their effectiveness so far, partially due to the lack of data sets linking bugs to their introduction. Producing such data sets is both time-consuming and challenging due to the subjective nature of the problem. To improve this, the origins of 166 bugs in two open-source projects were manually identified. These were then compared to a simulation of the approaches. The results show that both approaches were partially successful across a variety of different types of bugs. They achieved a precision of 29%–79% and a recall of 40%–70%, and could perform better when combined. However there remain a number of challenges to overcome in future development: large commits, unrelated changes and large numbers of versions between the origin and the fix all reduce their effectiveness
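The precision and recall figures quoted in the abstract above can be read as set comparisons between the origins an approach reports and the manually identified ones. The following is a minimal sketch under that assumption; the function name and the commit identifiers are hypothetical and are not taken from the study.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two collections of commit identifiers."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: share of reported origins that are real origins.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of real origins that were reported.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical commit IDs for illustration only, not data from the study.
manual_origins = {"c101", "c205"}
reported_origins = {"c101", "c150", "c300"}
precision, recall = precision_recall(reported_origins, manual_origins)
```

Here one of the three reported commits is a true origin and one of the two true origins is found, so precision is one third and recall is one half.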
Applying Bayesian Analysis Guidelines to Empirical Software Engineering Data: The Case of Programming Languages and Code Quality
Statistical analysis is the tool of choice to turn data into information, and
then information into empirical knowledge. To be valid, the process that goes
from data to knowledge should be supported by detailed, rigorous guidelines,
which help ferret out issues with the data or model, and lead to qualified
results that strike a reasonable balance between generality and practical
relevance. Such guidelines are being developed by statisticians to support the
latest techniques for Bayesian data analysis. In this article, we frame these
guidelines in a way that is apt to empirical research in software engineering.
To demonstrate the guidelines in practice, we apply them to reanalyze a
GitHub dataset about code quality in different programming languages. The
dataset's original analysis (Ray et al., 2014) and a critical reanalysis
(Berger et al., 2019) have attracted considerable attention -- in no small part
because they target a topic (the impact of different programming languages) on
which strong opinions abound. The goals of our reanalysis are largely
orthogonal to this previous work, as we are concerned with demonstrating, on
data in an interesting domain, how to build a principled Bayesian data analysis
and to showcase some of its benefits. In the process, we will also shed light
on some critical aspects of the analyzed data and of the relationship between
programming languages and code quality.
The high-level conclusions of our exercise will be that Bayesian statistical
techniques can be applied to analyze software engineering data in a way that is
principled, flexible, and leads to convincing results that inform the state of
the art while highlighting the boundaries of its validity. The guidelines can
support building solid statistical analyses and connecting their results, and
hence help buttress continued progress in empirical software engineering
research
A new perspective on the competent programmer hypothesis through the reproduction of bugs with repeated mutations
The competent programmer hypothesis states that most programmers are
competent enough to create correct or almost correct source code. Because this
implies that bugs should usually manifest through small variations of the
correct code, the competent programmer hypothesis is one of the fundamental
assumptions of mutation testing. Unfortunately, it is still unclear if the
competent programmer hypothesis holds and past research presents contradictory
claims. Within this article, we provide a new perspective on the competent
programmer hypothesis and its relation to mutation testing. We try to re-create
real-world bugs through chains of mutations to understand if there is a direct
link between mutation testing and bugs. The lengths of these paths help us to
understand if the source code is really almost correct, or if large variations
are required. Our results indicate that while the competent programmer
hypothesis seems to be true, mutation testing is missing important operators to
generate representative real-world bugs.
The Cowl - Vol LNVII - n.1 - Sep 24,1992
The Cowl - student newspaper of Providence College. Volume LNVII - Number 1 - September 24, 1992. 24 pages
Studying the lives of software bugs
For as long as people have made software, they have made mistakes in that software. Software bugs are widespread, and the maintenance required to fix them has a major impact on the cost of software and how developers' time is spent. Reducing this maintenance time would lower the cost of software and allow for developers to spend more time on new features, improving the software for end-users. Bugs are hugely diverse and have a complex life cycle. This makes them difficult to study, and research is often carried out on synthetic bugs or toy programs. However, a better understanding of the bug life cycle would greatly aid in developing tools to reduce the time spent on maintenance. This thesis will study the life cycle of bugs, and develop such an understanding. Overall, this thesis examines over 3000 real bugs, from real projects, concentrating on three of the most important points in the life cycle: origin, reporting and fix. Firstly, two existing techniques are compared for discovering the origin of a bug. A number of improvements are evaluated, and the most effective approach is found to be combining the techniques. Furthermore, the behaviour of developers is found to have a major impact on the accuracy of the techniques. Secondly, a large number of bugs are analysed to determine what information is provided when users report bugs. For most bugs, much important information is missing, or inaccurate. Most importantly, there appears to be a considerable gap between what users provide and what developers actually want. Finally, an evaluation is carried out on a number of novel alterations to techniques used to determine the location of bug fixes. Compared to existing techniques, these alterations successfully increase the number of bugs which can be usefully localised, aiding developers in removing the bugs
Planning under time pressure
Heuristic search is a technique used pervasively in artificial intelligence and automated planning. Often an agent is given a task that it would like to solve as quickly as possible. It must allocate its time between planning the actions to achieve the task and actually executing them. We call this problem planning under time pressure. Most popular heuristic search algorithms are ill-suited for this setting, as they either search a lot to find short plans or search a little and find long plans. The thesis of this dissertation is: when under time pressure, an automated agent should explicitly attempt to minimize the sum of planning and execution times, not just one or just the other.
This dissertation makes four contributions. First we present new algorithms that use modern multi-core CPUs to decrease planning time without increasing execution. Second, we introduce a new model for predicting the performance of iterative-deepening search. The model is as accurate as previous offline techniques when using less training data, but can also be used online to reduce the overhead of iterative-deepening search, resulting in faster planning. Third we show offline planning algorithms that directly attempt to minimize the sum of planning and execution times. And, fourth we consider algorithms that plan online in parallel with execution. Both offline and online algorithms account for a user-specified preference between search and execution, and can greatly outperform the standard utility-oblivious techniques. By addressing the problem of planning under time pressure, these contributions demonstrate that heuristic search is no longer restricted to optimizing solution cost, obviating the need to choose between slow search times and expensive solutions
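The objective stated in the abstract above, minimizing the sum of planning and execution times instead of either one alone, can be illustrated with a small sketch. The class, the function names and the timing figures are all hypothetical and only show the selection criterion; they are not the dissertation's algorithms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    planning_time: float   # seconds spent searching for the plan
    execution_time: float  # seconds spent carrying the plan out


def total_time(candidate):
    """Planning time plus execution time, the quantity to minimize."""
    return candidate.planning_time + candidate.execution_time


def best_by_total_time(candidates):
    """Pick the candidate with the smallest combined time."""
    return min(candidates, key=total_time)


# Made-up timings for illustration only.
candidates = [
    Candidate("search a lot, short plan", 9.0, 4.0),
    Candidate("search a little, long plan", 1.0, 14.0),
    Candidate("balanced", 3.0, 7.0),
]
best = best_by_total_time(candidates)
```

With these figures the balanced candidate wins with a combined time of 10 seconds, although it has neither the shortest search nor the shortest plan.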
Comparing text-based and dependence-based approaches for determining the origins of bugs
Identifying bug origins – the point where erroneous code was introduced – is crucial for many software engineering activities, from identifying process weaknesses to gathering data to support bug detection tools. Unfortunately, this information is not usually recorded when fixing bugs, and recovering it later is challenging. Recently, the text approach and the dependence approach have been developed to tackle this problem. Respectively, they examine textual and dependence-related changes that occurred prior to a bug fix. However, only limited evaluation has been carried out, partially because of a lack of available implementations and of datasets linking bugs to origins. To address this, origins of 174 bugs in three projects were manually identified and compared to a simulation of the approaches. Both approaches were partially successful across a variety of bugs – achieving 29–79% precision and 40–70% recall. Results suggested the precise definition of program dependence could affect performance, as could whether the approaches identified a single or multiple origins. Some potential improvements are explored in detail, identifying pragmatic strategies for combining techniques along with simple modifications. Even after adopting these improvements, there remain many challenges: large commits, unrelated changes and long periods between origins and fixes all reduce effectiveness