1,611 research outputs found
Learning Tractable Probabilistic Models for Fault Localization
In recent years, several probabilistic techniques have been applied to
various debugging problems. However, most existing probabilistic debugging
systems use relatively simple statistical models, and fail to generalize across
multiple programs. In this work, we propose Tractable Fault Localization Models
(TFLMs) that can be learned from data, and probabilistically infer the location
of the bug. While most previous statistical debugging methods generalize over
many executions of a single program, TFLMs are trained on a corpus of
previously seen buggy programs, and learn to identify recurring patterns of
bugs. Widely-used fault localization techniques such as TARANTULA evaluate the
suspiciousness of each line in isolation; in contrast, a TFLM defines a joint
probability distribution over buggy indicator variables for each line. Joint
distributions with rich dependency structure are often computationally
intractable; TFLMs avoid this by exploiting recent developments in tractable
probabilistic models (specifically, Relational SPNs). Further, TFLMs can
incorporate additional sources of information, including coverage-based
features such as TARANTULA. We evaluate the fault localization performance of
TFLMs that include TARANTULA scores as features in the probabilistic model. Our
study shows that the learned TFLMs isolate bugs more effectively than previous
statistical methods or using TARANTULA directly.Comment: Fifth International Workshop on Statistical Relational AI (StaR-AI
2015
Classifying the Correctness of Generated White-Box Tests: An Exploratory Study
White-box test generator tools rely only on the code under test to select
test inputs, and capture the implementation's output as assertions. If there is
a fault in the implementation, it could get encoded in the generated tests.
Tool evaluations usually measure fault-detection capability using the number of
such fault-encoding tests. However, these faults are only detected, if the
developer can recognize that the encoded behavior is faulty. We designed an
exploratory study to investigate how developers perform in classifying
generated white-box test as faulty or correct. We carried out the study in a
laboratory setting with 54 graduate students. The tests were generated for two
open-source projects with the help of the IntelliTest tool. The performance of
the participants were analyzed using binary classification metrics and by
coding their observed activities. The results showed that participants
incorrectly classified a large number of both fault-encoding and correct tests
(with median misclassification rate 33% and 25% respectively). Thus the real
fault-detection capability of test generators could be much lower than
typically reported, and we suggest to take this human factor into account when
evaluating generated white-box tests.Comment: 13 pages, 7 figure
Automatic Software Repair: a Bibliography
This article presents a survey on automatic software repair. Automatic
software repair consists of automatically finding a solution to software bugs
without human intervention. This article considers all kinds of repairs. First,
it discusses behavioral repair where test suites, contracts, models, and
crashing inputs are taken as oracle. Second, it discusses state repair, also
known as runtime repair or runtime recovery, with techniques such as checkpoint
and restart, reconfiguration, and invariant restoration. The uniqueness of this
article is that it spans the research communities that contribute to this body
of knowledge: software engineering, dependability, operating systems,
programming languages, and security. It provides a novel and structured
overview of the diversity of bug oracles and repair operators used in the
literature
- …