Learning Tractable Probabilistic Models for Fault Localization
In recent years, several probabilistic techniques have been applied to
various debugging problems. However, most existing probabilistic debugging
systems use relatively simple statistical models, and fail to generalize across
multiple programs. In this work, we propose Tractable Fault Localization Models
(TFLMs) that can be learned from data, and probabilistically infer the location
of the bug. While most previous statistical debugging methods generalize over
many executions of a single program, TFLMs are trained on a corpus of
previously seen buggy programs, and learn to identify recurring patterns of
bugs. Widely-used fault localization techniques such as TARANTULA evaluate the
suspiciousness of each line in isolation; in contrast, a TFLM defines a joint
probability distribution over buggy indicator variables for each line. Joint
distributions with rich dependency structure are often computationally
intractable; TFLMs avoid this by exploiting recent developments in tractable
probabilistic models (specifically, Relational SPNs). Further, TFLMs can
incorporate additional sources of information, including coverage-based
features such as TARANTULA. We evaluate the fault localization performance of
TFLMs that include TARANTULA scores as features in the probabilistic model. Our
study shows that the learned TFLMs isolate bugs more effectively than previous
statistical methods or using TARANTULA directly.
Comment: Fifth International Workshop on Statistical Relational AI (StaR-AI 2015)
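For context, the TARANTULA score that the TFLMs consume as a coverage-based feature assigns each line a suspiciousness value derived from how often failing versus passing tests execute it. A minimal sketch of that score, assuming per-line coverage counts are available (function and variable names are our own):

    # Classic TARANTULA suspiciousness for one line; names are illustrative.
    # failed_cov / passed_cov: failing / passing tests covering the line.
    # total_failed / total_passed: suite-wide totals.
    def tarantula(failed_cov, passed_cov, total_failed, total_passed):
        if failed_cov == 0 and passed_cov == 0:
            return 0.0  # line never executed by any test
        fail_ratio = failed_cov / total_failed if total_failed else 0.0
        pass_ratio = passed_cov / total_passed if total_passed else 0.0
        denom = fail_ratio + pass_ratio
        return fail_ratio / denom if denom else 0.0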
Spectrum-based Fault Localization Techniques Application on Multiple-Fault Programs: A Review
Software fault localization, the task of identifying fault locations in a software program, is one of the most tedious and costly activities in program debugging. In this paper, studies that apply spectrum-based fault localization (SBFL) techniques together with different multiple-fault debugging methods, such as one-bug-at-a-time (OBA) debugging, parallel debugging, and simultaneous debugging, are classified and critically analyzed in order to discuss the current research trends, issues, and challenges in this field of study. The outcome strongly shows a high utilization of the OBA debugging method, poor fault isolation accuracy, and a dominant use of artificial faults, all of which limit the applicability of existing techniques in the software industry.
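To make the contrast among these methods concrete, one-bug-at-a-time debugging repeats a full localize/fix/re-test cycle once per fault, while parallel and simultaneous debugging try to work on several faults at once. A rough sketch of the OBA loop, assuming hypothetical helper functions (none of this is an API from the surveyed tools):

    # Hypothetical one-bug-at-a-time (OBA) loop: localize the most
    # suspicious statement via SBFL, fix that single fault, then re-run
    # the whole suite before hunting the next one.
    def oba_debug(program, suite, run_tests, localize, apply_fix):
        failures = run_tests(program, suite)
        while failures:
            suspect = localize(program, suite, failures)  # SBFL ranking
            program = apply_fix(program, suspect)         # fix one fault
            failures = run_tests(program, suite)          # full re-test
        return program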
A Comprehensive Empirical Investigation on Failure Clustering in Parallel Debugging
Clustering has attracted considerable attention as a promising strategy for parallel debugging in multi-fault scenarios. This heuristic approach (i.e., failure indexing or fault isolation) enables developers to perform multiple debugging tasks simultaneously by dividing failed test cases into several disjoint groups. When using the statement ranking representation
to model failures for better clustering, several factors influence clustering
effectiveness, including the risk evaluation formula (REF), the number of
faults (NOF), the fault type (FT), and the number of successful test cases
paired with one individual failed test case (NSP1F). In this paper, we present
the first comprehensive empirical study of how these four factors influence
clustering effectiveness. We conduct extensive controlled experiments on 1060
faulty versions of 228 simulated faults and 141 real faults, and the results
reveal that: 1) GP19 is highly competitive across all REFs, 2) clustering
effectiveness decreases as NOF increases, 3) higher clustering effectiveness is
easier to achieve when a program contains only predicate faults, and 4)
clustering effectiveness is preserved even when the scale of NSP1F is reduced to 20%.
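For concreteness, the statement ranking representation models each failed test case as a vector of per-statement suspiciousness values computed with a REF, and those vectors are then clustered so that each group can be debugged in parallel. A minimal sketch, assuming Ochiai as a stand-in REF (GP19's formula is not reproduced here) and off-the-shelf hierarchical clustering:

    # Illustrative sketch, not the paper's implementation.
    import math
    from scipy.cluster.hierarchy import fcluster, linkage

    def ochiai(ef, ep, nf):
        # ef/ep: failed/passed tests covering the statement;
        # nf: failed tests not covering it.
        denom = math.sqrt((ef + nf) * (ef + ep))
        return ef / denom if denom else 0.0

    def suspiciousness_vector(stmt_stats):
        # stmt_stats: per-statement (ef, ep, nf) for one failed test case
        # paired with its successful tests (the NSP1F setting).
        return [ochiai(ef, ep, nf) for ef, ep, nf in stmt_stats]

    def cluster_failures(vectors, num_faults):
        # vectors: one suspiciousness vector per failed test case
        z = linkage(vectors, method="average", metric="cosine")
        return fcluster(z, t=num_faults, criterion="maxclust")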
Doctor of Philosophy dissertation
Aggressive random testing tools, or fuzzers, are impressively effective at finding bugs in compilers and programming language runtimes. For example, a single test-case generator has resulted in more than 460 bugs reported for a number of production-quality C compilers. However, fuzzers can be hard to use. The first problem is that failures triggered by random test cases can be difficult to debug because these tests are often large. To report a compiler bug, one must often construct a small test case that triggers the bug. The existing automated test-case reduction technique, delta debugging, is not sufficient to produce small, reportable test cases. A second problem is that fuzzers are indiscriminate: they repeatedly find bugs that may not be severe enough to fix right away. Third, fuzzers tend to generate a large number of test cases that only trigger a few bugs. Some bugs are triggered much more frequently than others, creating needle-in-the-haystack problems. Currently, users rule out undesirable test cases using ad hoc methods such as disallowing problematic features in tests and filtering test results. This dissertation investigates approaches to improving the utility of compiler fuzzers. Two components, an aggressive test-case reducer and a tamer, are added to the fuzzing workflow to make the fuzzer more user friendly. First, we introduce C-Reduce, an aggressive test-case reducer for C/C++ programs, which exploits rich domain-specific knowledge to output test cases nearly as good as those produced by skilled humans. This reducer produces outputs that are, on average, more than 30 times smaller than those produced by the existing reducer most commonly used by compiler engineers. Second, this dissertation formulates and addresses the fuzzer taming problem: given a potentially large number of random test cases that trigger failures, order them such that diverse, interesting test cases are highly ranked. Bug triage can be effectively automated, relying on techniques from machine learning to suppress duplicate bug-triggering test cases and test cases triggering known bugs. An evaluation shows the ability of this tool to solve the fuzzer taming problem for 3,799 test cases triggering 46 bugs in a C compiler.
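The taming step orders failure-triggering test cases so that diverse, likely-distinct bugs surface early in the list. One way to realize such an ordering is a furthest-point-first traversal under a test-case distance; the sketch below is an illustrative greedy scheme with the distance function left abstract, not necessarily the dissertation's exact algorithm:

    # Furthest-point-first (FPF) ordering: greedily pick the test case
    # farthest from everything chosen so far, so near-duplicates sink.
    # `dist` is any test-case distance (e.g., edit distance on reduced
    # tests); it is an assumed input here.
    def fpf_order(tests, dist):
        order = [tests[0]]
        remaining = list(tests[1:])
        d = {t: dist(t, order[0]) for t in remaining}  # dist to chosen set
        while remaining:
            nxt = max(remaining, key=lambda t: d[t])
            remaining.remove(nxt)
            order.append(nxt)
            for t in remaining:
                d[t] = min(d[t], dist(t, nxt))
        return order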
SURE: A Visualized Failure Indexing Approach using Program Memory Spectrum
Failure indexing is a longstanding crux in software testing and debugging. Its goal is to automatically divide failures (e.g., failed test cases) into distinct groups according to their culprit root causes, so that multiple faults in a faulty program can be handled independently and simultaneously. This community has long been plagued by two challenges: 1) The effectiveness of division is still far from promising, because existing techniques employ only a limited source of run-time data (e.g., code coverage) as the failure proximity, which typically delivers unsatisfactory results. 2) The outcome is hard to comprehend: a developer who receives the failure indexing result does not know why the failures are divided the way they are, which makes it difficult to trust the result and in turn hinders its adoption. To tackle these challenges, in this paper,
we propose SURE, a viSUalized failuRe indExing approach using the program
memory spectrum. We first collect the run-time memory information at preset
breakpoints during the execution of failed test cases, and transform it into
human-friendly images (called program memory spectrum, PMS). Then, each pair of PMS images, serving as proxies for two failures, is fed to a trained Siamese convolutional neural network to predict the likelihood that the two failures were triggered by the same fault. Results demonstrate the effectiveness of SURE: it achieves 101.20% and 41.38% improvements in fault number estimation, as well as 105.20%
and 35.53% improvements in clustering, compared with the state-of-the-art
technique in this field, in simulated and real-world environments,
respectively. Moreover, we carry out a human study to quantitatively evaluate
the comprehensibility of PMS, revealing that this novel type of representation
can help developers better comprehend failure indexing results.
Comment: Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file.
Automatically Discovering, Reporting and Reproducing Android Application Crashes
Mobile developers face unique challenges when detecting and reporting crashes
in apps due to their prevailing GUI event-driven nature and additional sources
of inputs (e.g., sensor readings). To support developers in these tasks, we
introduce a novel, automated approach called CRASHSCOPE. This tool explores a
given Android app using systematic input generation, according to several
strategies informed by static and dynamic analyses, with the intrinsic goal of
triggering crashes. When a crash is detected, CRASHSCOPE generates an augmented
crash report containing screenshots, detailed crash reproduction steps, the
captured exception stack trace, and a fully replayable script that
automatically reproduces the crash on target devices. We evaluated
CRASHSCOPE's effectiveness in discovering crashes as compared to five
state-of-the-art Android input generation tools on 61 applications. The results
demonstrate that CRASHSCOPE performs about as well as current tools for
detecting crashes and provides more detailed fault information. Additionally,
in a study analyzing eight real-world Android app crashes, we found that
CRASHSCOPE's reports are easily readable and allow for reliable reproduction of
crashes by presenting more explicit information than human-written reports.
Comment: 12 pages, in Proceedings of the 9th IEEE International Conference on Software Testing, Verification and Validation (ICST'16), Chicago, IL, April 10-15, 2016, pp. 33-4
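As an illustration of what a replayable script can look like, the sketch below re-sends recorded GUI events through standard adb input commands and then checks the device log for a crash. The event list, device serial, and crash check are hypothetical stand-ins, not CRASHSCOPE's actual output format:

    # Hypothetical crash-replay sketch using real adb commands; the
    # recorded events and serial below are made-up placeholders.
    import subprocess

    EVENTS = [("tap", "540", "960"), ("text", "hello"), ("tap", "540", "1800")]

    def replay(serial="emulator-5554"):
        for kind, *args in EVENTS:
            subprocess.run(["adb", "-s", serial, "shell", "input", kind, *args],
                           check=True)
        log = subprocess.run(["adb", "-s", serial, "logcat", "-d"],
                             capture_output=True, text=True).stdout
        return "FATAL EXCEPTION" in log  # standard Android crash marker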
- …