276 research outputs found

    Learning Tractable Probabilistic Models for Fault Localization

    Full text link
    In recent years, several probabilistic techniques have been applied to various debugging problems. However, most existing probabilistic debugging systems use relatively simple statistical models, and fail to generalize across multiple programs. In this work, we propose Tractable Fault Localization Models (TFLMs) that can be learned from data, and probabilistically infer the location of the bug. While most previous statistical debugging methods generalize over many executions of a single program, TFLMs are trained on a corpus of previously seen buggy programs, and learn to identify recurring patterns of bugs. Widely-used fault localization techniques such as TARANTULA evaluate the suspiciousness of each line in isolation; in contrast, a TFLM defines a joint probability distribution over buggy indicator variables for each line. Joint distributions with rich dependency structure are often computationally intractable; TFLMs avoid this by exploiting recent developments in tractable probabilistic models (specifically, Relational SPNs). Further, TFLMs can incorporate additional sources of information, including coverage-based features such as TARANTULA. We evaluate the fault localization performance of TFLMs that include TARANTULA scores as features in the probabilistic model. Our study shows that the learned TFLMs isolate bugs more effectively than previous statistical methods or using TARANTULA directly.Comment: Fifth International Workshop on Statistical Relational AI (StaR-AI 2015

    Spectrum-based Fault Localization Techniques Application on Multiple-Fault Programs: A Review

    Get PDF
    Software fault localization is one of the most tedious and costly activities in program debugging in the endeavor to identify faults locations in a software program. In this paper, the studies that used spectrum-based fault localization (SBFL) techniques that makes use of different multiple fault localization debugging methods such as one-bug-at-a-time (OBA) debugging, parallel debugging, and simultaneous debugging in localizing multiple faults are classified and critically analyzed in order to extensively discuss the current research trends, issues, and challenges in this field of study. The outcome strongly shows that there is a high utilization of OBA debugging method, poor fault isolation accuracy, and dominant use of artificial faults that limit the existing techniques applicability in the software industry

    A Comprehensive Empirical Investigation on Failure Clustering in Parallel Debugging

    Full text link
    The clustering technique has attracted a lot of attention as a promising strategy for parallel debugging in multi-fault scenarios, this heuristic approach (i.e., failure indexing or fault isolation) enables developers to perform multiple debugging tasks simultaneously through dividing failed test cases into several disjoint groups. When using statement ranking representation to model failures for better clustering, several factors influence clustering effectiveness, including the risk evaluation formula (REF), the number of faults (NOF), the fault type (FT), and the number of successful test cases paired with one individual failed test case (NSP1F). In this paper, we present the first comprehensive empirical study of how these four factors influence clustering effectiveness. We conduct extensive controlled experiments on 1060 faulty versions of 228 simulated faults and 141 real faults, and the results reveal that: 1) GP19 is highly competitive across all REFs, 2) clustering effectiveness decreases as NOF increases, 3) higher clustering effectiveness is easier to achieve when a program contains only predicate faults, and 4) clustering effectiveness remains when the scale of NSP1F is reduced to 20%

    Doctor of Philosophy

    Get PDF
    dissertationAggressive random testing tools, or fuzzers, are impressively effective at finding bugs in compilers and programming language runtimes. For example, a single test-case generator has resulted in more than 460 bugs reported for a number of production-quality C compilers. However, fuzzers can be hard to use. The first problem is that failures triggered by random test cases can be difficult to debug because these tests are often large. To report a compiler bug, one must often construct a small test case that triggers the bug. The existing automated test-case reduction technique, delta debugging, is not sufficient to produce small, reportable test cases. A second problem is that fuzzers are indiscriminate: they repeatedly find bugs that may not be severe enough to fix right away. Third, fuzzers tend to generate a large number of test cases that only trigger a few bugs. Some bugs are triggered much more frequently than others, creating needle-in-the-haystack problems. Currently, users rule out undesirable test cases using ad hoc methods such as disallowing problematic features in tests and filtering test results. This dissertation investigates approaches to improving the utility of compiler fuzzers. Two components, an aggressive test-case reducer and a tamer, are added to the fuzzing workflow to make the fuzzer more user friendly. We introduce C-Reduce, an aggressive test-case reducer for C/C++ programs, which exploits rich domain-specific knowledge to output test cases nearly as good as those produced by skilled humans. This reducer produces outputs that are, on average, more than 30 times smaller than those produced by the existing reducer that is most commonly used by compiler engineers. Second, this dissertation formulates and addresses the fuzzer taming problem: given a potentially large number of random test cases that trigger failures, order them such that diverse, interesting test cases are highly ranked. Bug triage can be effectively automated, relying on techniques from machine learning to suppress duplicate bug-triggering test cases and test cases triggering known bugs. An evaluation shows the ability of this tool to solve the fuzzer taming problem for 3,799 test cases triggering 46 bugs in a C compiler

    SURE: A Visualized Failure Indexing Approach using Program Memory Spectrum

    Full text link
    Failure indexing is a longstanding crux in software testing and debugging, the goal of which is to automatically divide failures (e.g., failed test cases) into distinct groups according to the culprit root causes, as such multiple faults in a faulty program can be handled independently and simultaneously. This community has long been plagued by two challenges: 1) The effectiveness of division is still far from promising. Existing techniques only employ a limited source of run-time data (e.g., code coverage) to be failure proximity, which typically delivers unsatisfactory results. 2) The outcome can be hardly comprehensible. A developer who receives the failure indexing result does not know why all failures should be divided the way they are. This leads to difficulties for developers to be convinced by the result, which in turn affects the adoption of the results. To tackle these challenges, in this paper, we propose SURE, a viSUalized failuRe indExing approach using the program memory spectrum. We first collect the run-time memory information at preset breakpoints during the execution of failed test cases, and transform it into human-friendly images (called program memory spectrum, PMS). Then, any pair of PMS images that serve as proxies for two failures is fed to a trained Siamese convolutional neural network, to predict the likelihood of them being triggered by the same fault. Results demonstrate the effectiveness of SURE: It achieves 101.20% and 41.38% improvements in faults number estimation, as well as 105.20% and 35.53% improvements in clustering, compared with the state-of-the-art technique in this field, in simulated and real-world environments, respectively. Moreover, we carry out a human study to quantitatively evaluate the comprehensibility of PMS, revealing that this novel type of representation can help developers better comprehend failure indexing results.Comment: Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF fil

    Automatically Discovering, Reporting and Reproducing Android Application Crashes

    Full text link
    Mobile developers face unique challenges when detecting and reporting crashes in apps due to their prevailing GUI event-driven nature and additional sources of inputs (e.g., sensor readings). To support developers in these tasks, we introduce a novel, automated approach called CRASHSCOPE. This tool explores a given Android app using systematic input generation, according to several strategies informed by static and dynamic analyses, with the intrinsic goal of triggering crashes. When a crash is detected, CRASHSCOPE generates an augmented crash report containing screenshots, detailed crash reproduction steps, the captured exception stack trace, and a fully replayable script that automatically reproduces the crash on a target device(s). We evaluated CRASHSCOPE's effectiveness in discovering crashes as compared to five state-of-the-art Android input generation tools on 61 applications. The results demonstrate that CRASHSCOPE performs about as well as current tools for detecting crashes and provides more detailed fault information. Additionally, in a study analyzing eight real-world Android app crashes, we found that CRASHSCOPE's reports are easily readable and allow for reliable reproduction of crashes by presenting more explicit information than human written reports.Comment: 12 pages, in Proceedings of 9th IEEE International Conference on Software Testing, Verification and Validation (ICST'16), Chicago, IL, April 10-15, 2016, pp. 33-4
    • …
    corecore