92,489 research outputs found

    Learning Tractable Probabilistic Models for Fault Localization

    Full text link
    In recent years, several probabilistic techniques have been applied to various debugging problems. However, most existing probabilistic debugging systems use relatively simple statistical models, and fail to generalize across multiple programs. In this work, we propose Tractable Fault Localization Models (TFLMs) that can be learned from data, and probabilistically infer the location of the bug. While most previous statistical debugging methods generalize over many executions of a single program, TFLMs are trained on a corpus of previously seen buggy programs, and learn to identify recurring patterns of bugs. Widely-used fault localization techniques such as TARANTULA evaluate the suspiciousness of each line in isolation; in contrast, a TFLM defines a joint probability distribution over buggy indicator variables for each line. Joint distributions with rich dependency structure are often computationally intractable; TFLMs avoid this by exploiting recent developments in tractable probabilistic models (specifically, Relational SPNs). Further, TFLMs can incorporate additional sources of information, including coverage-based features such as TARANTULA. We evaluate the fault localization performance of TFLMs that include TARANTULA scores as features in the probabilistic model. Our study shows that the learned TFLMs isolate bugs more effectively than previous statistical methods or using TARANTULA directly.Comment: Fifth International Workshop on Statistical Relational AI (StaR-AI 2015

    SMCTC : sequential Monte Carlo in C++

    Get PDF
    Sequential Monte Carlo methods are a very general class of Monte Carlo methods for sampling from sequences of distributions. Simple examples of these algorithms are used very widely in the tracking and signal processing literature. Recent developments illustrate that these techniques have much more general applicability, and can be applied very effectively to statistical inference problems. Unfortunately, these methods are often perceived as being computationally expensive and difficult to implement. This article seeks to address both of these problems. A C++ template class library for the efficient and convenient implementation of very general Sequential Monte Carlo algorithms is presented. Two example applications are provided: a simple particle filter for illustrative purposes and a state-of-the-art algorithm for rare event estimation

    Software systems through complex networks science: Review, analysis and applications

    Full text link
    Complex software systems are among most sophisticated human-made systems, yet only little is known about the actual structure of 'good' software. We here study different software systems developed in Java from the perspective of network science. The study reveals that network theory can provide a prominent set of techniques for the exploratory analysis of large complex software system. We further identify several applications in software engineering, and propose different network-based quality indicators that address software design, efficiency, reusability, vulnerability, controllability and other. We also highlight various interesting findings, e.g., software systems are highly vulnerable to processes like bug propagation, however, they are not easily controllable

    Simulation benchmarks for low-pressure plasmas: capacitive discharges

    Get PDF
    Benchmarking is generally accepted as an important element in demonstrating the correctness of computer simulations. In the modern sense, a benchmark is a computer simulation result that has evidence of correctness, is accompanied by estimates of relevant errors, and which can thus be used as a basis for judging the accuracy and efficiency of other codes. In this paper, we present four benchmark cases related to capacitively coupled discharges. These benchmarks prescribe all relevant physical and numerical parameters. We have simulated the benchmark conditions using five independently developed particle-in-cell codes. We show that the results of these simulations are statistically indistinguishable, within bounds of uncertainty that we define. We therefore claim that the results of these simulations represent strong benchmarks, that can be used as a basis for evaluating the accuracy of other codes. These other codes could include other approaches than particle-in-cell simulations, where benchmarking could examine not just implementation accuracy and efficiency, but also the fidelity of different physical models, such as moment or hybrid models. We discuss an example of this kind in an appendix. Of course, the methodology that we have developed can also be readily extended to a suite of benchmarks with coverage of a wider range of physical and chemical phenomena

    Curriculum Guidelines for Undergraduate Programs in Data Science

    Get PDF
    The Park City Math Institute (PCMI) 2016 Summer Undergraduate Faculty Program met for the purpose of composing guidelines for undergraduate programs in Data Science. The group consisted of 25 undergraduate faculty from a variety of institutions in the U.S., primarily from the disciplines of mathematics, statistics and computer science. These guidelines are meant to provide some structure for institutions planning for or revising a major in Data Science
    corecore