Learning Tractable Probabilistic Models for Fault Localization
In recent years, several probabilistic techniques have been applied to
various debugging problems. However, most existing probabilistic debugging
systems use relatively simple statistical models, and fail to generalize across
multiple programs. In this work, we propose Tractable Fault Localization Models
(TFLMs) that can be learned from data, and probabilistically infer the location
of the bug. While most previous statistical debugging methods generalize over
many executions of a single program, TFLMs are trained on a corpus of
previously seen buggy programs, and learn to identify recurring patterns of
bugs. Widely-used fault localization techniques such as TARANTULA evaluate the
suspiciousness of each line in isolation; in contrast, a TFLM defines a joint
probability distribution over buggy indicator variables for each line. Joint
distributions with rich dependency structure are often computationally
intractable; TFLMs avoid this by exploiting recent developments in tractable
probabilistic models (specifically, Relational SPNs). Further, TFLMs can
incorporate additional sources of information, including coverage-based
features such as TARANTULA. We evaluate the fault localization performance of
TFLMs that include TARANTULA scores as features in the probabilistic model. Our
study shows that the learned TFLMs isolate bugs more effectively than previous
statistical methods or using TARANTULA directly.
Comment: Fifth International Workshop on Statistical Relational AI (StaR-AI 2015)
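For context on the per-line baseline this abstract contrasts with, the following is a minimal sketch of the standard TARANTULA suspiciousness score, computed for each line in isolation from test coverage. It is not the TFLM method, and the function name `tarantula` and its input layout (one set of covered lines per test, plus pass/fail flags) are assumptions made for illustration.

```python
def tarantula(coverage, outcomes):
    """Score each line by the TARANTULA suspiciousness formula.

    coverage: list of sets; coverage[i] holds the lines executed by test i.
    outcomes: list of bools; outcomes[i] is True if test i passed.
    (Both input shapes are assumptions made for this sketch.)
    """
    total_passed = sum(outcomes)
    total_failed = len(outcomes) - total_passed
    scores = {}
    for line in set().union(*coverage):
        passed = sum(1 for cov, ok in zip(coverage, outcomes) if ok and line in cov)
        failed = sum(1 for cov, ok in zip(coverage, outcomes) if not ok and line in cov)
        # Fractions of passing and failing tests that execute this line.
        pass_ratio = passed / total_passed if total_passed else 0.0
        fail_ratio = failed / total_failed if total_failed else 0.0
        denom = pass_ratio + fail_ratio
        # Suspiciousness is the failing share of the two ratios.
        scores[line] = fail_ratio / denom if denom else 0.0
    return scores


# Three tests over lines 1-3; only the last test fails.
coverage = [{1, 2, 3}, {1, 2}, {1, 3}]
outcomes = [True, True, False]
scores = tarantula(coverage, outcomes)
```

Each score here depends only on that line's own coverage counts, which is the "in isolation" property the abstract refers to; a joint model would instead let the scores of related lines inform one another.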
SMCTC : sequential Monte Carlo in C++
Sequential Monte Carlo methods are a very general class of Monte Carlo methods for sampling from sequences of distributions. Simple examples of these algorithms are used very widely in the tracking and signal processing literature. Recent developments illustrate that these techniques have much more general applicability, and can be applied very effectively to statistical inference problems. Unfortunately, these methods are often perceived as being computationally expensive and difficult to implement. This article seeks to address both of these problems. A C++ template class library for the efficient and convenient implementation of very general Sequential Monte Carlo algorithms is presented. Two example applications are provided: a simple particle filter for illustrative purposes and a state-of-the-art algorithm for rare event estimation.
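As an illustration of the "simple particle filter" kind of algorithm mentioned above, here is a minimal bootstrap particle filter sketch. It does not use the SMCTC library or its API; the function name, the one-dimensional linear Gaussian model, and all parameter names (`a`, `q`, `r`) are assumptions chosen for the example.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              a=0.9, q=1.0, r=1.0, seed=0):
    """Estimate filtering means for an assumed toy model:
        x_t = a * x_{t-1} + N(0, q),   y_t = x_t + N(0, r).
    Returns one posterior-mean estimate per observation.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        # Propagate each particle through the state transition.
        particles = [a * x + rng.gauss(0.0, math.sqrt(q)) for x in particles]
        # Weight by the Gaussian observation likelihood (log scale for stability).
        log_w = [-0.5 * (y - x) ** 2 / r for x in particles]
        shift = max(log_w)
        weights = [math.exp(lw - shift) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        means.append(sum(w * x for w, x in zip(weights, particles)))
        # Multinomial resampling: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means


# Usage on a short, hand-written observation sequence.
observations = [0.3, 1.1, 0.8, -0.2, 0.5]
estimates = bootstrap_particle_filter(observations)
```

Resampling at every step keeps the sketch short; practical implementations often resample only when the effective sample size drops below a threshold.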
Software systems through complex networks science: Review, analysis and applications
Complex software systems are among the most sophisticated human-made systems, yet
little is known about the actual structure of 'good' software. Here we
study different software systems developed in Java from the perspective of
network science. The study reveals that network theory can provide a prominent
set of techniques for the exploratory analysis of large complex software
systems. We further identify several applications in software engineering, and
propose different network-based quality indicators that address software
design, efficiency, reusability, vulnerability, controllability and other
properties. We also highlight various interesting findings, e.g., software
systems are highly vulnerable to processes like bug propagation; however, they
are not easily controllable.
Simulation benchmarks for low-pressure plasmas: capacitive discharges
Benchmarking is generally accepted as an important element in demonstrating the correctness of computer simulations. In the modern sense, a benchmark is a computer simulation result that has evidence of correctness, is accompanied by estimates of relevant errors, and can thus be used as a basis for judging the accuracy and efficiency of other codes. In this paper, we present four benchmark cases related to capacitively coupled discharges. These benchmarks prescribe all relevant physical and numerical parameters. We have simulated the benchmark conditions using five independently developed particle-in-cell codes. We show that the results of these simulations are statistically indistinguishable, within bounds of uncertainty that we define. We therefore claim that the results of these simulations represent strong benchmarks that can be used as a basis for evaluating the accuracy of other codes. These other codes could include approaches other than particle-in-cell simulations, where benchmarking could examine not just implementation accuracy and efficiency, but also the fidelity of different physical models, such as moment or hybrid models. We discuss an example of this kind in an appendix. Of course, the methodology that we have developed can also be readily extended to a suite of benchmarks with coverage of a wider range of physical and chemical phenomena.
Comparing the effectiveness of testing methods in improving programs: the effect of variations in program quality
We compare the efficacy of different testing methods for improving the reliability of software. Specifically, we use modelling to compare “operational” testing, in which test cases are chosen according to their probability of occurring in actual use of the software, against “debug” testing methods, in which the testers look for test cases which they consider likely to cause failure, or that satisfy some coverage criterion. We base our comparisons on the reliability reached by the program at the end of testing. Unlike previous studies, we consider the probability distribution of the achieved reliability, and thus the probability of satisfying specific requirements, rather than just the average reliability achieved. We take account of two sources of variation: the variation between the actual test histories that are possible for a given program and a given test method, and the fact that different programs start testing with different faults and initial reliability levels. By necessity, we use very simplified models of reality. Yet, we can show some interesting conclusions with important practical consequences. In general, there are stronger arguments in favor of operational testing than previous studies have shown.
Curriculum Guidelines for Undergraduate Programs in Data Science
The Park City Math Institute (PCMI) 2016 Summer Undergraduate Faculty Program
met for the purpose of composing guidelines for undergraduate programs in Data
Science. The group consisted of 25 undergraduate faculty from a variety of
institutions in the U.S., primarily from the disciplines of mathematics,
statistics and computer science. These guidelines are meant to provide some
structure for institutions planning for or revising a major in Data Science
- …