Storing and Querying Probabilistic XML Using a Probabilistic Relational DBMS
This work explores the feasibility of storing and querying probabilistic XML in a probabilistic relational database. Our approach is to adapt known techniques for mapping XML to relational data such that the possible worlds are preserved. We show that this approach can work for any XML-to-relational technique by adapting a representative schema-based technique (inlining) as well as a representative schemaless technique (XPath Accelerator). We investigate the maturity of probabilistic relational databases for this task with experiments on one of the state-of-the-art systems, called Trio.
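The core idea of the abstract, shredding XML into relational tuples while preserving the possible worlds, can be illustrated with a minimal sketch. The schema below (a node table with a per-tuple probability column, roughly in the spirit of Trio-style tuple confidences) and the `prob` attribute convention are our own assumptions for illustration, not the paper's actual encoding.

```python
# Hypothetical sketch: shred a tiny probabilistic XML tree into relational
# tuples (node_id, parent_id, tag, text, prob). A missing prob attribute
# means the node exists with certainty (prob = 1.0).
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<person><name>Alice</name>"
    "<city prob='0.7'>Paris</city>"
    "<city prob='0.3'>Lyon</city></person>"
)

def shred(elem, parent_id=None, counter=None):
    """Recursively emit one relational tuple per XML node."""
    if counter is None:
        counter = [0]
    rows = []
    node_id = counter[0]
    counter[0] += 1
    prob = float(elem.get("prob", "1.0"))
    rows.append((node_id, parent_id, elem.tag, (elem.text or "").strip(), prob))
    for child in elem:
        rows.extend(shred(child, node_id, counter))
    return rows

rows = shred(doc)
for r in rows:
    print(r)
```

A probabilistic relational DBMS can then evaluate queries over such tuples, with each consistent assignment of uncertain tuples corresponding to one possible world of the original XML document.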
Where do statistical models come from? Revisiting the problem of specification
R. A. Fisher founded modern statistical inference in 1922 and identified its
fundamental problems to be: specification, estimation and distribution. Since
then the problem of statistical model specification has received scant
attention in the statistics literature. The paper traces the history of
statistical model specification, focusing primarily on pioneers like Fisher,
Neyman, and more recently Lehmann and Cox, and attempts a synthesis of their
views in the context of the Probabilistic Reduction (PR) approach. As argued by
Lehmann [11], a major stumbling block for a general approach to statistical
model specification has been the delineation of the appropriate role for
substantive subject matter information. The PR approach demarcates the
interrelated but complementary roles of substantive and statistical information
summarized ab initio in the form of a structural and a statistical model,
respectively. In an attempt to preserve the integrity of both sources of
information, as well as to ensure the reliability of their fusing, a purely
probabilistic construal of statistical models is advocated. This probabilistic
construal is then used to shed light on a number of issues relating to
specification, including the role of preliminary data analysis, structural vs.
statistical models, model specification vs. model selection, statistical vs.
substantive adequacy and model validation.Comment: Published at http://dx.doi.org/10.1214/074921706000000419 in the IMS
Lecture Notes--Monograph Series
(http://www.imstat.org/publications/lecnotes.htm) by the Institute of
Mathematical Statistics (http://www.imstat.org
Refinement for Probabilistic Systems with Nondeterminism
Before we combine actions and probabilities two very obvious questions should
be asked. Firstly, what does "the probability of an action" mean? Secondly, how
does probability interact with nondeterminism? Neither question has a single
universally agreed upon answer but by considering these questions at the outset
we build a novel and hopefully intuitive probabilistic event-based formalism.
In previous work we have characterised refinement via the notion of testing.
Basically, if one system passes all the tests that another system passes (and
maybe more), we say the first system is a refinement of the second. This is, in
our view, an important way of characterising refinement, since it answers the
question "what sort of refinement should I be using?"
We use testing in this paper as the basis for our refinement. We develop
tests for probabilistic systems by analogy with the tests developed for
non-probabilistic systems. We make sure that our probabilistic tests, when
performed on non-probabilistic automata, give us refinement relations which
agree with those for non-probabilistic automata. We formalise this property as
a vertical refinement.

Comment: In Proceedings Refine 2011, arXiv:1106.348
Game Characterization of Probabilistic Bisimilarity, and Applications to Pushdown Automata
We study the bisimilarity problem for probabilistic pushdown automata (pPDA)
and subclasses thereof. Our definition of pPDA allows both probabilistic and
non-deterministic branching, generalising the classical notion of pushdown
automata (without epsilon-transitions). We first show a general
characterization of probabilistic bisimilarity in terms of two-player games,
which naturally reduces checking bisimilarity of probabilistic labelled
transition systems to checking bisimilarity of standard (non-deterministic)
labelled transition systems. This reduction can be easily implemented in the
framework of pPDA, allowing us to use known results for standard
(non-probabilistic) PDA and their subclasses. A direct use of the reduction
incurs an exponential increase of complexity, which does not matter in deriving
decidability of bisimilarity for pPDA due to the non-elementary complexity of
the problem. In the cases of probabilistic one-counter automata (pOCA), of
probabilistic visibly pushdown automata (pvPDA), and of probabilistic basic
process algebras (i.e., single-state pPDA) we show that an implicit use of the
reduction can avoid the complexity increase; we thus get PSPACE, EXPTIME, and
2-EXPTIME upper bounds, respectively, like for the respective non-probabilistic
versions. The bisimilarity problems for OCA and vPDA are known to have matching
lower bounds (thus being PSPACE-complete and EXPTIME-complete, respectively);
we show that these lower bounds also hold for fully probabilistic versions that
do not use non-determinism.
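The reduction the abstract describes, from probabilistic to standard bisimilarity, can be sketched for a tiny fully probabilistic LTS: each probability distribution becomes a fresh intermediate state, and each probability value becomes an edge label, so ordinary (non-probabilistic) bisimulation machinery applies to the derived system. The data representation and names below are our own illustrative choices, not the paper's formal construction.

```python
# Hedged sketch of the reduction idea: state --action--> dist-node,
# then dist-node --probability--> target state. Probabilities act as
# ordinary transition labels in the derived (non-probabilistic) LTS.

# Probabilistic transitions: state -> list of (action, distribution),
# where a distribution maps target states to probabilities.
plts = {
    "s": [("a", {"t1": 0.5, "t2": 0.5})],
    "u": [("a", {"t1": 0.5, "t2": 0.5})],
    "t1": [],
    "t2": [],
}

def derive(plts):
    """Build a standard LTS from a probabilistic one."""
    lts = {}
    for state, transitions in plts.items():
        lts.setdefault(state, [])
        for i, (action, dist) in enumerate(transitions):
            node = (state, i)  # fresh node standing in for the distribution
            lts[state].append((action, node))
            # Outgoing edges labelled by probabilities, sorted for determinism.
            lts[node] = [(p, tgt) for tgt, p in sorted(dist.items())]
    return lts

lts = derive(plts)
```

On this derived LTS, states "s" and "u" are related by standard bisimilarity exactly when they are probabilistically bisimilar in the original system, which is the property the game characterization exploits.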
Learning Tractable Probabilistic Models for Fault Localization
In recent years, several probabilistic techniques have been applied to
various debugging problems. However, most existing probabilistic debugging
systems use relatively simple statistical models, and fail to generalize across
multiple programs. In this work, we propose Tractable Fault Localization Models
(TFLMs) that can be learned from data, and probabilistically infer the location
of the bug. While most previous statistical debugging methods generalize over
many executions of a single program, TFLMs are trained on a corpus of
previously seen buggy programs, and learn to identify recurring patterns of
bugs. Widely-used fault localization techniques such as TARANTULA evaluate the
suspiciousness of each line in isolation; in contrast, a TFLM defines a joint
probability distribution over buggy indicator variables for each line. Joint
distributions with rich dependency structure are often computationally
intractable; TFLMs avoid this by exploiting recent developments in tractable
probabilistic models (specifically, Relational SPNs). Further, TFLMs can
incorporate additional sources of information, including coverage-based
features such as TARANTULA. We evaluate the fault localization performance of
TFLMs that include TARANTULA scores as features in the probabilistic model. Our
study shows that the learned TFLMs isolate bugs more effectively than previous
statistical methods or using TARANTULA directly.Comment: Fifth International Workshop on Statistical Relational AI (StaR-AI
2015
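The per-line suspiciousness score that TARANTULA computes in isolation, and that TFLMs consume as a feature, can be sketched from spectrum-based coverage data. The function and toy coverage data below are our own illustrative construction of the standard TARANTULA formula susp = (ef/F) / (ef/F + ep/P), where ef and ep count failing and passing runs covering a line.

```python
# Hedged sketch of TARANTULA-style suspiciousness: each line is scored
# independently from coverage spectra, which is exactly the per-line
# isolation the abstract contrasts with a TFLM's joint distribution.

def tarantula(coverage, outcomes):
    """coverage: list of sets of covered line numbers, one per test run.
    outcomes: parallel list of "pass"/"fail". Returns {line: score}."""
    F = sum(1 for o in outcomes if o == "fail")
    P = sum(1 for o in outcomes if o == "pass")
    scores = {}
    all_lines = {line for run in coverage for line in run}
    for line in all_lines:
        ef = sum(1 for run, o in zip(coverage, outcomes)
                 if o == "fail" and line in run)
        ep = sum(1 for run, o in zip(coverage, outcomes)
                 if o == "pass" and line in run)
        fail_ratio = ef / F if F else 0.0
        pass_ratio = ep / P if P else 0.0
        denom = fail_ratio + pass_ratio
        scores[line] = fail_ratio / denom if denom else 0.0
    return scores

coverage = [{1, 2, 3}, {1, 3}, {1, 2}]   # lines covered by each test run
outcomes = ["fail", "pass", "pass"]
scores = tarantula(coverage, outcomes)
```

Because line 1 is covered by every run, its score is diluted by the passing runs, while lines covered mostly by the failing run score higher; a TFLM would instead model dependencies among the lines' bug-indicator variables jointly.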