Dirichlet Fragmentation Processes
Tree structures are ubiquitous in data across many domains, and many datasets
are naturally modelled by unobserved tree structures. In this paper, first we
review the theory of random fragmentation processes [Bertoin, 2006], and a
number of existing methods for modelling trees, including the popular nested
Chinese restaurant process (nCRP). Then we define a general class of
probability distributions over trees, the Dirichlet fragmentation process
(DFP), through a novel combination of the theory of Dirichlet processes and
random fragmentation processes. The DFP admits a stick-breaking construction and
relates to the nCRP in the same way the Dirichlet process relates to the
Chinese restaurant process. Furthermore, we develop a novel hierarchical
mixture model with the DFP, and empirically compare the new model to similar
models in machine learning. Experiments show the DFP mixture model to be
convincingly better than existing state-of-the-art approaches for hierarchical
clustering and density modelling.
Lifted Variable Elimination: A Novel Operator and Completeness Results
Various methods for lifted probabilistic inference have been proposed, but
our understanding of these methods and the relationships between them is still
limited, compared to their propositional counterparts. The only existing
theoretical characterization of lifting is for weighted first-order model
counting (WFOMC), which was shown to be complete domain-lifted for the class of
2-logvar models. This paper makes two contributions to lifted variable
elimination (LVE). First, we introduce a novel inference operator called group
inversion. Second, we prove that LVE augmented with this operator is complete
in the same sense as WFOMC.
Probabilistic Software Modeling
Software engineering and the implementation of software have become a
challenging task, as many tools, frameworks, and languages must be orchestrated
into one functioning piece. This complexity increases the need for testing and
analysis methodologies that aid developers and engineers as the software
grows and evolves. The amount of resources that companies budget for testing
and analysis is limited, highlighting the importance of automation for economic
software development. We propose Probabilistic Software Modeling, a new
paradigm for software modeling that builds on the fact that software is an
easy-to-monitor environment from which statistical models can be built.
Probabilistic Software Modeling provides increased comprehension for engineers
without changing the level of abstraction. The approach relies on the recursive
decomposition principle of object-oriented programming to build hierarchies of
probabilistic models that are fitted via observations collected at runtime of a
software system. This leads to a network of models that mirror the static
structure of the software system while modeling its dynamic runtime behavior.
The resulting models can be used in applications such as test-case generation,
anomaly and outlier detection, probabilistic program simulation, or state
predictions. Ideally, probabilistic software modeling allows the use of the
entire spectrum of statistical modeling and inference for software, enabling
in-depth analysis and generative procedures for software.
Recurrent Predictive State Policy Networks
We introduce Recurrent Predictive State Policy (RPSP) networks, a recurrent
architecture that brings insights from predictive state representations to
reinforcement learning in partially observable environments. Predictive state
policy networks consist of a recursive filter, which keeps track of a belief
about the state of the environment, and a reactive policy that directly maps
beliefs to actions, to maximize the cumulative reward. The recursive filter
leverages predictive state representations (PSRs) (Rosencrantz and Gordon,
2004; Sun et al., 2016) by modeling the predictive state: a prediction of the
distribution of future observations conditioned on history and future actions.
This representation gives rise to a rich class of statistically consistent
algorithms (Hefny et al., 2018) to initialize the recursive filter. Predictive
state serves as an equivalent representation of a belief state. Therefore, the
policy component of the RPSP-network can be purely reactive, simplifying
training while still allowing optimal behaviour. Moreover, we use the PSR
interpretation during training as well, by incorporating prediction error in
the loss function. The entire network (recursive filter and reactive policy) is
still differentiable and can be trained using gradient based methods. We
optimize our policy using a combination of policy gradient based on rewards
(Williams, 1992) and gradient descent based on prediction error. We show the
efficacy of RPSP-networks under partial observability on a set of robotic
control tasks from OpenAI Gym. We empirically show that RPSP-networks perform
well compared with memory-preserving networks such as GRUs, as well as finite
memory models, being the overall best performing method.
Conditionally Independent Multiresolution Gaussian Processes
The multiresolution Gaussian process (GP) has gained increasing attention as
a viable approach towards improving the quality of approximations in GPs that
scale well to large-scale data. Most of the current constructions assume full
independence across resolutions. This assumption simplifies the inference, but
it underestimates the uncertainties in transitioning from one resolution to
another. This in turn results in models which are prone to overfitting in the
sense of excessive sensitivity to the chosen resolution, and predictions which
are non-smooth at the boundaries. Our contribution is a new construction which
instead assumes conditional independence among GPs across resolutions. We show
that relaxing the full independence assumption enables robustness against
overfitting, and that it delivers predictions that are smooth at the
boundaries. Our new model is compared against the current state of the art on 2
synthetic and 9 real-world datasets. In most cases, our new conditionally
independent construction performed favorably when compared against models based
on the full independence assumption. In particular, it exhibits little to no
signs of overfitting.
YGGDRASIL - A Statistical Package for Learning Split Models
There are two main objectives of this paper. The first is to present a
statistical framework for models with context specific independence structures,
i.e., conditional independences holding only for specific values of the
conditioning variables. This framework is constituted by the class of split
models. Split models are extensions of graphical models for contingency tables
and allow for more sophisticated modelling than graphical models. The
treatment of split models includes estimation, representation and a Markov
property for reading off those independencies holding in a specific context.
The second objective is to present a software package named YGGDRASIL which is
designed for statistical inference in split models, i.e., for learning such
models on the basis of data.
Exploiting Uniform Assignments in First-Order MPE
The MPE (Most Probable Explanation) query plays an important role in
probabilistic inference. MPE solution algorithms for probabilistic relational
models essentially adapt existing belief assessment methods, replacing summation
with maximization. But the rich structure and symmetries captured by relational
models together with the properties of the maximization operator offer an
opportunity for additional simplification with potentially significant
computational ramifications. Specifically, these models often have groups of
variables that define symmetric distributions over some population of formulas.
The maximizing choice for different elements of such a group is the same. If we
can recognize this ahead of time, we can significantly reduce the size of the
model by eliminating a large portion of its random variables.
This paper defines the notion of uniformly assigned and partially uniformly
assigned sets of variables, shows how one can recognize these sets efficiently,
and how the model can be greatly simplified once we recognize them, with little
computational effort. We demonstrate the effectiveness of these ideas
empirically on a number of models.
Statistical Inference for the Population Landscape via Moment Adjusted Stochastic Gradients
Modern statistical inference tasks often require iterative optimization
methods to compute the solution. Convergence analysis from an optimization
viewpoint only informs us how well the solution is approximated numerically but
overlooks the sampling nature of the data. In contrast, recognizing the
randomness in the data, statisticians are keen to provide uncertainty
quantification, or confidence, for the solution obtained using iterative
optimization methods. This paper makes progress along this direction by
introducing moment-adjusted stochastic gradient descent, a new stochastic
optimization method for statistical inference. We establish non-asymptotic
theory that characterizes the statistical distribution for certain iterative
methods with optimization guarantees. On the statistical front, the theory
allows for model mis-specification, with very mild conditions on the data. For
optimization, the theory is flexible for both convex and non-convex cases.
Remarkably, the moment-adjusting idea motivated from "error standardization" in
statistics achieves a similar effect as acceleration in first-order
optimization methods used to fit generalized linear models. We also demonstrate
this acceleration effect in the non-convex setting through numerical
experiments.
Local Conditioning: Exact Message Passing for Cyclic Undirected Distributed Networks
This paper addresses the practical implementation of summing out, expanding, and
reordering of messages in Local Conditioning (LC) for undirected networks. In
particular, incoming messages conditioned on potentially different subsets of
the receiving node's relevant set must be expanded to be conditioned on this
relevant set, then reordered so that corresponding columns of the conditioned
matrices can be fused through element-wise multiplication. An outgoing message
is then reduced by summing out loop cutset nodes that are upstream of the
outgoing edge. The emphasis on implementation is the primary contribution over
the theoretical justification of LC given in Fay et al. Nevertheless, the
complexity of Local Conditioning in grid networks is still no better than that
of Clustering.
A Bayesian Model for Generative Transition-based Dependency Parsing
We propose a simple, scalable, fully generative model for transition-based
dependency parsing with high accuracy. The model, parameterized by Hierarchical
Pitman-Yor Processes, overcomes the limitations of previous generative models
by allowing fast and accurate inference. We propose an efficient decoding
algorithm based on particle filtering that can adapt the beam size to the
uncertainty in the model while jointly predicting POS tags and parse trees. The
UAS of the parser is on par with that of a greedy discriminative baseline. As a
language model, it obtains better perplexity than an n-gram model by performing
semi-supervised learning over a large unlabelled corpus. We show that the model
is able to generate locally and syntactically coherent sentences, opening the
door to further applications in language generation.