Stan: A Probabilistic Programming Language
Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.
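The idea of a program that defines a log probability function, whose gradient then drives Hamiltonian Monte Carlo, can be illustrated with a minimal sketch. This is not Stan code and not Stan's implementation: the model (normal likelihood with unit variance, normal prior on the mean), the hand-written gradient, the fixed trajectory length, and all function names are assumptions chosen for illustration. Stan itself derives gradients automatically and its No-U-Turn sampler adapts the trajectory length.

```python
import math
import random


def log_density(mu, data, prior_sd=10.0):
    """Unnormalized log posterior: y_i ~ Normal(mu, 1), mu ~ Normal(0, prior_sd)."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = sum(-0.5 * (y - mu) ** 2 for y in data)
    return log_prior + log_lik


def grad_log_density(mu, data, prior_sd=10.0):
    """Analytic derivative of log_density with respect to mu."""
    return -mu / prior_sd ** 2 + sum(y - mu for y in data)


def hmc_step(mu, data, rng, step_size=0.1, n_leapfrog=20):
    """One HMC transition with a fixed-length leapfrog trajectory."""
    p0 = rng.gauss(0.0, 1.0)  # resample the momentum
    q, p = mu, p0
    p += 0.5 * step_size * grad_log_density(q, data)  # initial half step
    for i in range(n_leapfrog):
        q += step_size * p
        if i < n_leapfrog - 1:
            p += step_size * grad_log_density(q, data)
    p += 0.5 * step_size * grad_log_density(q, data)  # final half step
    # Metropolis correction based on the change in total energy.
    h0 = -log_density(mu, data) + 0.5 * p0 ** 2
    h1 = -log_density(q, data) + 0.5 * p ** 2
    accept_prob = 1.0 if h1 <= h0 else math.exp(h0 - h1)
    if rng.random() < accept_prob:
        return q, accept_prob
    return mu, accept_prob


def sample(data, n_draws=2000, n_warmup=200, seed=0):
    """Run the chain and return the post-warmup draws of mu."""
    rng = random.Random(seed)
    mu = 0.0
    draws = []
    for i in range(n_warmup + n_draws):
        mu, _ = hmc_step(mu, data, rng)
        if i >= n_warmup:
            draws.append(mu)
    return draws


if __name__ == "__main__":
    draws = sample([1.2, 0.8, 1.0, 1.4, 0.6])
    print(sum(draws) / len(draws))
```

For this conjugate model the posterior mean is available in closed form, sum(y) / (n + 1 / prior_sd^2), so the average of the draws can be checked against it. The acceptance probability returned by `hmc_step` is the kind of intermediate quantity the abstract describes as being exposed.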
Natively probabilistic computation
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2009. Includes bibliographical references (leaves 129-135). I introduce a new set of natively probabilistic computing abstractions, including probabilistic generalizations of Boolean circuits, backtracking search and pure Lisp. I show how these tools let one compactly specify probabilistic generative models, generalize and parallelize widely used sampling algorithms like rejection sampling and Markov chain Monte Carlo, and solve difficult Bayesian inference problems. I first introduce Church, a probabilistic programming language for describing probabilistic generative processes that induce distributions, which generalizes Lisp, a language for describing deterministic procedures that induce functions. I highlight the ways randomness meshes with the reflectiveness of Lisp to support the representation of structured, uncertain knowledge, including nonparametric Bayesian models from the current literature, programs for decision making under uncertainty, and programs that learn very simple programs from data. I then introduce systematic stochastic search, a recursive algorithm for exact and approximate sampling that generalizes a popular form of backtracking search to the broader setting of stochastic simulation and recovers widely used particle filters as a special case. I use it to solve probabilistic reasoning problems from statistical physics, causal reasoning and stereo vision. Finally, I introduce stochastic digital circuits that model the probability algebra just as traditional Boolean circuits model the Boolean algebra. I show how these circuits can be used to build massively parallel, fault-tolerant machines for sampling and allow one to efficiently run Markov chain Monte Carlo methods on models with hundreds of thousands of variables in real time.
I emphasize the ways in which these ideas fit together into a coherent software and hardware stack for natively probabilistic computing, organized around distributions and samplers rather than deterministic functions. I argue that by building uncertainty and randomness into the foundations of our programming languages and computing machines, we may arrive at ones that are more powerful, flexible and efficient than deterministic designs, and are in better alignment with the needs of computational science, statistics and artificial intelligence. By Vikash Kumar Mansinghka. Ph.D.
Blang: Bayesian declarative modelling of general data structures and inference via algorithms based on distribution continua
Consider a Bayesian inference problem where a variable of interest does not
take values in a Euclidean space. These "non-standard" data structures are in
reality fairly common. They are frequently used in problems involving latent
discrete factor models, networks, and domain specific problems such as sequence
alignments and reconstructions, pedigrees, and phylogenies. In principle,
Bayesian inference should be particularly well-suited in such scenarios, as the
Bayesian paradigm provides a principled way to obtain confidence assessment for
random variables of any type. However, much of the recent work on making
Bayesian analysis more accessible and computationally efficient has focused on
inference in Euclidean spaces.
In this paper, we introduce Blang, a domain specific language and library
aimed at bridging this gap. Blang allows users to perform Bayesian analysis on
arbitrary data types while using a declarative syntax similar to BUGS. Blang is
augmented with intuitive language additions to create data types of the user's
choosing. To perform inference at scale on such arbitrary state spaces, Blang
leverages recent advances in sequential Monte Carlo and non-reversible Markov
chain Monte Carlo methods.
Towards derandomising Markov chain Monte Carlo
We present a new framework to derandomise certain Markov chain Monte Carlo
(MCMC) algorithms. As in MCMC, we first reduce counting problems to sampling
from a sequence of marginal distributions. For the latter task, we introduce a
method called coupling towards the past that can, in logarithmic time, evaluate
one or a constant number of variables from a stationary Markov chain state.
Since there are at most logarithmic random choices, this leads to very simple
derandomisation. We provide two applications of this framework, namely
efficient deterministic approximate counting algorithms for hypergraph
independent sets and hypergraph colourings, under local lemma type conditions
matching, up to lower order factors, their state-of-the-art randomised
counterparts. Comment: 57 pages
Solving Satisfiability Modulo Counting for Symbolic and Statistical AI Integration With Provable Guarantees
Satisfiability Modulo Counting (SMC) encompasses problems that require both
symbolic decision-making and statistical reasoning. Its general formulation
captures many real-world problems at the intersection of symbolic and
statistical Artificial Intelligence. SMC searches for policy interventions to
control probabilistic outcomes. Solving SMC is challenging because of its
highly intractable nature (NP^PP-complete), incorporating
statistical inference and symbolic reasoning. Previous research on SMC solving
lacks provable guarantees and/or suffers from sub-optimal empirical
performance, especially when combinatorial constraints are present. We propose
XOR-SMC, a polynomial algorithm with access to NP-oracles, to solve highly
intractable SMC problems with constant approximation guarantees. XOR-SMC
transforms the highly intractable SMC into satisfiability problems, by
replacing the model counting in SMC with SAT formulae subject to randomized XOR
constraints. Experiments on solving important SMC problems in AI for social
good demonstrate that XOR-SMC finds solutions close to the true optimum,
outperforming several baselines which struggle to find good approximations for
the intractable model counting in SMC.
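The general idea of replacing model counting with satisfiability queries under random XOR constraints can be sketched as follows. This is not the XOR-SMC algorithm from the paper, only the underlying hashing intuition: each random parity constraint keeps roughly half of the satisfying assignments, so a formula that stays satisfiable under m such constraints in most trials probably has on the order of 2^m models. The brute-force enumeration stands in for an NP oracle, and all names and the majority-vote rule are assumptions made for illustration.

```python
import itertools
import random


def random_xor(n_vars, rng):
    """Draw a random parity constraint: a variable subset and a target parity bit."""
    subset = [i for i in range(n_vars) if rng.random() < 0.5]
    return subset, rng.randrange(2)


def satisfies_xors(assignment, xors):
    """Check every parity constraint against a 0/1 assignment tuple."""
    return all(
        sum(assignment[i] for i in subset) % 2 == parity
        for subset, parity in xors
    )


def sat_with_xors(formula, n_vars, xors):
    """Brute-force stand-in for an NP oracle: is formula AND all XORs satisfiable?"""
    return any(
        formula(a) and satisfies_xors(a, xors)
        for a in itertools.product((0, 1), repeat=n_vars)
    )


def estimate_log2_count(formula, n_vars, trials=15, seed=0):
    """Largest m for which formula plus m random XORs is satisfiable in a
    majority of trials; returns None if the formula itself is unsatisfiable."""
    rng = random.Random(seed)
    best = None
    for m in range(n_vars + 1):
        hits = 0
        for _ in range(trials):
            xors = [random_xor(n_vars, rng) for _ in range(m)]
            if sat_with_xors(formula, n_vars, xors):
                hits += 1
        if 2 * hits > trials:
            best = m
        else:
            break
    return best


if __name__ == "__main__":
    # Hypothetical formula over 10 variables with exactly 2**6 = 64 models.
    def formula(a):
        return a[0] == 0 and a[1] == 0 and a[2] == 0 and a[3] == 0

    print(estimate_log2_count(formula, n_vars=10))
```

The estimate is only accurate up to a small additive error in the exponent, which is the sense in which such methods give constant-factor approximation guarantees for the count.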
The Complexity of Approximately Counting Tree Homomorphisms
We study two computational problems, parameterised by a fixed tree H.
#HomsTo(H) is the problem of counting homomorphisms from an input graph G to H.
#WHomsTo(H) is the problem of counting weighted homomorphisms to H, given an
input graph G and a weight function for each vertex v of G. Even though H is a
tree, these problems turn out to be sufficiently rich to capture all of the
known approximation behaviour in #P. We give a complete trichotomy for
#WHomsTo(H). If H is a star then #WHomsTo(H) is in FP. If H is not a star but
it does not contain a certain induced subgraph J_3 then #WHomsTo(H) is
equivalent under approximation-preserving (AP) reductions to #BIS, the problem
of counting independent sets in a bipartite graph. This problem is complete for
the class #RHPi_1 under AP-reductions. Finally, if H contains an induced J_3
then #WHomsTo(H) is equivalent under AP-reductions to #SAT, the problem of
counting satisfying assignments to a CNF Boolean formula. Thus, #WHomsTo(H) is
complete for #P under AP-reductions. The results are similar for #HomsTo(H)
except that a rich structure emerges if H contains an induced J_3. We show that
there are trees H for which #HomsTo(H) is #SAT-equivalent (disproving a
plausible conjecture of Kelk). There is an interesting connection between these
homomorphism-counting problems and the problem of approximating the partition
function of the ferromagnetic Potts model. In particular, we show that for a
family of graphs J_q, parameterised by a positive integer q, the problem
#HomsTo(H) is AP-interreducible with the problem of approximating the partition
function of the q-state Potts model. It was not previously known that the Potts
model had a homomorphism-counting interpretation. We use this connection to
obtain some additional upper bounds for the approximation complexity of
#HomsTo(J_q).
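To make the two problems concrete, here is a brute-force sketch that counts homomorphisms from a small graph G to a fixed graph H, with an optional weighted variant. It runs in exponential time and says nothing about the approximation complexity the abstract studies. The function name, the edge-list representation, and the reading of the weighted problem as a sum over homomorphisms of the product of per-vertex weights are assumptions made for illustration.

```python
import itertools


def count_homomorphisms(g_vertices, g_edges, h_vertices, h_edges, weights=None):
    """Count maps sigma: V(G) -> V(H) that send every edge of G to an edge of H.

    If weights is given, weights[v][x] is the weight of mapping vertex v of G
    to vertex x of H, and the result is the sum over homomorphisms of the
    product of these weights (assumed definition of the weighted problem).
    """
    # Store H's edges in both directions so the adjacency test is symmetric.
    h_adj = set()
    for u, v in h_edges:
        h_adj.add((u, v))
        h_adj.add((v, u))

    total = 0
    for image in itertools.product(h_vertices, repeat=len(g_vertices)):
        sigma = dict(zip(g_vertices, image))
        if all((sigma[u], sigma[v]) in h_adj for u, v in g_edges):
            term = 1
            if weights is not None:
                for v in g_vertices:
                    term *= weights[v][sigma[v]]
            total += term
    return total


if __name__ == "__main__":
    # H is a star with centre "c" and three leaves; G is a path a - b - d.
    star_vertices = ["c", 1, 2, 3]
    star_edges = [("c", 1), ("c", 2), ("c", 3)]
    path_vertices = ["a", "b", "d"]
    path_edges = [("a", "b"), ("b", "d")]
    print(count_homomorphisms(path_vertices, path_edges, star_vertices, star_edges))
```

Because a tree H is bipartite, any G containing an odd cycle has no homomorphisms to H, which gives a quick sanity check for the function.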