106 research outputs found
Patterns of Scalable Bayesian Inference
Datasets are growing not just in size but in complexity, creating a demand
for rich models and quantification of uncertainty. Bayesian methods are an
excellent fit for this demand, but scaling Bayesian inference is a challenge.
In response to this challenge, there has been considerable recent work based on
varying assumptions about model structure, underlying computational resources,
and the importance of asymptotic correctness. As a result, there is a zoo of
ideas with few clear overarching principles.
In this paper, we seek to identify unifying principles, patterns, and
intuitions for scaling Bayesian inference. We review existing work on utilizing
modern computing resources with both MCMC and variational approximation
techniques. From this taxonomy of ideas, we characterize the general principles
that have proven successful for designing scalable inference procedures and
comment on the path forward
Stochastic Gradient Hamiltonian Monte Carlo
Hamiltonian Monte Carlo (HMC) sampling methods provide a mechanism for
defining distant proposals with high acceptance probabilities in a
Metropolis-Hastings framework, enabling more efficient exploration of the state
space than standard random-walk proposals. The popularity of such methods has
grown significantly in recent years. However, a limitation of HMC methods is
the required gradient computation for simulation of the Hamiltonian dynamical
system-such computation is infeasible in problems involving a large sample size
or streaming data. Instead, we must rely on a noisy gradient estimate computed
from a subset of the data. In this paper, we explore the properties of such a
stochastic gradient HMC approach. Surprisingly, the natural implementation of
the stochastic approximation can be arbitrarily bad. To address this problem we
introduce a variant that uses second-order Langevin dynamics with a friction
term that counteracts the effects of the noisy gradient, maintaining the
desired target distribution as the invariant distribution. Results on simulated
data validate our theory. We also provide an application of our methods to a
classification task using neural networks and to online Bayesian matrix
factorization.Comment: ICML 2014 versio
Minibatch Markov chain Monte Carlo Algorithms for Fitting Gaussian Processes
Gaussian processes (GPs) are a highly flexible, nonparametric statistical
model that are commonly used to fit nonlinear relationships or account for
correlation between observations. However, the computational load of fitting a
Gaussian process is making them infeasible for use on large
datasets. To make GPs more feasible for large datasets, this research focuses
on the use of minibatching to estimate GP parameters. Specifically, we outline
both approximate and exact minibatch Markov chain Monte Carlo algorithms that
substantially reduce the computation of fitting a GP by only considering small
subsets of the data at a time. We demonstrate and compare this methodology
using various simulations and real datasets.Comment: 19 Pages, 6 Figure
Variational Hamiltonian Monte Carlo via Score Matching
Traditionally, the field of computational Bayesian statistics has been
divided into two main subfields: variational methods and Markov chain Monte
Carlo (MCMC). In recent years, however, several methods have been proposed
based on combining variational Bayesian inference and MCMC simulation in order
to improve their overall accuracy and computational efficiency. This marriage
of fast evaluation and flexible approximation provides a promising means of
designing scalable Bayesian inference methods. In this paper, we explore the
possibility of incorporating variational approximation into a state-of-the-art
MCMC method, Hamiltonian Monte Carlo (HMC), to reduce the required gradient
computation in the simulation of Hamiltonian flow, which is the bottleneck for
many applications of HMC in big data problems. To this end, we use a {\it
free-form} approximation induced by a fast and flexible surrogate function
based on single-hidden layer feedforward neural networks. The surrogate
provides sufficiently accurate approximation while allowing for fast
exploration of parameter space, resulting in an efficient approximate inference
algorithm. We demonstrate the advantages of our method on both synthetic and
real data problems
- …