Subsampling MCMC - An introduction for the survey statistician
The rapid development of computing power and efficient Markov Chain Monte
Carlo (MCMC) simulation algorithms has revolutionized Bayesian statistics,
making it a highly practical inference method in applied work. However, MCMC
algorithms tend to be computationally demanding, and are particularly slow for
large datasets. Data subsampling has recently been suggested as a way to make
MCMC methods scalable on massively large data, utilizing efficient sampling
schemes and estimators from the survey sampling literature. These developments
tend to be unknown to many survey statisticians, who traditionally work with
non-Bayesian methods, and rarely use MCMC. Our article explains the idea of
data subsampling in MCMC by reviewing one strand of work, Subsampling MCMC, a
so-called pseudo-marginal MCMC approach to speeding up MCMC through data
subsampling. The review is written for a survey statistician without previous
knowledge of MCMC methods since our aim is to motivate survey sampling experts
to contribute to the growing Subsampling MCMC literature.
Comment: Accepted for publication in Sankhya A. Previous uploaded version
contained a bug in generating the figures and reference
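The link to survey sampling in the abstract above can be illustrated with a small sketch. It treats the full-data log-likelihood as a population total and estimates it from a simple random sample drawn without replacement, using the expansion estimator (n/m times the sample sum). The Normal model, the function names, and the sample sizes are assumptions chosen for illustration and are not taken from the article. The sketch only estimates the log-likelihood; a pseudo-marginal scheme needs an estimator of the likelihood itself, which is not attempted here.

```python
import math
import random


def log_density(y, mu, sigma=1.0):
    """Log-density of one observation under an assumed Normal(mu, sigma^2) model."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)


def full_loglik(data, mu):
    """Exact log-likelihood: a sum (population total) over all n observations."""
    return sum(log_density(y, mu) for y in data)


def subsampled_loglik(data, mu, m, rng):
    """Expansion estimator of the total from a simple random sample of size m.

    The sample is drawn without replacement and the sample sum is scaled by n/m,
    which makes the estimator unbiased for the full-data log-likelihood.
    """
    n = len(data)
    sample = rng.sample(data, m)
    return n / m * sum(log_density(y, mu) for y in sample)


rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(10_000)]

exact = full_loglik(data, 1.0)
# Each estimate touches 500 observations instead of 10,000.
estimates = [subsampled_loglik(data, 1.0, 500, rng) for _ in range(200)]
mean_estimate = sum(estimates) / len(estimates)

print(f"exact log-likelihood:       {exact:.1f}")
print(f"mean of subsample estimates: {mean_estimate:.1f}")
```

Averaged over repeated subsamples, the estimates centre on the exact value, while any single estimate is noisy. Reducing that noise is where more efficient sampling schemes and estimators from the survey literature would come in.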
Automatic Differentiation Variational Inference
Probabilistic modeling is iterative. A scientist posits a simple model, fits
it to her data, refines it according to her analysis, and repeats. However,
fitting complex models to large data is a bottleneck in this process. Deriving
algorithms for new models can be both mathematically and computationally
challenging, which makes it difficult to efficiently cycle through the steps.
To this end, we develop automatic differentiation variational inference (ADVI).
Using our method, the scientist only provides a probabilistic model and a
dataset, nothing else. ADVI automatically derives an efficient variational
inference algorithm, freeing the scientist to refine and explore many models.
ADVI supports a broad class of models; no conjugacy assumptions are required. We
study ADVI across ten different models and apply it to a dataset with millions
of observations. ADVI is integrated into Stan, a probabilistic programming
system; it is available for immediate use.
A Survey of Bayesian Statistical Approaches for Big Data
The modern era is characterised as an era of information or Big Data. This
has motivated a huge literature on new methods for extracting information and
insights from these data. A natural question is how these approaches differ
from those that were available prior to the advent of Big Data. We present a
review of published studies that present Bayesian statistical approaches
specifically for Big Data and discuss the reported and perceived benefits of
these approaches. We conclude by addressing the question of whether focusing
only on improving computational algorithms and infrastructure will be enough to
face the challenges of Big Data.
Piecewise Deterministic Markov Processes for Bayesian Neural Networks
Inference on modern Bayesian Neural Networks (BNNs) often relies on a
variational inference treatment, imposing often-violated assumptions of
independence and on the form of the posterior. Traditional MCMC approaches avoid
these assumptions at the cost of increased computation due to their
incompatibility with subsampling of the likelihood. New Piecewise Deterministic
Markov Process (PDMP) samplers permit subsampling, though they introduce
model-specific inhomogeneous Poisson processes (IPPs) that are difficult to
sample from. This
work introduces a new generic and adaptive thinning scheme for sampling from
these IPPs, and demonstrates how this approach can accelerate the application
of PDMPs for inference in BNNs. Experiments illustrate that inference with
these methods is computationally feasible, can improve predictive accuracy and
MCMC mixing performance, and provides informative uncertainty measurements when
compared against other approximate inference schemes.
Comment: Includes correction to software and corrigendum not
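For readers unfamiliar with thinning, the following is a minimal sketch of the classical thinning method for an inhomogeneous Poisson process with a known constant upper bound on the rate. It is not the generic, adaptive scheme the abstract above proposes. The rate function, the bound, and all names are assumptions made up for illustration.

```python
import math
import random


def sample_ipp_by_thinning(rate, rate_bound, t_end, rng):
    """Return event times on [0, t_end] of a Poisson process with intensity rate(t).

    Assumes 0 <= rate(t) <= rate_bound for all t in [0, t_end].
    Candidates come from a homogeneous process with intensity rate_bound;
    each candidate is kept with probability rate(t) / rate_bound.
    """
    events = []
    t = 0.0
    while True:
        # Waiting time to the next candidate of the bounding homogeneous process.
        t += rng.expovariate(rate_bound)
        if t > t_end:
            return events
        # Thinning step: accept the candidate with probability rate(t) / rate_bound.
        if rng.random() * rate_bound < rate(t):
            events.append(t)


def example_rate(t):
    """Hypothetical time-varying rate, bounded between 0 and 4."""
    return 2.0 + 2.0 * math.sin(t)


rng = random.Random(1)
events = sample_ipp_by_thinning(example_rate, 4.0, 50.0, rng)
print(f"{len(events)} events on [0, 50]")
```

The efficiency of thinning depends on how tight the bound is: a loose bound generates many candidates that are then rejected. In the PDMP setting a good bound is model specific, which is the difficulty the abstract refers to.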
Sub-sampled and Differentially Private Hamiltonian Monte Carlo
Hamiltonian Monte Carlo is a powerful Markov Chain algorithm, which is able to traverse complex posterior distributions accurately. One of the method's disadvantages is its reliance on gradient evaluations over the full data, which quickly becomes computationally costly as the data sets grow large. By mini-batching the data set for stochastic gradient approximations we can speed up the algorithm, albeit with reduced posterior accuracy. We illustrate with a toy example that the stochastic version of the method is unable to explore the exact posterior, and we show how an added friction term greatly alleviates this when the term is adjusted carefully.
We use the added stochastic error to our advantage by making the results differentially private. The randomness in the results masks the appearance of any single data point in the data set used, providing a more secure way of handling sensitive data. In the case of stochastic gradient Hamiltonian Monte Carlo, we are able to achieve reasonable privacy bounds with little to no decrease in optimization performance, although finding a good differentially private approximation of the target posterior becomes harder. In addition, we compare the previously considered privacy accounting methods for assessing the privacy bounds with a new privacy loss distribution method, which is able to determine a tighter privacy profile than, for example, the moments accountant method.
Approximate blocked Gibbs sampling for Bayesian neural networks
In this work, minibatch MCMC sampling for feedforward neural networks is made
more feasible. To this end, it is proposed to sample subgroups of parameters
via a blocked Gibbs sampling scheme. By partitioning the parameter space,
sampling is possible irrespective of layer width. It is also possible to
alleviate vanishing acceptance rates for increasing depth by reducing the
proposal variance in deeper layers. Increasing the length of a non-convergent
chain increases the predictive accuracy in classification tasks, so avoiding
vanishing acceptance rates and consequently enabling longer chain runs have
practical benefits. Moreover, non-convergent chain realizations aid in the
quantification of predictive uncertainty. An open problem is how to perform
minibatch MCMC sampling for feedforward neural networks in the presence of
augmented data.