Smoothed Gradients for Stochastic Variational Inference
Stochastic variational inference (SVI) lets us scale up Bayesian computation
to massive data. It uses stochastic optimization to fit a variational
distribution, following easy-to-compute noisy natural gradients. As with most
traditional stochastic optimization methods, SVI takes precautions to use
unbiased stochastic gradients whose expectations are equal to the true
gradients. In this paper, we explore the idea of following biased stochastic
gradients in SVI. Our method replaces the natural gradient with a similarly
constructed vector that uses a fixed-window moving average of some of its
previous terms. We will demonstrate the many advantages of this technique.
First, its computational cost is the same as for SVI and storage requirements
only multiply by a constant factor. Second, it enjoys significant variance
reduction over the unbiased estimates, smaller bias than averaged gradients,
and leads to smaller mean-squared error against the full gradient. We test our
method on latent Dirichlet allocation with three large corpora.
Comment: Appears in Neural Information Processing Systems, 201
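The fixed-window averaging idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say which terms of the natural gradient are averaged or how the window is stored, so everything here is an assumption: the gradient is taken to have the conditionally conjugate form prior + N * statistics - lambda, the averaged term is assumed to be the noisy minibatch statistics, and the names `WindowedAverage`, `smoothed_natural_gradient`, `prior`, `n_data`, and `lam` are hypothetical.

```python
from collections import deque


class WindowedAverage:
    """Fixed-window moving average over vectors (hypothetical helper)."""

    def __init__(self, window):
        # deque(maxlen=...) silently drops the oldest entry once it is full,
        # so storage is the window size times the size of one vector.
        self.buf = deque(maxlen=window)

    def update(self, vec):
        """Add the newest noisy term and return the mean of the stored terms."""
        self.buf.append(list(vec))
        n = len(self.buf)
        return [sum(col) / n for col in zip(*self.buf)]


def smoothed_natural_gradient(prior, n_data, avg_stats, lam):
    """Assumed form: prior + N * (averaged statistics) - current parameter.

    Using the windowed average in place of the single-minibatch statistics
    makes the estimate biased but less noisy.
    """
    return [p + n_data * s - l for p, s, l in zip(prior, avg_stats, lam)]


# Toy usage: three noisy one-dimensional statistics, window of two.
avg = WindowedAverage(window=2)
history = [avg.update([s]) for s in (1.0, 3.0, 5.0)]
grad = smoothed_natural_gradient([1.0], 10, [2.0], [5.0])
```

With a window of two, each returned average depends only on the two most recent terms, which is what keeps the extra storage a constant multiple of the unsmoothed case.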
Patterns of Scalable Bayesian Inference
Datasets are growing not just in size but in complexity, creating a demand
for rich models and quantification of uncertainty. Bayesian methods are an
excellent fit for this demand, but scaling Bayesian inference is a challenge.
In response to this challenge, there has been considerable recent work based on
varying assumptions about model structure, underlying computational resources,
and the importance of asymptotic correctness. As a result, there is a zoo of
ideas with few clear overarching principles.
In this paper, we seek to identify unifying principles, patterns, and
intuitions for scaling Bayesian inference. We review existing work on utilizing
modern computing resources with both MCMC and variational approximation
techniques. From this taxonomy of ideas, we characterize the general principles
that have proven successful for designing scalable inference procedures and
comment on the path forward.