In this paper we address the following question: Can we approximately sample
from a Bayesian posterior distribution if we are only allowed to touch a small
mini-batch of data-items for every sample we generate? An algorithm based on
the Langevin equation with stochastic gradients (SGLD) was previously proposed
to solve this, but its mixing rate was slow. By leveraging the Bayesian Central
Limit Theorem, we extend the SGLD algorithm so that at high mixing rates it
will sample from a normal approximation of the posterior, while for slow mixing
rates it will mimic the behavior of SGLD with a pre-conditioner matrix. As a
bonus, the proposed algorithm is reminiscent of Fisher scoring (with stochastic
gradients) and as such is an efficient optimizer during burn-in.

Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
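For readers unfamiliar with the baseline algorithm the abstract refers to, the following is a minimal sketch of one standard SGLD update, theta' = theta + (eps/2) * (grad log p(theta) + (N/n) * sum_i grad log p(x_i | theta)) + noise, with noise ~ N(0, eps). The function names (sgld_step, grad_log_prior, grad_log_lik), the step size eps, and the toy Gaussian model are illustrative assumptions, not taken from the paper, and the sketch does not include the Fisher-scoring-style preconditioner that is this paper's contribution.

```python
import numpy as np

def sgld_step(theta, minibatch, N, eps, grad_log_prior, grad_log_lik):
    """One SGLD update: a Langevin step driven by an unbiased stochastic
    gradient of the log-posterior, estimated from a mini-batch of size n << N."""
    n = len(minibatch)
    # Mini-batch estimate of the full-data log-posterior gradient.
    grad = grad_log_prior(theta) + (N / n) * sum(grad_log_lik(x, theta) for x in minibatch)
    # Injected Gaussian noise with variance equal to the step size eps.
    noise = np.random.normal(0.0, np.sqrt(eps), size=np.shape(theta))
    return theta + 0.5 * eps * grad + noise

# Toy usage (illustrative only): posterior over the mean of a
# unit-variance Gaussian with a broad N(0, 100) prior.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(1.5, 1.0, size=1000)      # N = 1000 observations
    N = len(data)
    grad_log_prior = lambda th: -th / 100.0     # gradient of log N(0, 100) prior
    grad_log_lik = lambda x, th: x - th         # gradient of log N(th, 1) likelihood
    theta = 0.0
    for t in range(2000):
        batch = rng.choice(data, size=10, replace=False)
        theta = sgld_step(theta, batch, N, eps=1e-3,
                          grad_log_prior=grad_log_prior, grad_log_lik=grad_log_lik)
    print("last sample:", theta)
```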