Variational inference is a popular alternative to Markov chain Monte Carlo
methods that constructs a Bayesian posterior approximation by minimizing a
discrepancy to the true posterior within a pre-specified family. This converts
Bayesian inference into an optimization problem, enabling the use of simple and
scalable stochastic optimization algorithms. However, a key limitation of
variational inference is that the optimal approximation is typically not
tractable to compute; even in simple settings the problem is nonconvex. Thus,
recently developed statistical guarantees -- which all involve the (data)
asymptotic properties of the optimal variational distribution -- are not
reliably obtained in practice. In this work, we provide two major
contributions: a theoretical analysis of the asymptotic convexity properties of
variational inference in the popular setting with a Gaussian family; and
consistent stochastic variational inference (CSVI), an algorithm that exploits
these properties to find the optimal approximation in the asymptotic regime.
CSVI consists of a tractable initialization procedure that finds the local
basin of the optimal solution, and a scaled gradient descent algorithm that
stays locally confined to that basin. Experiments on nonconvex synthetic and
real-data examples show that, compared with standard stochastic gradient
descent, CSVI improves the likelihood of obtaining the globally optimal
posterior approximation.
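
To make the optimization view concrete, the following is a minimal sketch of
plain stochastic-gradient Gaussian variational inference in one dimension. It
illustrates the standard baseline the abstract compares against, not CSVI
itself: the initialization procedure and the gradient scaling of CSVI are not
reproduced here. The names gaussian_vi_sgd and toy_grad_log_p, the Gaussian toy
target, and all step-size settings are assumptions made for illustration only.

```python
import numpy as np


def gaussian_vi_sgd(grad_log_p, mu0=0.0, log_sigma0=0.0,
                    n_steps=2000, batch=32, step0=0.05, seed=0):
    """Fit q = N(mu, sigma^2) by stochastic gradient ascent on the ELBO.

    grad_log_p: hypothetical user-supplied function returning the derivative
    of the (unnormalized) log posterior density at each point of an array.
    """
    rng = np.random.default_rng(seed)
    mu, s = float(mu0), float(log_sigma0)  # s = log(sigma) keeps sigma > 0
    for t in range(n_steps):
        sigma = np.exp(s)
        eps = rng.standard_normal(batch)
        x = mu + sigma * eps               # reparameterized draws from q
        g = grad_log_p(x)
        # ELBO = E_q[log p(x)] + log(sigma) + const, so:
        grad_mu = np.mean(g)
        grad_s = np.mean(g * eps) * sigma + 1.0  # chain rule + entropy term
        step = step0 / np.sqrt(1.0 + t)    # decreasing step size (assumed)
        mu += step * grad_mu
        s += step * grad_s
    return float(mu), float(np.exp(s))


def toy_grad_log_p(x):
    # Toy target (assumption): Gaussian with mean 2 and standard deviation 0.5,
    # chosen so that the optimal Gaussian approximation is known exactly.
    return -(x - 2.0) / 0.5**2
```

Calling gaussian_vi_sgd(toy_grad_log_p) should return values close to the
target mean 2 and standard deviation 0.5. This toy problem has a single
optimum; for a multimodal posterior the same iteration can settle in a
suboptimal local optimum depending on where it starts, which is the failure
mode the abstract describes.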