17 research outputs found
Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets
Hierarchical Bayesian networks and neural networks with stochastic hidden
units are commonly perceived as two separate types of models. We show that
either of these types of models can often be transformed into an instance of
the other, by switching between centered and differentiable non-centered
parameterizations of the latent variables. The choice of parameterization
greatly influences the efficiency of gradient-based posterior inference; we
show that they are often complementary to each other, clarify when each
parameterization is preferred and show how inference can be made robust. In the
non-centered form, a simple Monte Carlo estimator of the marginal likelihood
can be used for learning the parameters. Theoretical results are supported by
experiments.
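As an illustration of the two parameterizations named in the abstract above, here is a minimal sketch for a single Gaussian latent variable. The function names and the values of `mu` and `sigma` are assumptions chosen for the example and do not come from the paper. The centered form samples z directly from N(mu, sigma^2); the non-centered form samples auxiliary noise eps from N(0, 1) and computes z = mu + sigma * eps, which makes z a differentiable function of mu and sigma.

```python
import numpy as np


def sample_centered(mu, sigma, n, rng):
    """Centered form: draw z directly from N(mu, sigma^2)."""
    return rng.normal(loc=mu, scale=sigma, size=n)


def sample_noncentered(mu, sigma, n, rng):
    """Non-centered form: draw eps ~ N(0, 1), then transform deterministically."""
    eps = rng.standard_normal(n)
    return mu + sigma * eps


# Hypothetical parameter values, used only for this illustration.
mu, sigma, n = 1.5, 0.5, 100_000
rng = np.random.default_rng(0)
z_centered = sample_centered(mu, sigma, n, rng)
z_noncentered = sample_noncentered(mu, sigma, n, rng)

# Both forms target the same distribution, so their sample moments should agree.
print(z_centered.mean(), z_centered.std())
print(z_noncentered.mean(), z_noncentered.std())
```

Both samplers describe the same marginal distribution over z. What differs is where the randomness enters, and that is what changes the behavior of gradient-based inference.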
Auto-Encoding Variational Bayes
How can we perform efficient inference and learning in directed probabilistic
models, in the presence of continuous latent variables with intractable
posterior distributions, and large datasets? We introduce a stochastic
variational inference and learning algorithm that scales to large datasets and,
under some mild differentiability conditions, even works in the intractable
case. Our contribution is two-fold. First, we show that a reparameterization
of the variational lower bound yields a lower bound estimator that can be
straightforwardly optimized using standard stochastic gradient methods. Second,
we show that for i.i.d. datasets with continuous latent variables per
datapoint, posterior inference can be made especially efficient by fitting an
approximate inference model (also called a recognition model) to the
intractable posterior using the proposed lower bound estimator. Theoretical
advantages are reflected in experimental results.
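The reparameterized lower bound estimator described in the abstract above can be sketched on a toy model. Everything specific here is an assumption made for the example and not the paper's experimental setup: the prior is p(z) = N(0, 1), the likelihood is p(x | z) = N(z, 1), and the approximate posterior is q(z | x) = N(m, s^2) with free parameters m and log s fitted for a single datapoint (no neural recognition model). Gradients are derived by hand through z = m + s * eps and followed with plain stochastic gradient ascent.

```python
import numpy as np


def elbo_and_grads(x, m, log_s, n_samples, rng):
    """Monte Carlo ELBO estimate and its pathwise gradients for the toy model."""
    s = np.exp(log_s)
    eps = rng.standard_normal(n_samples)
    z = m + s * eps  # reparameterized sample from q(z | x)

    # Expected log-likelihood term, estimated by sampling.
    log_lik = -0.5 * np.log(2.0 * np.pi) - 0.5 * (x - z) ** 2
    # KL(q || p) between N(m, s^2) and N(0, 1), in closed form.
    kl = 0.5 * (s ** 2 + m ** 2 - 1.0) - log_s
    elbo = log_lik.mean() - kl

    # Chain rule through z: dz/dm = 1 and dz/dlog_s = s * eps.
    dlik_dz = x - z
    grad_m = dlik_dz.mean() - m
    grad_log_s = (dlik_dz * eps * s).mean() - (s ** 2 - 1.0)
    return elbo, grad_m, grad_log_s


def fit(x, steps=2000, lr=0.05, n_samples=64, seed=0):
    """Stochastic gradient ascent on the ELBO; hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    m, log_s = 0.0, 0.0
    for _ in range(steps):
        _, g_m, g_log_s = elbo_and_grads(x, m, log_s, n_samples, rng)
        m += lr * g_m
        log_s += lr * g_log_s
    return m, float(np.exp(log_s))


m_hat, s_hat = fit(x=2.0)
print(m_hat, s_hat)
```

For this conjugate toy model the exact posterior is N(x/2, 1/2), so with x = 2.0 the fitted values should settle near m = 1 and s = sqrt(0.5), up to stochastic gradient noise. In the setting the abstract describes, m and s would instead be outputs of a recognition model shared across datapoints.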
Nested Variational Compression in Deep Gaussian Processes
Deep Gaussian processes provide a flexible approach to probabilistic modelling of data using either supervised or unsupervised learning. For tractable inference, approximations to the marginal likelihood of the model must be made. The original approach to approximate inference in these models used variational compression to allow for approximate variational marginalization of the hidden variables, leading to a lower bound on the marginal likelihood of the model [Damianou and Lawrence, 2013]. In this paper we extend this idea with a nested variational compression. The resulting lower bound on the likelihood can be easily parallelized or adapted for stochastic variational inference.