Revisiting Reweighted Wake-Sleep for Models with Stochastic Control Flow
Stochastic control-flow models (SCFMs) are a class of generative models that
involve branching on choices from discrete random variables. Amortized
gradient-based learning of SCFMs is challenging as most approaches targeting
discrete variables rely on their continuous relaxations---which can be
intractable in SCFMs, as branching on relaxations requires evaluating all
(exponentially many) branching paths. Tractable alternatives mainly combine
REINFORCE with complex control-variate schemes to improve the variance of naive
estimators. Here, we revisit the reweighted wake-sleep (RWS) (Bornschein and
Bengio, 2015) algorithm, and through extensive evaluations, show that it
outperforms current state-of-the-art methods in learning SCFMs. Further, in
contrast to the importance weighted autoencoder, we observe that RWS learns
better models and inference networks with increasing numbers of particles. Our
results suggest that RWS is a competitive, often preferable, alternative for
learning SCFMs.
Comment: Tuan Anh Le and Adam R. Kosiorek contributed equally; accepted to Uncertainty in Artificial Intelligence (UAI) 2019.
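To make the particle weighting at the heart of RWS concrete, here is a minimal sketch of the self-normalized importance weights that drive both the wake-phase model update and the wake-phase inference update; the toy log-densities are invented for illustration and are not from the paper.

```python
import numpy as np

def normalized_importance_weights(log_p_joint, log_q):
    """Self-normalized weights w_k proportional to p(x, z_k) / q(z_k | x)."""
    log_w = log_p_joint - log_q
    log_w = log_w - log_w.max()          # subtract max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

# Toy example: K = 4 particles with invented log-densities.
log_p = np.array([-1.0, -2.0, -0.5, -3.0])   # log p(x, z_k), hypothetical values
log_q = np.array([-1.5, -1.5, -1.5, -1.5])   # log q(z_k | x), hypothetical values
w = normalized_importance_weights(log_p, log_q)
# In RWS, the model update averages grad log p(x, z_k) with weights w_k,
# and the wake-phase inference update averages grad log q(z_k | x) with w_k.
```

Because the weights are ratios of densities rather than gradients of relaxed samples, branching on discrete choices inside the model poses no problem: each particle simply follows its own branch.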
Natural Evolution Strategies as a Black Box Estimator for Stochastic Variational Inference
Stochastic variational inference and its derivatives in the form of
variational autoencoders enjoy the ability to perform Bayesian inference on
large datasets in an efficient manner. However, performing inference with a VAE
requires a certain design choice (i.e. reparameterization trick) to allow
unbiased and low variance gradient estimation, restricting the types of models
that can be created. To overcome this challenge, an alternative estimator based
on natural evolution strategies is proposed. This estimator does not make
assumptions about the kind of distributions used, allowing for the creation of
models that would otherwise not have been possible under the VAE framework.
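A minimal sketch of the kind of black-box, score-function-style gradient that natural evolution strategies provide; the Gaussian search distribution, the crude baseline, and the test function f are assumptions for the example, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def nes_grad_mean(f, mu, sigma, n_samples=20_000):
    """Score-function (NES-style) estimate of d/dmu E_{z ~ N(mu, sigma^2)}[f(z)].
    Only needs evaluations of f(z): no reparameterization, so f may be
    non-differentiable or involve discrete computation."""
    z = rng.normal(mu, sigma, size=n_samples)
    score = (z - mu) / sigma**2               # d/dmu log N(z; mu, sigma^2)
    fz = f(z)
    baseline = fz.mean()                      # crude baseline for variance reduction
    return np.mean((fz - baseline) * score)

# Sanity check on a case with a known answer: d/dmu E[z^2] = 2 * mu.
g = nes_grad_mean(lambda z: z ** 2, mu=1.0, sigma=0.5)
```

Because only function evaluations of f appear, the same estimator applies unchanged when f is a non-reparameterizable likelihood, which is exactly the restriction the abstract aims to lift.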
Tensor Monte Carlo: particle methods for the GPU era
Multi-sample, importance-weighted variational autoencoders (IWAE) give
tighter bounds and more accurate uncertainty estimates than variational
autoencoders (VAE) trained with a standard single-sample objective. However,
IWAEs scale poorly: as the latent dimensionality grows, they require
exponentially many samples to retain the benefits of importance weighting.
While sequential Monte Carlo (SMC) can address this problem, it is
prohibitively slow because the resampling step imposes sequential structure
which cannot be parallelised, and moreover, resampling is non-differentiable
which is problematic when learning approximate posteriors. To address these
issues, we developed tensor Monte Carlo (TMC), which gives exponentially many
importance samples by separately drawing samples for each of the latent
variables, then averaging over all possible combinations. While the sum
over exponentially many terms might seem to be intractable, in many cases it
can be computed efficiently as a series of tensor inner-products. We show that
TMC is superior to IWAE on a generative model with multiple stochastic layers
trained on the MNIST handwritten digit database, and we show that TMC can be
combined with standard variance reduction techniques.
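The "sum over exponentially many terms as a tensor contraction" idea can be sketched on a toy two-latent linear-Gaussian chain; the model, the proposals, and the value of K below are invented for the example, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 50   # samples per latent variable; all K*K combinations are averaged

def log_norm(x, mu, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

# Toy chain: z1 ~ N(0,1), z2 | z1 ~ N(z1,1), x | z2 ~ N(z2,1), so x ~ N(0,3).
x = 1.5
z1 = rng.normal(0.0, 1.0, K)   # proposal q(z1) = prior, so those terms cancel
z2 = rng.normal(0.0, 1.0, K)   # proposal q(z2) = N(0,1), an illustrative choice

# Log weight of combination (i, j): p(z2_j | z1_i) p(x | z2_j) / q(z2_j).
lw = (log_norm(z2[None, :], z1[:, None], 1.0)   # (K, K) grid via broadcasting
      - log_norm(z2[None, :], 0.0, 1.0)
      + log_norm(x, z2, 1.0)[None, :])

# Average all K^2 combinations in log space: one dense tensor contraction
# instead of K^2 separate importance-sampling evaluations.
m = lw.max()
tmc_bound = np.log(np.exp(lw - m).sum()) + m - 2 * np.log(K)
exact = log_norm(x, 0.0, 3.0)   # true log p(x) for this toy chain
```

With deeper chains the same pattern becomes a sequence of matrix (tensor inner) products over the per-latent sample axes, which is exactly the GPU-friendly structure the abstract highlights.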
Optimal Variance Control of the Score Function Gradient Estimator for Importance Weighted Bounds
This paper introduces novel results for the score function gradient estimator
of the importance weighted variational bound (IWAE). We prove that in the limit
of a large number of importance samples K, one can choose the control variate
such that the Signal-to-Noise ratio (SNR) of the estimator grows as sqrt(K).
This is in contrast to the standard pathwise gradient estimator, where the SNR
decreases as 1/sqrt(K). Based on our theoretical findings, we develop a novel
control variate that extends on VIMCO. Empirically, for the training of both
continuous and discrete generative models, the proposed method yields superior
variance reduction, resulting in an SNR for IWAE that increases with K
without relying on the reparameterization trick. The novel estimator is
competitive with state-of-the-art reparameterization-free gradient estimators
such as Reweighted Wake-Sleep (RWS) and the thermodynamic variational objective
(TVO) when training generative models.
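A hedged sketch of the VIMCO-style leave-one-out control variate that this line of work builds on, applied to the score-function gradient of the IWAE bound; the log-weights are invented, and the paper's actual extension of VIMCO is not reproduced here.

```python
import numpy as np

def logsumexp(a):
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def vimco_signals(log_w):
    """Per-particle learning signals with leave-one-out control variates
    (VIMCO-style) for the score-function gradient of L = log (1/K) sum_k w_k."""
    K = len(log_w)
    L = logsumexp(log_w) - np.log(K)
    signals = np.empty(K)
    for k in range(K):
        lw = log_w.copy()
        # Replace the held-out term by the geometric mean of the others,
        # giving a baseline that does not depend on particle k.
        lw[k] = (log_w.sum() - log_w[k]) / (K - 1)
        signals[k] = L - (logsumexp(lw) - np.log(K))
    return L, signals

log_w = np.array([-1.0, -2.0, -0.5, -1.5])   # invented log importance weights
L, s = vimco_signals(log_w)
# Gradient estimate w.r.t. inference-network parameters:
#   sum_k s[k] * grad log q(z_k | x)   -- no reparameterization needed.
```

Particles with above-average weight receive positive learning signals and low-weight particles negative ones, which is the variance-reduction mechanism the control variate provides.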