Hierarchical Importance Weighted Autoencoders
Importance weighted variational inference (Burda et al., 2015) uses multiple
i.i.d. samples to obtain a tighter variational lower bound. We believe a joint
proposal has the potential to reduce the number of redundant samples, and
introduce a hierarchical structure to induce correlation. The hope is that the
proposals coordinate to compensate for one another's errors, reducing the
variance of the importance estimator. Theoretically, we analyze the
condition under which convergence of the estimator variance can be connected to
convergence of the lower bound. Empirically, we confirm that maximization of
the lower bound does implicitly minimize variance. Further analysis shows that
this is a result of negative correlation induced by the proposed hierarchical
meta sampling scheme, and performance of inference also improves when the
number of samples increases.
Comment: Accepted by ICML 2019. 17 pages.
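For reference, here is a minimal sketch of the K-sample importance weighted bound the paper builds on (Burda et al., 2015); the function name and tensor shapes are ours, and the hierarchical joint proposal itself is not shown:

    import math
    import torch

    def iwae_bound(log_p_joint: torch.Tensor, log_q: torch.Tensor) -> torch.Tensor:
        # log_p_joint, log_q: [K, batch] tensors holding log p(x, z_k) and
        # log q(z_k | x) for K proposal samples (i.i.d. in the original scheme,
        # correlated under a hierarchical joint proposal).
        K = log_p_joint.shape[0]
        log_w = log_p_joint - log_q                         # log importance weights
        return torch.logsumexp(log_w, dim=0) - math.log(K)  # log-mean-exp per datum

Maximizing this bound over the proposal is what, per the abstract, implicitly drives down the variance of the importance weights.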
Learning Hierarchical Priors in VAEs
We propose to learn a hierarchical prior in the context of variational
autoencoders to avoid the over-regularisation resulting from a standard normal
prior distribution. To incentivise an informative latent representation of the
data, we formulate the learning problem as a constrained optimisation problem
by extending the Taming VAEs framework to two-level hierarchical models. We
introduce a graph-based interpolation method, which shows that the topology of
the learned latent representation corresponds to the topology of the data
manifold, and present several examples where desired properties of the latent
representation, such as smoothness and simple explanatory factors, are learned
by the prior.
Comment: Published at NeurIPS 2019 (spotlight).
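As a rough illustration, the constrained formulation follows the Taming VAEs (GECO-style) recipe: minimize the KL term subject to a reconstruction constraint, with a Lagrange multiplier adapted on the fly. A hedged single-step sketch, where kappa, lam_lr, and the multiplicative update are our assumptions rather than the paper's exact procedure:

    import torch

    def geco_step(kl, recon_err, lam, kappa, lam_lr=1e-2):
        # Minimize KL(q(z|x) || p(z)) subject to E[recon_err] <= kappa.
        constraint = recon_err - kappa   # <= 0 once the constraint is satisfied
        loss = kl + lam * constraint     # Lagrangian for this step
        lam = lam * torch.exp(lam_lr * constraint.detach())  # grow lam while violated
        return loss, lam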
Variational Composite Autoencoders
Learning in latent variable models is challenging in the presence of complex
data structure or intractable latent variables. Previous variational
autoencoders can be ineffective due to their straightforward encoder-decoder
structure. In this paper, we propose a variational composite autoencoder to
sidestep this issue by amortizing on top of the hierarchical latent variable
model. The experimental results confirm the advantages of our model
Asymmetric Variational Autoencoders
Variational inference for latent variable models is prevalent in various
machine learning problems, typically solved by maximizing the Evidence Lower
Bound (ELBO) of the true data likelihood with respect to a variational
distribution. However, freely enriching the family of variational
distributions is challenging since the ELBO requires variational likelihood
evaluations of the latent variables. In this paper, we propose a novel
framework that enriches the variational family by incorporating auxiliary
variables. The resulting inference network does not require density evaluations for
the auxiliary variables and thus complex implicit densities over the auxiliary
variables can be constructed by neural networks. It can be shown that the
actual variational posterior of the proposed approach models a rich
probabilistic mixture of simple variational posteriors indexed by the auxiliary
variables, so a flexible inference model can be built. Empirical evaluations on
several density estimation tasks demonstrate the effectiveness of the proposed
method.
Comment: ICML 2018 Workshop on Theoretical Foundations and Applications of
Deep Generative Models.
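A hedged sketch of the sampling path this describes: the auxiliary variable is produced implicitly by pushing noise through a network, so its density is never evaluated, and z is then drawn from a simple explicit conditional. The module names, noise dimension, and diagonal-Gaussian conditional are our assumptions:

    import torch

    def sample_z(x, aux_net, cond_net, noise_dim=16):
        # Implicit q(a | x): a deterministic transform of (x, noise), so the
        # density of a can be arbitrarily complex and is never evaluated.
        eps = torch.randn(x.shape[0], noise_dim)
        a = aux_net(torch.cat([x, eps], dim=-1))
        # Explicit conditional q(z | x, a), here a diagonal Gaussian.
        mu, log_sigma = cond_net(torch.cat([x, a], dim=-1)).chunk(2, dim=-1)
        return mu + log_sigma.exp() * torch.randn_like(mu)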
Undirected Graphical Models as Approximate Posteriors
The representation of the approximate posterior is a critical aspect of
effective variational autoencoders (VAEs). Poor choices for the approximate
posterior have a detrimental impact on the generative performance of VAEs due
to the mismatch with the true posterior. We extend the class of posterior
models that may be learned by using undirected graphical models. We develop an
efficient method to train undirected approximate posteriors by showing that the
gradient of the training objective with respect to the parameters of the
undirected posterior can be computed by backpropagation through Markov chain
Monte Carlo updates. We apply these gradient estimators for training discrete
VAEs with Boltzmann machines as approximate posteriors and demonstrate that
undirected models outperform previous results obtained using directed graphical
models. Our implementation is available at https://github.com/QuadrantAI/dvaess .
Comment: Accepted to ICML 2020.
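For context, the inner loop the gradients must flow through is ordinary block-Gibbs sampling; a minimal sketch for an RBM-structured posterior with our own parameter names (the paper's estimator for backpropagating through these updates is not shown):

    import torch

    def gibbs_sweep(v, W, a, b):
        # One block-Gibbs update for a binary RBM with energy
        # E(v, h) = -v @ W @ h - a @ v - b @ h.
        p_h = torch.sigmoid(v @ W + b)     # p(h = 1 | v)
        h = torch.bernoulli(p_h)
        p_v = torch.sigmoid(h @ W.T + a)   # p(v = 1 | h)
        v = torch.bernoulli(p_v)
        return v, h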
Advances in Variational Inference
Many modern unsupervised or semi-supervised machine learning algorithms rely
on Bayesian probabilistic models. These models are usually intractable and thus
require approximate inference. Variational inference (VI) lets us approximate a
high-dimensional Bayesian posterior with a simpler variational distribution by
solving an optimization problem. This approach has been successfully used in
various models and large-scale applications. In this review, we give an
overview of recent trends in variational inference. We first introduce standard
mean field variational inference, then review recent advances focusing on the
following aspects: (a) scalable VI, which includes stochastic approximations,
(b) generic VI, which extends the applicability of VI to a large class of
otherwise intractable models, such as non-conjugate models, (c) accurate VI,
which includes variational models beyond the mean field approximation or with
atypical divergences, and (d) amortized VI, which implements the inference over
local latent variables with inference networks. Finally, we provide a summary
of promising future research directions.
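In standard notation, the optimization problem the review starts from is

    \log p(x) \;\ge\; \mathbb{E}_{q(z)}\big[\log p(x, z) - \log q(z)\big] \;=\; \mathrm{ELBO}(q),
    \qquad q(z) = \prod_i q_i(z_i) \quad \text{(mean field)},

and the four directions above relax, respectively, how this bound is estimated, which models it applies to, how restrictive q is, and how the per-datum optimization is amortized.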
Doubly Semi-Implicit Variational Inference
We extend the existing framework of semi-implicit variational inference
(SIVI) and introduce doubly semi-implicit variational inference (DSIVI), a way
to perform variational inference and learning when both the approximate
posterior and the prior distribution are semi-implicit. In other words, DSIVI
performs inference in models where the prior and the posterior can be expressed
as an intractable infinite mixture of some analytic density with a highly
flexible implicit mixing distribution. We provide a sandwich bound on the
evidence lower bound (ELBO) objective that can be made arbitrarily tight.
Unlike discriminator-based and kernel-based approaches to implicit variational
inference, DSIVI optimizes a proper lower bound on ELBO that is asymptotically
exact. We evaluate DSIVI on a set of problems that benefit from implicit
priors. In particular, we show that DSIVI gives rise to a simple modification
of VampPrior, the current state-of-the-art prior for variational autoencoders,
which improves its performance.
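In symbols (notation ours), a semi-implicit distribution and the finite-sample estimate from which SIVI-style bounds are built are

    q(z) \;=\; \int q(z \mid \psi)\, q(\psi)\, d\psi
    \;\approx\; \frac{1}{K} \sum_{k=1}^{K} q(z \mid \psi_k),
    \qquad \psi_k \sim q(\psi),

where the conditional q(z | psi) is analytic but the mixing distribution q(psi) only needs to be sampleable; DSIVI allows this structure in both the posterior and the prior.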
Learning Latent Superstructures in Variational Autoencoders for Deep Multidimensional Clustering
We investigate a variant of variational autoencoders where there is a
superstructure of discrete latent variables on top of the latent features. In
general, our superstructure is a tree structure of multiple super latent
variables and it is automatically learned from data. When there is only one
latent variable in the superstructure, our model reduces to one that assumes
the latent features to be generated from a Gaussian mixture model. We call our
model the latent tree variational autoencoder (LTVAE). Whereas previous deep
learning methods for clustering produce only one partition of data, LTVAE
produces multiple partitions of data, each being given by one super latent
variable. This is desirable because high dimensional data usually have many
different natural facets and can be meaningfully partitioned in multiple ways.
Comment: Published at ICLR 2019.
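For the single-super-variable special case mentioned above, the latent features follow a Gaussian mixture; a hedged sampling sketch with illustrative shapes and names:

    import torch

    def sample_gmm_prior(pi, mu, log_sigma, n):
        # pi: [C] mixture weights; mu, log_sigma: [C, D] component parameters.
        # y ~ Cat(pi), then z ~ N(mu_y, diag(sigma_y^2)).
        y = torch.multinomial(pi, n, replacement=True)
        eps = torch.randn(n, mu.shape[1])
        return mu[y] + log_sigma[y].exp() * eps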
Augmenting Supervised Neural Networks with Unsupervised Objectives for Large-scale Image Classification
Unsupervised learning and supervised learning are key research topics in deep
learning. However, as high-capacity supervised neural networks trained with a
large amount of labels have achieved remarkable success in many computer vision
tasks, the availability of large-scale labeled images reduced the significance
of unsupervised learning. Inspired by the recent trend toward revisiting the
importance of unsupervised learning, we investigate joint supervised and
unsupervised learning in a large-scale setting by augmenting existing neural
networks with decoding pathways for reconstruction. First, we demonstrate that
the intermediate activations of pretrained large-scale classification networks
preserve almost all the information of input images except a portion of local
spatial details. Then, by end-to-end training of the entire augmented
architecture with the reconstructive objective, we show improvement of the
network performance for supervised tasks. We evaluate several variants of
autoencoders, including the recently proposed "what-where" autoencoder that
uses the encoder pooling switches, to study the importance of the architecture
design. Taking the 16-layer VGGNet trained under the ImageNet ILSVRC 2012
protocol as a strong baseline for image classification, our methods improve the
validation-set accuracy by a noticeable margin.
Comment: International Conference on Machine Learning (ICML), 2016.
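The augmented objective reduces to a weighted sum of the supervised loss and the reconstruction loss of the decoding pathway; a minimal sketch, where the balance weight lam is our placeholder rather than the paper's setting:

    import torch.nn.functional as F

    def joint_loss(logits, labels, reconstruction, images, lam=1.0):
        ce = F.cross_entropy(logits, labels)        # supervised classification term
        recon = F.mse_loss(reconstruction, images)  # decoding-pathway reconstruction
        return ce + lam * recon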
Importance Weighted Hierarchical Variational Inference
Variational Inference is a powerful tool in the Bayesian modeling toolkit;
however, its effectiveness is determined by the expressivity of the utilized
variational distributions in terms of their ability to match the true posterior
distribution. In turn, the expressivity of the variational family is largely
limited by the requirement of having a tractable density function. To overcome
this roadblock, we introduce a new family of variational upper bounds on a
marginal log density in the case of hierarchical models (also known as latent
variable models). We then give an upper bound on the Kullback-Leibler
divergence and derive a family of increasingly tighter variational lower bounds
on the otherwise intractable standard evidence lower bound for hierarchical
variational distributions, enabling the use of more expressive approximate
posteriors. We show that previously known methods, such as Hierarchical
Variational Models, Semi-Implicit Variational Inference and Doubly
Semi-Implicit Variational Inference can be seen as special cases of the
proposed approach, and empirically demonstrate superior performance of the
proposed method in a set of experiments.
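For orientation, the single-sample member of this family is the classical auxiliary-variable bound (notation ours): for a hierarchical q(z) = \int q(z \mid \psi) q(\psi) d\psi and any reverse model \tau(\psi \mid z),

    \log q(z) \;\le\; \mathbb{E}_{\psi \sim q(\psi \mid z)}
    \left[ \log \frac{q(z \mid \psi)\, q(\psi)}{\tau(\psi \mid z)} \right],

with gap KL(q(\psi \mid z) \,\|\, \tau(\psi \mid z)); used inside the ELBO this yields a valid lower bound, and the paper tightens it with multiple \psi samples, recovering the methods listed above as special cases.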