Stick-Breaking Variational Autoencoders
We extend Stochastic Gradient Variational Bayes to perform posterior
inference for the weights of Stick-Breaking processes. This development allows
us to define a Stick-Breaking Variational Autoencoder (SB-VAE), a Bayesian
nonparametric version of the variational autoencoder that has a latent
representation with stochastic dimensionality. We experimentally demonstrate
that the SB-VAE, and a semi-supervised variant, learn highly discriminative
latent representations that often outperform the Gaussian VAE's.
Comment: ICLR 2017, Conference Track
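As an illustration of the stick-breaking construction underlying the SB-VAE, the sketch below draws Kumaraswamy fractions (the reparameterizable stand-in for Beta draws this line of work relies on) and converts them into stick-breaking weights. The concentration values are arbitrary placeholders, and the encoder/decoder networks are omitted.

```python
import numpy as np

def kumaraswamy_sample(a, b, size, rng):
    """Draw Kumaraswamy(a, b) variates via the inverse CDF (reparameterizable)."""
    u = rng.uniform(size=size)
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)

def stick_breaking_weights(v):
    """Map fractions v_1..v_K to weights pi_k = v_k * prod_{j<k} (1 - v_j)."""
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

rng = np.random.default_rng(0)
v = kumaraswamy_sample(a=1.0, b=5.0, size=10, rng=rng)   # hypothetical concentration values
pi = stick_breaking_weights(v)
print(pi, pi.sum())   # weights decay, so the effective latent dimensionality is stochastic
```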
Learning to Draw Samples with Amortized Stein Variational Gradient Descent
We propose a simple algorithm to train stochastic neural networks to draw
samples from given target distributions for probabilistic inference. Our method
is based on iteratively adjusting the neural network parameters so that the
output changes along a Stein variational gradient direction (Liu & Wang, 2016)
that maximally decreases the KL divergence with the target distribution. Our
method works for any target distribution specified by its unnormalized
density function, and can train any black-box architecture that is
differentiable with respect to the parameters we want to adapt. We demonstrate our
method with a number of applications, including variational autoencoder (VAE)
with expressive encoders to model complex latent space structures, and
hyper-parameter learning of MCMC samplers that allows Bayesian inference to
adaptively improve itself as it sees more data.
Comment: Accepted by UAI 2017
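For reference, the sketch below computes the Stein variational gradient direction of Liu & Wang (2016) that the amortized method pushes its sampler's outputs along. It assumes an RBF kernel and a standard-normal target with an analytic score, and it moves particles directly rather than backpropagating the direction into network parameters as the paper does.

```python
import numpy as np

def rbf_kernel(x, h):
    """Return k(x_j, x_i) and its gradient w.r.t. x_j for an RBF kernel with bandwidth h."""
    diff = x[:, None, :] - x[None, :, :]           # diff[j, i] = x_j - x_i, shape (n, n, d)
    k = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * h ** 2))
    grad_k = -diff / h ** 2 * k[:, :, None]        # d k(x_j, x_i) / d x_j
    return k, grad_k

def svgd_direction(x, score, h=1.0):
    """phi(x_i) = mean_j [ k(x_j, x_i) score(x_j) + grad_{x_j} k(x_j, x_i) ]."""
    k, grad_k = rbf_kernel(x, h)
    return (k.T @ score + grad_k.sum(axis=0)) / x.shape[0]

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2)) + 3.0                 # particles start far from the target
for _ in range(200):
    score = -x                                     # grad log N(0, I) at the particles
    x = x + 0.1 * svgd_direction(x, score)
print(x.mean(axis=0))                              # drifts toward the target mean (0, 0)
```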
Exemplar-based synthesis of geology using kernel discrepancies and generative neural networks
We propose a framework for synthesis of geological images based on an
exemplar image. We synthesize new realizations such that the discrepancy in the
patch distribution between the realizations and the exemplar image is
minimized. This discrepancy is quantified with a kernel two-sample test
statistic, the maximum mean discrepancy. To enable fast synthesis, we train a
generative neural network in an offline phase to sample realizations
efficiently during deployment, while also providing a parametrization of the
synthesis process. We assess the framework on a classical binary image
representing channelized subsurface reservoirs, finding that the method
reproduces the visual patterns and spatial statistics (image histogram and
two-point probability functions) of the exemplar image.
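The sketch below illustrates the core quantity being minimized: a (biased) estimate of the squared maximum mean discrepancy between the patch distributions of two images. The Gaussian kernel, patch size, and random stand-in images are assumptions for illustration; the generative network itself is omitted.

```python
import numpy as np

def extract_patches(img, size, stride):
    """Collect flattened size x size patches from a 2-D image."""
    h, w = img.shape
    return np.array([img[i:i + size, j:j + size].ravel()
                     for i in range(0, h - size + 1, stride)
                     for j in range(0, w - size + 1, stride)])

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD with a Gaussian kernel."""
    def gram(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * bandwidth ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

rng = np.random.default_rng(0)
exemplar = (rng.uniform(size=(64, 64)) < 0.3).astype(float)      # stand-in for a training image
realization = (rng.uniform(size=(64, 64)) < 0.3).astype(float)   # stand-in for a generated image
d = mmd2(extract_patches(exemplar, 8, 4), extract_patches(realization, 8, 4))
print(d)   # the synthesis network would be trained to drive this discrepancy toward zero
```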
Geodesic Clustering in Deep Generative Models
Deep generative models are tremendously successful in learning
low-dimensional latent representations that describe the data well. These
representations, however, tend to distort relationships between points,
i.e., pairwise distances often fail to reflect semantic similarities. This
renders unsupervised tasks, such as clustering, difficult when working with the
latent representations. We demonstrate that taking the geometry of the
generative model into account is sufficient to make simple clustering
algorithms work well over latent representations. Leaning on the recent finding
that deep generative models constitute stochastically immersed Riemannian
manifolds, we propose an efficient algorithm for computing geodesics (shortest
paths) and distances in the latent space while taking its distortion
into account. We further propose a new architecture for modeling uncertainty in
variational autoencoders, which is essential for understanding the geometry of
deep generative models. Experiments show that the geodesic distance is very
likely to reflect the internal structure of the data.
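The following sketch conveys the geometric idea: discretize a curve between two latent codes and minimize its energy measured through the decoder, so that distances follow the generator's geometry rather than straight lines in the latent space. The toy deterministic decoder is a placeholder, and the decoder-uncertainty modeling the paper emphasizes is left out.

```python
import torch

decoder = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(),
                              torch.nn.Linear(64, 10))          # toy stand-in generator

def geodesic(z0, z1, n_points=16, steps=500, lr=1e-2):
    """Minimize the discretized curve energy sum_t ||f(z_{t+1}) - f(z_t)||^2."""
    ts = torch.linspace(0, 1, n_points)[1:-1, None]
    interior = ((1 - ts) * z0 + ts * z1).clone().requires_grad_(True)   # start on the straight line
    opt = torch.optim.Adam([interior], lr=lr)
    for _ in range(steps):
        curve = torch.cat([z0[None], interior, z1[None]])
        energy = (decoder(curve[1:]) - decoder(curve[:-1])).pow(2).sum()
        opt.zero_grad(); energy.backward(); opt.step()
    curve = torch.cat([z0[None], interior.detach(), z1[None]])
    segments = decoder(curve[1:]) - decoder(curve[:-1])
    return curve, segments.norm(dim=-1).sum().item()            # curve length in data space

z0, z1 = torch.tensor([-2.0, 0.0]), torch.tensor([2.0, 0.0])
curve, dist = geodesic(z0, z1)
print(dist)   # approximate geodesic distance, used in place of straight-line latent distance
```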
A Tutorial on Deep Latent Variable Models of Natural Language
There has been much recent, exciting work on combining the complementary
strengths of latent variable models and deep learning. Latent variable modeling
makes it easy to explicitly specify model constraints through conditional
independence properties, while deep learning makes it possible to parameterize
these conditional likelihoods with powerful function approximators. While these
"deep latent variable" models provide a rich, flexible framework for modeling
many real-world phenomena, difficulties exist: deep parameterizations of
conditional likelihoods usually make posterior inference intractable, and
latent variable objectives often complicate backpropagation by introducing
points of non-differentiability. This tutorial explores these issues in depth
through the lens of variational inference.
Comment: EMNLP 2018 Tutorial
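As a concrete anchor for the variational-inference viewpoint the tutorial takes, the sketch below estimates the evidence lower bound (ELBO) for a VAE with a Gaussian q(z|x) via the reparameterization trick. The one-layer encoder and decoder and the random binary data are placeholders.

```python
import torch

enc = torch.nn.Linear(784, 2 * 16)     # outputs mean and log-variance of q(z|x)
dec = torch.nn.Linear(16, 784)         # outputs Bernoulli logits for p(x|z)

def elbo(x):
    mu, logvar = enc(x).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()               # z = mu + sigma * eps
    log_px_z = -torch.nn.functional.binary_cross_entropy_with_logits(
        dec(z), x, reduction="none").sum(-1)                           # one-sample estimate of E_q[log p(x|z)]
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1)       # KL(q(z|x) || N(0, I)) in closed form
    return (log_px_z - kl).mean()

x = torch.rand(8, 784).round()          # toy binary "data"
loss = -elbo(x)                         # maximizing the ELBO = minimizing its negative
loss.backward()
print(loss.item())
```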
Stein Variational Adaptive Importance Sampling
We propose a novel adaptive importance sampling algorithm that combines the
Stein variational gradient descent (SVGD) algorithm with importance sampling
(IS). Our algorithm leverages the nonparametric transforms in SVGD to
iteratively decrease the KL divergence between our importance proposal and the
target distribution. The advantages of this algorithm are twofold: first, our
algorithm turns SVGD into a standard IS algorithm, allowing us to use standard
diagnostic and analytic tools of IS to evaluate and interpret the results;
second, we do not restrict the choice of our importance proposal to predefined
distribution families like traditional (adaptive) IS methods. Empirical
experiments demonstrate that our algorithm performs well at evaluating
partition functions of restricted Boltzmann machines and at testing the
likelihood of variational autoencoders.
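The sketch below is a simplified stand-in for the idea, not the paper's exact construction: particles drawn from an initial proposal are transported toward the target with a few SVGD steps, a Gaussian proposal is refit to the transported particles, and ordinary importance sampling then estimates the partition function. The paper instead tracks the SVGD transport maps themselves, so the adapted proposal remains nonparametric.

```python
import numpy as np

def svgd_step(x, score, h=0.5, eps=0.1):
    """One SVGD update for 1-D particles with an RBF kernel."""
    diff = x[:, None] - x[None, :]                        # diff[i, j] = x_i - x_j
    k = np.exp(-diff ** 2 / (2 * h ** 2))
    phi = (k @ score + (diff / h ** 2 * k).sum(axis=1)) / len(x)
    return x + eps * phi

rng = np.random.default_rng(0)
log_p = lambda x: -0.5 * (x - 3.0) ** 2                   # unnormalized target; true Z = sqrt(2*pi)
x = rng.normal(size=500)                                  # samples from the initial proposal N(0, 1)
for _ in range(100):
    x = svgd_step(x, score=-(x - 3.0))                    # analytic grad log p

mu, sd = x.mean(), x.std()                                # refit an adapted Gaussian proposal
xs = rng.normal(mu, sd, size=5000)
log_q = -0.5 * ((xs - mu) / sd) ** 2 - np.log(sd * np.sqrt(2 * np.pi))
Z_hat = np.mean(np.exp(log_p(xs) - log_q))                # importance-sampling estimate of Z
print(Z_hat, np.sqrt(2 * np.pi))                          # the two should be close
```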
A Selective Overview of Deep Learning
Deep learning has arguably achieved tremendous success in recent years. In
simple words, deep learning uses the composition of many nonlinear functions to
model the complex dependency between input features and labels. While neural
networks have a long history, recent advances have greatly improved their
performance in computer vision, natural language processing, etc. From the
statistical and scientific perspective, it is natural to ask: What is deep
learning? What are the new characteristics of deep learning, compared with
classical methods? What are the theoretical foundations of deep learning? To
answer these questions, we introduce common neural network models (e.g.,
convolutional neural nets, recurrent neural nets, generative adversarial nets)
and training techniques (e.g., stochastic gradient descent, dropout, batch
normalization) from a statistical point of view. Along the way, we highlight
new characteristics of deep learning (including depth and over-parametrization)
and explain their practical and theoretical benefits. We also sample recent
results on theories of deep learning, many of which are only suggestive. While
a complete understanding of deep learning remains elusive, we hope that our
perspectives and discussions serve as a stimulus for new statistical research.
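As a minimal illustration of the description above, a composition of nonlinear functions trained by gradient descent to map features to labels, the toy network and random data below are purely illustrative.

```python
import torch

model = torch.nn.Sequential(                 # f(x) = W2 relu(W1 x + b1) + b2
    torch.nn.Linear(20, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(128, 20), torch.randint(0, 2, (128,))   # toy features and labels
for _ in range(100):                                        # gradient-descent training loop
    loss = torch.nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```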
Unsupervised speech representation learning using WaveNet autoencoders
We consider the task of unsupervised extraction of meaningful latent
representations of speech by applying autoencoding neural networks to speech
waveforms. The goal is to learn a representation able to capture high level
semantic content from the signal, e.g.\ phoneme identities, while being
invariant to confounding low level details in the signal such as the underlying
pitch contour or background noise. Since the learned representation is tuned to
contain only phonetic content, we resort to a high-capacity WaveNet decoder
to infer, from previous samples, the information discarded by the encoder.
Moreover, the behavior of autoencoder models depends on the kind of constraint
that is applied to the latent representation. We compare three variants: a
simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder
(VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of
learned representations in terms of speaker independence, the ability to
predict phonetic content, and the ability to accurately reconstruct individual
spectrogram frames. Moreover, for discrete encodings extracted using the
VQ-VAE, we measure the ease of mapping them to phonemes. We introduce a
regularization scheme that forces the representations to focus on the phonetic
content of the utterance and report performance comparable with the top entries
in the ZeroSpeech 2017 unsupervised acoustic unit discovery task.
Comment: Accepted to IEEE TASLP, final version available at http://dx.doi.org/10.1109/TASLP.2019.293886
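The sketch below shows the vector-quantization bottleneck of the VQ-VAE variant compared in the paper: each encoder output is snapped to its nearest codebook vector, the straight-through trick copies gradients past the non-differentiable lookup, and commitment/codebook losses keep encoder and codebook aligned. The codebook size and dimensions are arbitrary placeholders.

```python
import torch

class VectorQuantizer(torch.nn.Module):
    def __init__(self, num_codes=64, dim=16):
        super().__init__()
        self.codebook = torch.nn.Embedding(num_codes, dim)

    def forward(self, z_e):
        d = torch.cdist(z_e, self.codebook.weight)            # distances to all codebook vectors
        idx = d.argmin(dim=-1)                                 # nearest-neighbour code indices
        z_q = self.codebook(idx)
        z_st = z_e + (z_q - z_e).detach()                      # straight-through gradient estimator
        commit = torch.nn.functional.mse_loss(z_e, z_q.detach())
        codebook_loss = torch.nn.functional.mse_loss(z_q, z_e.detach())
        return z_st, idx, commit + codebook_loss

vq = VectorQuantizer()
z_e = torch.randn(8, 16, requires_grad=True)                  # stand-in encoder outputs
z_q, codes, aux_loss = vq(z_e)
print(codes.shape, aux_loss.item())                           # discrete codes and auxiliary VQ losses
```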
Variational autoencoder with weighted samples for high-dimensional non-parametric adaptive importance sampling
Probability density function estimation with weighted samples is the main
foundation of all adaptive importance sampling algorithms. Classically, a
target distribution is approximated either by a non-parametric model or within
a parametric family. However, these models suffer from the curse of
dimensionality or from their lack of flexibility. In this contribution, we
suggest using, as the approximating model, a distribution parameterized by a
variational autoencoder. We extend the existing framework to the case of
weighted samples by introducing a new objective function. The flexibility of
the obtained family of distributions makes it as expressive as a non-parametric
model, and despite the very high number of parameters to estimate, this family
is much more efficient in high dimension than the classical Gaussian or
Gaussian mixture families. Moreover, in order to add flexibility to the model
and to be able to learn multimodal distributions, we consider a learnable prior
distribution for the variational autoencoder latent variables. We also
introduce a new pre-training procedure for the variational autoencoder to find
good starting weights for the neural networks and to prevent, as far as possible,
the posterior collapse phenomenon. Finally, we make explicit how the resulting
distribution can be combined with importance sampling, and we exploit the
proposed procedure in existing adaptive importance sampling algorithms to draw
points from a target distribution and to estimate a rare event probability in
high dimension on two multimodal problems.
Comment: 20 pages, 5 figures
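The sketch below conveys the kind of weighted objective the approach builds on: each sample's contribution to the negative ELBO is scaled by its normalized importance weight. The tiny Gaussian encoder/decoder and the weighting are placeholders, not the authors' exact objective, and the learnable prior and pre-training procedure are omitted.

```python
import math
import torch

enc = torch.nn.Linear(10, 2 * 4)       # mean and log-variance of q(z|x)
dec = torch.nn.Linear(4, 2 * 10)       # mean and log-variance of p(x|z)

def weighted_neg_elbo(x, w):
    mu, logvar = enc(x).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    m, lv = dec(z).chunk(2, dim=-1)
    log_px_z = -0.5 * (((x - m) ** 2) / lv.exp() + lv + math.log(2 * math.pi)).sum(-1)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1)
    w = w / w.sum()                                   # normalized importance weights
    return -(w * (log_px_z - kl)).sum()

x = torch.randn(32, 10)                               # stand-in weighted samples from an IS step
w = torch.rand(32)                                    # stand-in importance weights
loss = weighted_neg_elbo(x, w)
loss.backward()
print(loss.item())
```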
A Stein variational Newton method
Stein variational gradient descent (SVGD) was recently proposed as a general
purpose nonparametric variational inference algorithm [Liu & Wang, NIPS 2016]:
it minimizes the Kullback-Leibler divergence between the target distribution
and its approximation by implementing a form of functional gradient descent on
a reproducing kernel Hilbert space. In this paper, we accelerate and generalize
the SVGD algorithm by including second-order information, thereby approximating
a Newton-like iteration in function space. We also show how second-order
information can lead to more effective choices of kernel. We observe
significant computational gains over the original SVGD algorithm in multiple
test cases.
Comment: 18 pages, 7 figures
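The sketch below is a simplified illustration of why second-order information helps, not the paper's full algorithm: for a Gaussian target, the negative Hessian of log p is the precision matrix, and rescaling the SVGD update by its inverse mimics a Newton-like step that equalizes very different curvatures across dimensions.

```python
import numpy as np

def svgd_direction(x, score, h=1.0):
    """Standard SVGD direction with an RBF kernel."""
    diff = x[:, None, :] - x[None, :, :]                      # diff[i, j] = x_i - x_j
    k = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))
    return (k @ score + (diff * k[:, :, None]).sum(axis=1) / h ** 2) / len(x)

rng = np.random.default_rng(0)
A = np.array([[5.0, 0.0], [0.0, 0.2]])      # target precision: very different curvature per axis
A_inv = np.linalg.inv(A)                    # inverse of the (negative) Hessian of log p

x = rng.normal(size=(100, 2))
for _ in range(200):
    score = -(x @ A)                        # grad log p for N(0, A^{-1})
    x = x + 0.3 * svgd_direction(x, score) @ A_inv   # Newton-like preconditioned update
print(np.cov(x.T))                          # should approach the target covariance A^{-1}
```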