Learning to Draw Samples with Amortized Stein Variational Gradient Descent
We propose a simple algorithm to train stochastic neural networks to draw
samples from given target distributions for probabilistic inference. Our method
is based on iteratively adjusting the neural network parameters so that the
output changes along a Stein variational gradient direction (Liu & Wang, 2016)
that maximally decreases the KL divergence with the target distribution. Our
method works for any target distribution specified by its unnormalized
density function, and can train any black-box architecture that is
differentiable in terms of the parameters we want to adapt. We demonstrate our
method with a number of applications, including variational autoencoders (VAEs)
with expressive encoders to model complex latent space structures, and
hyper-parameter learning of MCMC samplers that allows Bayesian inference to
adaptively improve itself when seeing more data.
Comment: Accepted by UAI 201
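The update described in this abstract can be illustrated with a minimal sketch. It assumes a one-dimensional Gaussian target N(mu, 1), a fixed-bandwidth RBF kernel, and a toy "network" x = a + b * z with z drawn from a standard normal; the names `svgd_direction` and `train_sampler`, the batch size, and the step size are all illustrative choices and are not taken from the paper. Each step computes the Stein variational gradient direction at the sampler outputs and moves the parameters so that the outputs shift along it.

```python
import math
import random


def svgd_direction(xs, score, h=1.0):
    """Stein variational gradient direction at each particle (1-D, RBF kernel).

    phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) * score(x_j) + d/dx_j k(x_j, x_i) ]
    """
    n = len(xs)
    phis = []
    for xi in xs:
        total = 0.0
        for xj in xs:
            k = math.exp(-((xj - xi) ** 2) / (2.0 * h * h))
            # Attractive term pulls toward high density; the second term is the
            # kernel gradient with respect to x_j, which repels nearby particles.
            total += k * score(xj) + k * (xi - xj) / (h * h)
        phis.append(total / n)
    return phis


def train_sampler(mu=2.0, steps=600, batch=30, lr=0.1, seed=0):
    """Fit the toy sampler x = a + b * z to the target N(mu, 1) (assumed setup)."""
    rng = random.Random(seed)
    a, b = 0.0, 0.2  # illustrative initial parameters

    def score(x):
        # Gradient of the unnormalized log density -(x - mu)^2 / 2.
        return -(x - mu)

    for _ in range(steps):
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        xs = [a + b * z for z in noise]
        phis = svgd_direction(xs, score)
        # Chain rule: dx/da = 1 and dx/db = z, averaged over the batch.
        a += lr * sum(phis) / batch
        b += lr * sum(p * z for p, z in zip(phis, noise)) / batch
    return a, b
```

Calling `train_sampler()` returns the fitted offset and scale; under these assumptions the offset should end up near `mu` and the scale near 1, although a finite batch biases the scale slightly downward.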
Reinforcement Learning with Deep Energy-Based Policies
We propose a method for learning expressive energy-based policies for
continuous states and actions, which has been feasible only in tabular domains
before. We apply our method to learning maximum entropy policies, resulting
in a new algorithm, called soft Q-learning, that expresses the optimal policy
via a Boltzmann distribution. We use the recently proposed amortized Stein
variational gradient descent to learn a stochastic sampling network that
approximates samples from this distribution. The benefits of the proposed
algorithm include improved exploration and compositionality that allows
transferring skills between tasks, which we confirm in simulated experiments
with swimming and walking robots. We also draw a connection to actor-critic
methods, which can be viewed as performing approximate inference on the
corresponding energy-based model.
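To make the Boltzmann form of the policy concrete, here is a small sketch for a finite action set. This is a simplification: the abstract targets continuous actions, where the sum below becomes an integral that must be approximated by sampling. The function names and the temperature parameter `alpha` are assumptions made for illustration.

```python
import math


def soft_value(q_values, alpha=1.0):
    """Soft value V = alpha * log(sum_a exp(Q(a) / alpha)), computed stably."""
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))


def boltzmann_policy(q_values, alpha=1.0):
    """Action probabilities pi(a) = exp((Q(a) - V) / alpha)."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]
```

With a large `alpha` the policy approaches uniform, which encourages exploration; as `alpha` shrinks, the soft value approaches the maximum Q-value and the policy concentrates on the greedy action.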
Learning Deep Energy Models: Contrastive Divergence vs. Amortized MLE
We propose a number of new algorithms for learning deep energy models and
demonstrate their properties. We show that our SteinCD performs well in terms of
test likelihood, while SteinGAN performs well in terms of generating realistic
looking images. Our results suggest promising directions for learning better
models by combining GAN-style methods with traditional energy-based learning.
Advances in Variational Inference
Many modern unsupervised or semi-supervised machine learning algorithms rely
on Bayesian probabilistic models. These models are usually intractable and thus
require approximate inference. Variational inference (VI) lets us approximate a
high-dimensional Bayesian posterior with a simpler variational distribution by
solving an optimization problem. This approach has been successfully used in
various models and large-scale applications. In this review, we give an
overview of recent trends in variational inference. We first introduce standard
mean field variational inference, then review recent advances focusing on the
following aspects: (a) scalable VI, which includes stochastic approximations,
(b) generic VI, which extends the applicability of VI to a large class of
otherwise intractable models, such as non-conjugate models, (c) accurate VI,
which includes variational models beyond the mean field approximation or with
atypical divergences, and (d) amortized VI, which implements the inference over
local latent variables with inference networks. Finally, we provide a summary
of promising future research directions.
Continuous-Time Flows for Efficient Inference and Density Estimation
Two fundamental problems in unsupervised learning are efficient inference for
latent-variable models and robust density estimation based on large amounts of
unlabeled data. Algorithms for the two tasks, such as normalizing flows and
generative adversarial networks (GANs), are often developed independently. In
this paper, we propose the concept of continuous-time flows (CTFs), a
family of diffusion-based methods that are able to asymptotically approach a
target distribution. Distinct from normalizing flows and GANs, CTFs can be
adopted to achieve the above two goals in one framework, with theoretical
guarantees. Our framework includes distilling knowledge from a CTF for
efficient inference, and learning an explicit energy-based distribution with
CTFs for density estimation. Both tasks rely on a new technique for
distribution matching within amortized learning. Experiments on various tasks
demonstrate promising performance of the proposed CTF framework, compared to
related techniques.
Comment: ICML 2018 (fixed a reference
Generative Particle Variational Inference via Estimation of Functional Gradients
Recently, particle-based variational inference (ParVI) methods have gained
interest because they directly minimize the Kullback-Leibler divergence and do
not suffer from approximation errors from the evidence-based lower bound.
However, many ParVI approaches do not allow arbitrary sampling from the
posterior, and the few that do allow such sampling suffer from suboptimality.
This work proposes a new method for learning to approximately sample from the
posterior distribution. We construct a neural sampler that is trained with the
functional gradient of the KL-divergence between the empirical sampling
distribution and the target distribution, assuming the gradient resides within
a reproducing kernel Hilbert space. Our generative ParVI (GPVI) approach
maintains the asymptotic performance of ParVI methods while offering the
flexibility of a generative sampler. Through carefully constructed experiments,
we show that GPVI outperforms previous generative ParVI methods such as
amortized SVGD, and is competitive with ParVI as well as gold-standard
approaches like Hamiltonian Monte Carlo for fitting both exactly known and
intractable target distributions.
Comment: 10 pages, 3 figures, 4 tables, 1 algorith
Semi-Amortized Variational Autoencoders
Amortized variational inference (AVI) replaces instance-specific local
inference with a global inference network. While AVI has enabled efficient
training of deep generative models such as variational autoencoders (VAE),
recent empirical work suggests that inference networks can produce suboptimal
variational parameters. We propose a hybrid approach: use AVI to initialize
the variational parameters and run stochastic variational inference (SVI) to
refine them. Crucially, the local SVI procedure is itself differentiable, so
the inference network and generative model can be trained end-to-end with
gradient-based optimization. This semi-amortized approach enables the use of
rich generative models without experiencing the posterior-collapse phenomenon
common in training VAEs for problems like text generation. Experiments show
this approach outperforms strong autoregressive and variational baselines on
standard text and image datasets.
Comment: ICML 201
Stein Variational Gradient Descent as Moment Matching
Stein variational gradient descent (SVGD) is a non-parametric inference
algorithm that evolves a set of particles to fit a given distribution of
interest. We analyze the non-asymptotic properties of SVGD, showing that there
exists a set of functions, which we call the Stein matching set, whose
expectations are exactly estimated by any set of particles that satisfies the
fixed point equation of SVGD. This set is the image of the Stein operator applied
to the feature maps of the positive definite kernel used in SVGD. Our results
provide a theoretical framework for analyzing the properties of SVGD with
different kernels, shedding light on optimal kernel choice. In particular,
we show that SVGD with linear kernels yields exact estimation of means and
variances on Gaussian distributions, while random Fourier features enable
probabilistic bounds for distributional approximation. Our results offer a
refreshing view of the classical inference problem as fitting Stein's identity
or solving the Stein equation, which may motivate more efficient algorithms.
Comment: Conference on Neural Information Processing Systems (NIPS) 201
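The linear-kernel claim can be checked numerically in one direction with a short sketch: particles whose empirical mean and variance already match a Gaussian target should satisfy the SVGD fixed point equation. The code assumes a one-dimensional target N(mu, sigma^2) and takes "linear kernel" to mean k(x, y) = x * y + 1; that kernel form, the function name, and the particle sets are illustrative assumptions rather than details from the paper.

```python
def linear_kernel_svgd_direction(xs, mu, sigma):
    """SVGD direction per particle for k(x, y) = x*y + 1 and target N(mu, sigma^2)."""
    n = len(xs)
    scores = [-(x - mu) / sigma ** 2 for x in xs]  # gradient of the log density
    phis = []
    for xi in xs:
        total = 0.0
        for xj, sj in zip(xs, scores):
            k = xj * xi + 1.0
            dk = xi  # derivative of k(x_j, x_i) with respect to x_j
            total += k * sj + dk
        phis.append(total / n)
    return phis


# Mean 1 and population variance 4: matches N(1, 2^2), so the direction vanishes.
matched = linear_kernel_svgd_direction([-1.0, 3.0], mu=1.0, sigma=2.0)

# Mean 1.5 and variance 2.25: does not match, so the particles would still move.
shifted = linear_kernel_svgd_direction([0.0, 3.0], mu=1.0, sigma=2.0)
```

Here `matched` is zero at every particle while `shifted` is not, consistent with the statement that the fixed points of linear-kernel SVGD on a Gaussian target are particle sets with the exact mean and variance.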
Adversarial Learning of a Sampler Based on an Unnormalized Distribution
We investigate adversarial learning in the case when only an unnormalized
form of the density can be accessed, rather than samples. With insights so
garnered, adversarial learning is extended to the case for which one has access
to an unnormalized form u(x) of the target density function, but no samples.
Further, new concepts in GAN regularization are developed, based on learning
from samples or from u(x). The proposed method is compared to alternative
approaches, with encouraging results demonstrated across a range of
applications, including deep soft Q-learning.
Comment: Published in AISTATS 2019; Code: https://github.com/ChunyuanLI/RA
Self-Adversarially Learned Bayesian Sampling
Scalable Bayesian sampling is playing an important role in modern machine
learning, especially in the fast-developing unsupervised-(deep)-learning models.
While tremendous progress has been achieved via scalable Bayesian sampling
such as stochastic gradient MCMC (SG-MCMC) and Stein variational gradient
descent (SVGD), the generated samples are typically highly correlated.
Moreover, their sample-generation processes are often criticized as
inefficient. In this paper, we propose a novel self-adversarial learning
framework that automatically learns a conditional generator to mimic the
behavior of a Markov kernel (transition kernel). High-quality samples can be
efficiently generated by direct forward passes through a learned generator. Most
importantly, the learning process adopts a self-learning paradigm, requiring no
information on existing Markov kernels, e.g., knowledge of how to draw samples
from them. Specifically, our framework learns to use current samples, either
from the generator or pre-provided training data, to update the generator such
that the generated samples progressively approach a target distribution; hence
the name self-learning. Experiments on both synthetic and real datasets
verify advantages of our framework, outperforming related methods in terms of
both sampling efficiency and sample quality.
Comment: AAAI 201