Stein Variational Message Passing for Continuous Graphical Models
We propose a novel distributed inference algorithm for continuous graphical
models, by extending Stein variational gradient descent (SVGD) to leverage the
Markov dependency structure of the distribution of interest. Our approach
combines SVGD with a set of structured local kernel functions defined on the
Markov blanket of each node, which alleviates the curse of high dimensionality
and simultaneously yields a distributed algorithm for decentralized inference
tasks. We justify our method with theoretical analysis and show that the use of
local kernels can be viewed as a new type of localized approximation that
matches the target distribution on the conditional distributions of each node
over its Markov blanket. Our empirical results show that our method outperforms
a variety of baselines including standard MCMC and particle message passing
methods.
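The SVGD transport that the method above builds on moves a set of particles along a kernelized Stein direction. A minimal sketch of one plain SVGD step (not the paper's structured local-kernel variant; a fixed RBF bandwidth stands in for the usual median heuristic, and function names are ours):

```python
import numpy as np

def rbf_kernel(X, h=1.0):
    # Pairwise RBF kernel matrix and its gradient w.r.t. the first argument.
    diff = X[:, None, :] - X[None, :, :]          # (n, n, d), diff[j, i] = x_j - x_i
    sq = (diff ** 2).sum(-1)                      # (n, n)
    K = np.exp(-sq / (2 * h ** 2))
    gradK = -(diff / h ** 2) * K[:, :, None]      # grad_{x_j} k(x_j, x_i)
    return K, gradK

def svgd_step(X, grad_logp, stepsize=0.1, h=1.0):
    """One SVGD update: phi(x_i) = (1/n) sum_j [k(x_j, x_i) grad log p(x_j)
    + grad_{x_j} k(x_j, x_i)]; the first term drives particles toward high
    density, the second repels them from each other."""
    n = X.shape[0]
    K, gradK = rbf_kernel(X, h)
    phi = (K @ grad_logp(X) + gradK.sum(axis=0)) / n
    return X + stepsize * phi
```

The structured variant in the paper replaces the global kernel with kernels defined on each node's Markov blanket, which is what makes the update local and distributable.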
Moment-Based Variational Inference for Markov Jump Processes
We propose moment-based variational inference as a flexible framework for
approximate smoothing of latent Markov jump processes. The main ingredient of
our approach is to partition the set of all transitions of the latent process
into classes. This allows us to express the Kullback-Leibler divergence between
the approximate and the exact posterior process in terms of a set of moment
functions that arise naturally from the chosen partition. To illustrate
possible choices of the partition, we consider special classes of jump
processes that frequently occur in applications. We then extend the results to
parameter inference and demonstrate the method on several examples.
Comment: Accepted by the 36th International Conference on Machine Learning (ICML 2019).
Variational Particle Approximations
Approximate inference in high-dimensional, discrete probabilistic models is a
central problem in computational statistics and machine learning. This paper
describes discrete particle variational inference (DPVI), a new approach that
combines key strengths of Monte Carlo, variational and search-based techniques.
DPVI is based on a novel family of particle-based variational approximations
that can be fit using simple, fast, deterministic search techniques. Like Monte
Carlo, DPVI can handle multiple modes, and yields exact results in a
well-defined limit. Like unstructured mean-field, DPVI is based on optimizing a
lower bound on the partition function; even when this quantity is not of
intrinsic interest, the bound facilitates convergence assessment and debugging.
Like both Monte
Carlo and combinatorial search, DPVI can take advantage of factorization,
sequential structure, and custom search operators. This paper defines the DPVI
particle-based approximation family and its partition function lower bounds, along
with the sequential DPVI and local DPVI algorithm templates for optimizing
them. DPVI is illustrated and evaluated via experiments on lattice Markov
Random Fields, nonparametric Bayesian mixtures and block-models, and parametric
as well as non-parametric hidden Markov models. Results include applications to
real-world spike-sorting and relational modeling problems, and show that DPVI
can offer appealing time/accuracy trade-offs as compared to multiple
alternatives.
Comment: First two authors contributed equally to this work.
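The partition-function lower bound behind DPVI can be illustrated on a toy discrete model: for any set of distinct states, the sum of their unnormalized probabilities cannot exceed the partition function, so its log is a valid lower bound that search can tighten. A minimal sketch (names are ours, and plain greedy selection stands in for the paper's sequential and local search templates):

```python
import math

def greedy_particle_bound(log_ptilde, states, k):
    """Greedily select k distinct states to maximize log sum_i ptilde(x_i).
    Because the states are distinct, sum_i ptilde(x_i) <= Z, so the returned
    value lower-bounds the log partition function."""
    chosen = []
    remaining = list(states)
    for _ in range(k):
        best = max(remaining, key=log_ptilde)  # largest increase to the bound
        chosen.append(best)
        remaining.remove(best)
    logs = [log_ptilde(s) for s in chosen]
    m = max(logs)
    bound = m + math.log(sum(math.exp(l - m) for l in logs))
    # Normalizing ptilde over the chosen states yields the particle-based
    # variational approximation q.
    return chosen, bound
```

Adding particles can only raise the bound, and with every state included it equals the exact log partition function.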
Deep Variational Reinforcement Learning for POMDPs
Many real-world sequential decision making problems are partially observable
by nature, and the environment model is typically unknown. Consequently, there
is great need for reinforcement learning methods that can tackle such problems
given only a stream of incomplete and noisy observations. In this paper, we
propose deep variational reinforcement learning (DVRL), which introduces an
inductive bias that allows an agent to learn a generative model of the
environment and perform inference in that model to effectively aggregate the
available information. We develop an n-step approximation to the evidence lower
bound (ELBO), allowing the model to be trained jointly with the policy. This
ensures that the latent state representation is suitable for the control task.
In experiments on Mountain Hike and flickering Atari we show that our method
outperforms previous approaches relying on recurrent neural networks to encode
the past.
Adaptive and Calibrated Ensemble Learning with Dependent Tail-free Process
Ensemble learning is a mainstay in modern data science practice. Conventional
ensemble algorithms assign to base models a set of deterministic, constant
model weights that (1) do not fully account for variations in base model
accuracy across subgroups, and (2) do not provide uncertainty estimates for the
ensemble prediction. This can result in miscalibrated (i.e., precise but
biased) predictions that in turn degrade the algorithm's performance in
real-world applications. In this work, we present an adaptive,
probabilistic approach to ensemble learning using dependent tail-free process
as the ensemble weight prior. Given an input feature, our method
optimally combines base models based on their predictive accuracy in that
feature space, and provides interpretable
uncertainty estimates both in model selection and in ensemble prediction. To
encourage scalable and calibrated inference, we derive a structured variational
inference algorithm that jointly minimizes the KL objective and the model's
calibration score (i.e., the Continuous Ranked Probability Score, CRPS). We
illustrate the utility of our method on both a synthetic nonlinear function
regression task, and on the real-world application of spatio-temporal
integration of particle pollution prediction models in New England.
Comment: Work-in-progress manuscript appeared at the Bayesian Nonparametrics
Workshop, Neural Information Processing Systems 2018.
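The CRPS used above as a calibration score has a standard closed form for Gaussian predictive distributions, which makes it cheap to include in a training objective. A minimal sketch (the function name is ours; this illustrates the score itself, not the paper's structured variational algorithm):

```python
import math

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a Gaussian forecast N(mu, sigma^2) at outcome y.
    Lower is better; it rewards forecasts that are both sharp and calibrated."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))
```

The score grows as the outcome moves into the tails of the forecast, and scales with the forecast's spread, which is what penalizes overconfident (precise but biased) predictions.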
Stein Variational Policy Gradient
Policy gradient methods have been successfully applied to many complex
reinforcement learning problems. However, policy gradient methods suffer from
high variance, slow convergence, and inefficient exploration. In this work, we
introduce a maximum entropy policy optimization framework which explicitly
encourages parameter exploration, and show that this framework can be reduced
to a Bayesian inference problem. We then propose a novel Stein variational
policy gradient method (SVPG) which combines existing policy gradient methods
and a repulsive functional to generate a set of diverse but well-behaved
policies. SVPG is robust to initialization and can easily be implemented in a
parallel manner. On continuous control problems, we find that implementing SVPG
on top of REINFORCE and advantage actor-critic algorithms improves both average
return and data efficiency.
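The SVPG direction combines kernel-weighted policy gradients (the driving term, scaled by a temperature) with a repulsive kernel-gradient term that keeps the policies diverse. A minimal sketch of that direction for a small population of policy parameter vectors (names are ours; a fixed RBF bandwidth stands in for the usual median heuristic, and `grad_J` is assumed to return each policy's gradient estimate):

```python
import numpy as np

def svpg_direction(thetas, grad_J, temp=1.0, h=1.0):
    """phi(theta_i) = (1/n) sum_j [k(theta_j, theta_i) grad J(theta_j) / temp
    + grad_{theta_j} k(theta_j, theta_i)] — driving plus repulsive term."""
    n = len(thetas)
    phi = np.zeros_like(thetas)
    G = grad_J(thetas)
    for i in range(n):
        for j in range(n):
            d = thetas[j] - thetas[i]
            k = np.exp(-(d ** 2).sum() / (2 * h ** 2))
            phi[i] += k * G[j] / temp + (-d / h ** 2) * k
    return phi / n
```

A high temperature emphasizes the repulsive term (more exploration); as the temperature drops, the update approaches running independent policy-gradient ascent on each policy.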
Approximate inference with Wasserstein gradient flows
We present a novel approximate inference method for diffusion processes,
based on the Wasserstein gradient flow formulation of the diffusion. In this
formulation, the time-dependent density of the diffusion is derived as the
limit of implicit Euler steps that follow the gradients of a particular free
energy functional. Existing methods for computing Wasserstein gradient flows
rely on discretization of the domain of the diffusion, prohibiting their
application to domains in more than several dimensions. We propose instead a
discretization-free inference method that computes the Wasserstein gradient
flow directly in a space of continuous functions. We characterize approximation
properties of the proposed method and evaluate it on a nonlinear filtering
task, finding performance comparable to the state-of-the-art for filtering
diffusions.
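The implicit Euler construction referenced above is the standard JKO scheme from the Wasserstein gradient-flow literature; the symbols below (step size τ, free energy F, 2-Wasserstein distance W₂) are the usual ones rather than notation taken from this abstract:

```latex
\rho_{k+1} \;=\; \operatorname*{arg\,min}_{\rho}\; F(\rho) \;+\; \frac{1}{2\tau}\, W_2^2(\rho, \rho_k),
\qquad
\partial_t \rho \;=\; \nabla \cdot \Big( \rho \, \nabla \tfrac{\delta F}{\delta \rho} \Big) \;\; (\tau \to 0).
```

For the entropic free energy F(ρ) = ∫ V ρ dx + ∫ ρ log ρ dx, the limiting PDE is the Fokker-Planck equation of the diffusion, which is what makes the variational steps a valid inference scheme.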
Stein Variational Adaptive Importance Sampling
We propose a novel adaptive importance sampling algorithm which incorporates
the Stein variational gradient descent (SVGD) algorithm with importance sampling
(IS). Our algorithm leverages the nonparametric transforms in SVGD to
iteratively decrease the KL divergence between our importance proposal and the
target distribution. The advantages of this algorithm are twofold: first, our
algorithm turns SVGD into a standard IS algorithm, allowing us to use standard
diagnostic and analytic tools of IS to evaluate and interpret the results;
second, we do not restrict the choice of our importance proposal to predefined
distribution families like traditional (adaptive) IS methods. Empirical
experiments demonstrate that our algorithm performs well on estimating
partition functions of restricted Boltzmann machines and test likelihoods of
variational auto-encoders.
Efficient transfer learning and online adaptation with latent variable models for continuous control
Traditional model-based RL relies on hand-specified or learned models of
transition dynamics of the environment. These methods are sample efficient and
facilitate learning in the real world but fail to generalize to subtle
variations in the underlying dynamics, e.g., due to differences in mass,
friction, or actuators across robotic agents or across time. We propose using
variational inference to learn an explicit latent representation of unknown
environment properties that accelerates learning and facilitates generalization
on novel environments at test time. We use Online Bayesian Inference of these
learned latents to rapidly adapt online to changes in environments without
retaining large replay buffers of recent data. Combined with a neural network
ensemble that models dynamics and captures uncertainty over dynamics, our
approach demonstrates positive transfer during training and online adaptation
on the continuous control task HalfCheetah.
Comment: Presented at the Continual Learning Workshop, NeurIPS 2018, Montreal,
Canada. 5 pages, 4 figures.
Learning to Draw Samples with Amortized Stein Variational Gradient Descent
We propose a simple algorithm to train stochastic neural networks to draw
samples from given target distributions for probabilistic inference. Our method
is based on iteratively adjusting the neural network parameters so that the
output changes along a Stein variational gradient direction (Liu & Wang, 2016)
that maximally decreases the KL divergence with the target distribution. Our
method works for any target distribution specified by its unnormalized
density function, and can train any black-box architecture that is
differentiable with respect to the parameters we want to adapt. We demonstrate our
method with a number of applications, including variational autoencoder (VAE)
with expressive encoders to model complex latent space structures, and
hyper-parameter learning of MCMC samplers that allows Bayesian inference to
adaptively improve itself when seeing more data.
Comment: Accepted by UAI 2017.