Provable Bayesian Inference via Particle Mirror Descent
Bayesian methods are appealing for their flexibility in modeling complex data
and their ability to capture uncertainty in parameters. However, when Bayes' rule
does not yield a tractable closed form, most approximate inference algorithms
lack either scalability or rigorous guarantees. To tackle this challenge, we
propose a simple yet provable algorithm, Particle Mirror Descent (PMD),
to iteratively approximate the posterior density. PMD is inspired by stochastic
functional mirror descent, in which one descends in density space using a small
batch of data points at each iteration, and by particle filtering, in which one
uses samples to approximate a function. We prove a result of the first kind:
with m particles, PMD provides a posterior density estimator that converges
in KL-divergence to the true posterior at a rate of O(1/√m). We
demonstrate competitive empirical performance of PMD compared to several
approximate inference algorithms in mixture models, logistic regression, sparse
Gaussian processes, and latent Dirichlet allocation on large-scale datasets.
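To make the update concrete, below is a minimal Python sketch of the PMD idea as we read it: particles are reweighted multiplicatively by a stochastic functional gradient computed on a minibatch, then refreshed by resampling from a kernel density estimate. The toy model (inferring a Gaussian mean), the step-size schedule, and the bandwidth are our own assumptions, not the paper's settings.

```python
import numpy as np

# Toy PMD-style loop: infer the mean of a 1-D Gaussian with known unit
# variance; prior theta ~ N(0, 10). Model, schedule, and bandwidth are
# illustrative assumptions.
rng = np.random.default_rng(0)
N = 10_000
data = rng.normal(2.0, 1.0, size=N)

def log_prior(th):
    return -0.5 * th**2 / 10.0

def log_lik(th, batch):
    return -0.5 * ((batch[None, :] - th[:, None]) ** 2).sum(axis=1)

M, T, B = 500, 200, 100
bandwidth = 0.2
particles = rng.normal(0.0, np.sqrt(10.0), size=M)   # draws from the prior

def kde_logpdf(x, centers):
    d = (x[:, None] - centers[None, :]) / bandwidth
    dens = np.exp(-0.5 * d**2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
    return np.log(dens + 1e-300)

for t in range(1, T + 1):
    gamma = 1.0 / t                                  # decaying step size
    batch = rng.choice(data, size=B, replace=False)
    # Stochastic functional gradient of KL(q || posterior) at each particle.
    g = kde_logpdf(particles, particles) - log_prior(particles) \
        - (N / B) * log_lik(particles, batch)
    logw = -gamma * g
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Multiplicative update, then KDE refresh: resample centers, add noise.
    particles = particles[rng.choice(M, size=M, p=w)] \
        + bandwidth * rng.normal(size=M)

print("posterior mean estimate:", particles.mean())  # near data.mean()
```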
Adaptive Variational Particle Filtering in Non-stationary Environments
Online convex optimization is a sequential prediction framework whose goal is
to track and adapt to the environment by evaluating suitable convex loss
functions. We study efficient particle filtering methods from the perspective
of this framework.
We formulate an efficient particle filtering method for non-stationary
environments by drawing a connection to the online mirror descent algorithm,
which is known to be a universal online convex optimization algorithm.
As a result of this connection, our proposed particle filtering algorithm
provably achieves optimal particle efficiency.
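Since the abstract does not spell out the algorithm, the sketch below only illustrates the connection it invokes: in a bootstrap particle filter, the weight update can be written as entropic online mirror descent (an exponentiated-gradient step) on a per-step convex loss. The state-space model, loss, and learning rate are illustrative assumptions.

```python
import numpy as np

# Bootstrap particle filter for a 1-D random walk observed in Gaussian noise,
# with the weight update phrased as entropic online mirror descent
# (a multiplicative / exponentiated-gradient step). Model and eta are
# illustrative assumptions, not the paper's algorithm.
rng = np.random.default_rng(1)
T, M = 100, 1000
sigma_x, sigma_y, eta = 0.5, 1.0, 1.0

x_true = np.cumsum(rng.normal(0, sigma_x, size=T))   # latent trajectory
y = x_true + rng.normal(0, sigma_y, size=T)          # observations

particles = rng.normal(0, 1, size=M)
w = np.full(M, 1.0 / M)

for t in range(T):
    particles = particles + rng.normal(0, sigma_x, size=M)   # propagate
    loss = 0.5 * (y[t] - particles) ** 2 / sigma_y**2        # convex loss
    w = w * np.exp(-eta * loss)                              # OMD step
    w /= w.sum()
    if 1.0 / (w**2).sum() < M / 2:                           # ESS collapse
        particles = particles[rng.choice(M, size=M, p=w)]
        w = np.full(M, 1.0 / M)

print("final estimate:", float(w @ particles), "truth:", x_true[-1])
```

With eta = 1 the multiplicative step coincides with the usual Bayesian reweighting; treating eta as a tunable learning rate is what the online-convex-optimization view adds in non-stationary settings.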
Mirror Descent Search and its Acceleration
In recent years, attention has been focused on the relationship between
black-box optimization problems and reinforcement learning problems. In this
research, we propose the Mirror Descent Search (MDS) algorithm, which is
applicable both to black-box optimization problems and to reinforcement
learning problems. Our method is based on the mirror descent method, which is a
general optimization algorithm. The contribution of this research is roughly
twofold. We propose two essential algorithms, called MDS and Accelerated Mirror
Descent Search (AMDS), and two more approximate algorithms: Gaussian Mirror
Descent Search (G-MDS) and Gaussian Accelerated Mirror Descent Search (G-AMDS).
This research shows that advanced methods developed in the context of
mirror descent research can be applied to reinforcement learning problems. We
also clarify the relationship between an existing reinforcement learning
algorithm and our method. With two evaluation experiments, we show that our
proposed algorithms converge faster than some state-of-the-art methods.
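As a point of reference for the mirror descent view of black-box optimization, here is a minimal sketch of the Gaussian flavor as we understand it: multiply the sampling density by exp(-eta * f) (the entropic mirror step) and project back onto Gaussians by weighted moment matching. The objective, step size, and sample budget are our own illustrative choices, not the paper's G-MDS.

```python
import numpy as np

# Mirror descent over sampling distributions for black-box minimization:
# entropic step p <- p * exp(-eta * f), approximated with samples, followed
# by a projection onto Gaussians via weighted moment matching. Illustrative.
rng = np.random.default_rng(2)
target = np.array([3.0, -2.0])

def f(x):                                   # toy black-box objective
    return ((x - target) ** 2).sum(axis=1)

mean, cov = np.zeros(2), 4.0 * np.eye(2)
eta, K = 0.5, 200

for t in range(60):
    xs = rng.multivariate_normal(mean, cov, size=K)
    w = np.exp(-eta * (f(xs) - f(xs).min()))    # multiplicative reweighting
    w /= w.sum()
    mean = w @ xs                               # weighted moment matching:
    diff = xs - mean                            # M-projection back onto
    cov = diff.T @ (diff * w[:, None]) + 1e-6 * np.eye(2)   # Gaussians

print("minimizer estimate:", mean)              # approaches (3, -2)
```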
Wasserstein variational gradient descent: From semi-discrete optimal transport to ensemble variational inference
Particle-based variational inference offers a flexible way of approximating
complex posterior distributions with a set of particles. In this paper we
introduce a new particle-based variational inference method based on the theory
of semi-discrete optimal transport. Instead of minimizing the KL divergence
between the posterior and the variational approximation, we minimize a
semi-discrete optimal transport divergence. The solution of the resulting
optimal transport problem provides both a particle approximation and a set of
optimal transportation densities that map each particle to a segment of the
posterior distribution. We approximate these transportation densities by
minimizing the KL divergence between a truncated distribution and the optimal
transport solution. The resulting algorithm can be interpreted as a form of
ensemble variational inference where each particle is associated with a local
variational approximation.
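The semi-discrete problem at the heart of this method has a convenient stochastic dual. The sketch below performs plain stochastic ascent on the Kantorovich dual for a toy continuous measure and a handful of particles; the target, cost, and step sizes are illustrative assumptions rather than the paper's procedure.

```python
import numpy as np

# Stochastic ascent on the Kantorovich dual of a semi-discrete optimal
# transport problem: continuous source mu (a 2-D Gaussian stand-in for the
# posterior) vs. n particles with uniform mass. Target, cost, and step size
# are illustrative assumptions, not the paper's algorithm.
rng = np.random.default_rng(3)
n = 10
particles = rng.normal(size=(n, 2))     # fixed particle locations
v = np.zeros(n)                         # dual potentials, one per particle
nu = np.full(n, 1.0 / n)                # target mass of each particle's cell

for t in range(1, 20_001):
    x = rng.normal(size=2)              # sample from the continuous measure
    # Assign x to the particle minimizing c(x, z_i) - v_i (a power diagram).
    i = np.argmin(((particles - x) ** 2).sum(axis=1) - v)
    # Stochastic gradient of the dual: grad_{v_i} = nu_i - P(cell i).
    grad = nu.copy()
    grad[i] -= 1.0
    v += (1.0 / np.sqrt(t)) * grad

# Each particle's power cell now carries ~1/n of the mass of mu; the
# restriction of mu to cell i plays the role of particle i's
# "transportation density".
```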
Guaranteed inference in topic models
One of the core problems in statistical models is the estimation of a
posterior distribution. For topic models, the problem of posterior inference
for individual texts is particularly important, especially when dealing with
data streams, but is often intractable in the worst case. As a consequence,
existing methods for posterior inference are approximate and come with
guarantees on neither quality nor convergence rate. In this paper, we introduce
a provably fast algorithm, namely Online Maximum a Posteriori Estimation (OPE),
for posterior inference in topic models. OPE has more attractive properties
than existing inference approaches, including theoretical guarantees on quality
and a fast rate of convergence to a local maximum or stationary point of the
inference problem. The treatment of OPE is quite general and hence can be
easily employed in a wide range of contexts. Finally, we employ OPE to design
three methods for learning Latent Dirichlet Allocation from text streams or
large corpora. Extensive experiments demonstrate the superior behavior of OPE
and of our new learning methods.
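For context on the inference problem OPE solves, the sketch below runs plain Frank-Wolfe on the per-document MAP objective over the topic simplex; OPE itself adds a stochastic choice between the likelihood and prior parts, which we do not reproduce here. The topics, the document, and alpha > 1 (needed for concavity of the prior term) are illustrative assumptions.

```python
import numpy as np

# Per-document MAP problem: maximize over the topic simplex
#   f(theta) = sum_v counts_v * log(sum_k theta_k * beta[k, v])
#              + (alpha - 1) * sum_k log theta_k,
# which is concave when alpha > 1. Plain Frank-Wolfe, illustrative only.
rng = np.random.default_rng(4)
K, V = 5, 100
beta = rng.dirichlet(np.ones(V), size=K)   # K x V topic-word distributions
alpha = 1.1
words = rng.integers(0, V, size=50)        # token word ids of one document
counts = np.bincount(words, minlength=V).astype(float)

theta = np.full(K, 1.0 / K)
for t in range(1, 101):
    p_w = theta @ beta                     # mixture probability of each word
    # Gradient of f with respect to theta.
    grad = beta @ (counts / np.maximum(p_w, 1e-12)) + (alpha - 1.0) / theta
    k_star = np.argmax(grad)               # best simplex vertex
    gamma = 2.0 / (t + 2.0)                # standard Frank-Wolfe step size
    theta = (1 - gamma) * theta + gamma * np.eye(K)[k_star]

print("inferred topic proportions:", np.round(theta, 3))
```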
Scalable Training of Inference Networks for Gaussian-Process Models
Inference in Gaussian process (GP) models is computationally challenging for
large data, and often difficult to approximate with a small number of inducing
points. We explore an alternative approximation that employs stochastic
inference networks for flexible inference. Unfortunately, for such networks it
is difficult for minibatch training to learn meaningful correlations
over function outputs for a large dataset. We propose an algorithm that enables
such training by tracking a stochastic, functional mirror-descent algorithm. At
each iteration, this only requires considering a finite number of input
locations, resulting in a scalable and easy-to-implement algorithm. Empirical
results show comparable and, sometimes, superior performance to existing sparse
variational GP methods.
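The abstract's key computational point is that each iteration touches only finitely many input locations. For orientation, here is the standard finite-location GP computation such a step builds on: the exact posterior mean and covariance at a few query points given a minibatch. Kernel, noise level, and data are illustrative, and this is not the paper's training loop.

```python
import numpy as np

# Exact GP posterior over function values at a handful of query locations,
# given a minibatch of observations. Kernel and noise are illustrative.
def rbf(a, b, ls=1.0):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

rng = np.random.default_rng(5)
xb = rng.uniform(-3, 3, size=20)          # minibatch inputs
yb = np.sin(xb) + 0.1 * rng.normal(size=20)
xq = np.linspace(-3, 3, 5)                # query locations this iteration

noise = 0.1 ** 2
Kbb = rbf(xb, xb) + noise * np.eye(20)
Kqb = rbf(xq, xb)
Kqq = rbf(xq, xq)

L = np.linalg.cholesky(Kbb)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, yb))
mean = Kqb @ alpha                        # posterior mean at query points
V = np.linalg.solve(L, Kqb.T)
cov = Kqq - V.T @ V                       # posterior covariance at queries

print("posterior mean:", np.round(mean, 2))
```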
Kernel Implicit Variational Inference
Recent progress in variational inference has paid much attention to the
flexibility of variational posteriors. One promising direction is to use
implicit distributions, i.e., distributions without tractable densities, as the
variational posterior. However, existing methods on implicit posteriors still
face challenges of noisy estimation and computational infeasibility when
applied to models with high-dimensional latent variables. In this paper, we
present a new approach named Kernel Implicit Variational Inference that
addresses these challenges. To the best of our knowledge, this is the first
time implicit variational inference has been successfully applied to Bayesian
neural networks, with promising results on both regression and classification tasks.
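Implicit variational inference hinges on estimating a density ratio from samples alone, and kernel methods give such an estimate in closed form. The sketch below is a generic uLSIF-style kernel ratio estimator under our own assumptions (Gaussian samples, RBF kernel, fixed regularizer); it illustrates the kind of kernel estimate involved, not KIVI's exact procedure.

```python
import numpy as np

# Generic kernel density-ratio estimation (uLSIF-style): given samples from
# q and p, fit r(x) ~ q(x)/p(x) as a kernel expansion by regularized least
# squares. All details here are illustrative, not KIVI's estimator.
rng = np.random.default_rng(6)
xq = rng.normal(0.0, 1.0, size=300)     # samples from q
xp = rng.normal(0.5, 1.5, size=300)     # samples from p

centers = xq[:50]                        # kernel centers
sigma, lam = 1.0, 1e-3

def K(x, c):
    return np.exp(-0.5 * ((x[:, None] - c[None, :]) / sigma) ** 2)

# Minimize (1/2) E_p[r(x)^2] - E_q[r(x)] + (lam/2)||a||^2 in closed form.
Phi_p, Phi_q = K(xp, centers), K(xq, centers)
H = Phi_p.T @ Phi_p / len(xp)
h = Phi_q.mean(axis=0)
a = np.linalg.solve(H + lam * np.eye(len(centers)), h)

ratio_at_zero = K(np.array([0.0]), centers) @ a
print("estimated q/p at 0:", ratio_at_zero[0])
```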
A stochastic version of Stein Variational Gradient Descent for efficient sampling
We propose in this work RBM-SVGD, a stochastic version of Stein Variational
Gradient Descent (SVGD) method for efficiently sampling from a given
probability measure, and thus useful for Bayesian inference. The method applies
the Random Batch Method (RBM) for interacting particle systems, proposed by
Jin et al., to the interacting particle system in SVGD. While preserving the
behavior of SVGD, it reduces the computational cost, especially when the
interaction kernel is long-ranged. Numerical examples verify the efficiency of
this new version of SVGD.
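The random batch idea is easy to state in code: re-partition the particles at every step and apply the SVGD update only within each small batch, cutting the pairwise kernel cost from O(M^2) to O(M x batch). The target, bandwidth, and step size below are illustrative choices.

```python
import numpy as np

# SVGD with a random batch trick: shuffle particles into small batches each
# step and evaluate the kernel interaction only within each batch.
rng = np.random.default_rng(7)

def grad_log_p(x):                 # score of a 1-D standard normal target
    return -x

M, batch, eta, h = 200, 20, 0.05, 0.5
x = rng.uniform(-6, 6, size=M)     # initial particles

for step in range(500):
    perm = rng.permutation(M)
    for b in range(0, M, batch):
        idx = perm[b:b + batch]
        xi = x[idx]
        d = xi[:, None] - xi[None, :]             # d[i, j] = x_i - x_j
        k = np.exp(-d**2 / (2 * h**2))            # RBF kernel within batch
        # SVGD direction: kernel-weighted scores plus repulsion term.
        phi = (k @ grad_log_p(xi) + (d * k).sum(axis=1) / h**2) / len(idx)
        x[idx] = xi + eta * phi

print("particle mean/std:", x.mean(), x.std())    # should approach 0 and 1
```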
Variable Selection with Rigorous Uncertainty Quantification using Deep Bayesian Neural Networks: Posterior Concentration and Bernstein-von Mises Phenomenon
This work develops a rigorous theoretical basis for the fact that a deep
Bayesian neural network (BNN) is an effective tool for high-dimensional
variable selection with rigorous uncertainty quantification. We develop new Bayesian
non-parametric theorems to show that a properly configured deep BNN (1) learns
the variable importance effectively in high dimensions, and its learning rate
can sometimes "break" the curse of dimensionality. (2) BNN's uncertainty
quantification for variable importance is rigorous, in the sense that its 95%
credible intervals for variable importance indeed cover the truth 95% of the
time (i.e., the Bernstein-von Mises (BvM) phenomenon). The theoretical results
suggest a simple variable selection algorithm based on the BNN's credible
intervals. Extensive simulation confirms the theoretical findings and shows
that the proposed algorithm outperforms existing classic and
neural-network-based variable selection methods, particularly in high
dimensions.
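The selection rule suggested by the theory is simple to state; the sketch below applies it in generic form, with fabricated posterior draws of variable importance standing in for the output of an actual BNN sampler. Names and numbers are placeholders.

```python
import numpy as np

# Credible-interval variable selection: keep a variable when the 95%
# posterior credible interval of its importance excludes zero.
# `importance_samples` fakes posterior draws for illustration only.
rng = np.random.default_rng(8)
n_draws, p = 4000, 10
true_effect = np.array([2.0, -1.5] + [0.0] * (p - 2))
importance_samples = true_effect + 0.3 * rng.normal(size=(n_draws, p))

lo = np.percentile(importance_samples, 2.5, axis=0)
hi = np.percentile(importance_samples, 97.5, axis=0)
selected = (lo > 0) | (hi < 0)            # interval excludes zero

print("selected variables:", np.where(selected)[0])   # expect {0, 1}
```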
Exponential Family Estimation via Adversarial Dynamics Embedding
We present an efficient algorithm for maximum likelihood estimation (MLE) of
exponential family models, with a general parametrization of the energy
function that includes neural networks. We exploit the primal-dual view of the
MLE with a kinetics-augmented model to obtain an estimate associated with an
adversarial dual sampler. To represent this sampler, we introduce a novel
neural architecture, dynamics embedding, that generalizes Hamiltonian
Monte Carlo (HMC). The proposed approach inherits the flexibility of HMC while
enabling tractable entropy estimation for the augmented model. By learning both
a dual sampler and the primal model simultaneously, and sharing parameters
between them, we obviate the requirement to design a separate sampling
procedure once the model has been trained, leading to more effective learning.
We show that many existing estimators, such as contrastive divergence,
pseudo/composite-likelihood, score matching, minimum Stein discrepancy
estimator, non-local contrastive objectives, noise-contrastive estimation, and
minimum probability flow, are special cases of the proposed approach, each
expressed by a different (fixed) dual sampler. An empirical investigation shows
that adapting the sampler during MLE can significantly improve on
state-of-the-art estimators.
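Since dynamics embedding generalizes HMC's integrator, the textbook leapfrog step it starts from is worth seeing. The sketch below is that standard primitive with a placeholder target, not the paper's learned architecture.

```python
import numpy as np

# One textbook leapfrog step sequence of Hamiltonian Monte Carlo: the
# fixed-form dynamics that a learned "dynamics embedding" generalizes.
# The target and step size are illustrative placeholders.
def grad_log_p(theta):                 # score of a standard normal target
    return -theta

def leapfrog(theta, momentum, eps, n_steps):
    momentum = momentum + 0.5 * eps * grad_log_p(theta)    # half kick
    for _ in range(n_steps - 1):
        theta = theta + eps * momentum                     # drift
        momentum = momentum + eps * grad_log_p(theta)      # full kick
    theta = theta + eps * momentum                         # final drift
    momentum = momentum + 0.5 * eps * grad_log_p(theta)    # final half kick
    return theta, momentum

rng = np.random.default_rng(9)
theta = rng.normal(size=3)
momentum = rng.normal(size=3)
theta_new, momentum_new = leapfrog(theta, momentum, eps=0.1, n_steps=10)
print(theta_new, momentum_new)
```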