21 research outputs found
Accelerated Information Gradient flow
We present a framework for Nesterov's accelerated gradient flows in
probability space. Four examples of information metrics are considered: the
Fisher-Rao metric, the Wasserstein-2 metric, the Kalman-Wasserstein metric,
and the Stein metric. For both the Fisher-Rao and Wasserstein-2 metrics, we
prove convergence properties of the accelerated gradient flows. For
implementations, we propose a sampling-efficient discrete-time algorithm with
a restart technique for the Wasserstein-2, Kalman-Wasserstein, and Stein
accelerated gradient flows. We also formulate a kernel bandwidth selection
method that learns the gradient of the logarithm of the density from
Brownian-motion samples. Numerical experiments, including Bayesian logistic
regression and a Bayesian neural network, demonstrate the strength of the
proposed methods compared with state-of-the-art algorithms.
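As a point of reference for the acceleration-with-restart technique, here is a minimal Euclidean sketch of Nesterov's method with a gradient-based restart; the objective, step size, and restart test below are illustrative assumptions, not the paper's measure-space algorithm.

```python
import numpy as np

def accelerated_gd_restart(grad, x0, step=0.1, iters=1000):
    """Nesterov-accelerated gradient descent with a gradient-based
    restart: momentum is reset whenever the last step opposes the
    current gradient direction."""
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    k = 0
    for _ in range(iters):
        g = grad(y)
        x_new = y - step * g
        # restart test: reset momentum if the step opposes the gradient
        if np.dot(g, x_new - x) > 0:
            k = 0
        k += 1
        y = x_new + (k - 1) / (k + 2) * (x_new - x)
        x = x_new
    return x

# minimize f(x) = (x - 3)^2 / 2, whose gradient is x - 3
x_min = accelerated_gd_restart(lambda x: x - 3.0, np.array([10.0]))
```

The restart discards momentum whenever it stops being productive, which is what recovers fast convergence on strongly convex objectives.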
De-randomizing MCMC dynamics with the diffusion Stein operator
Publisher Copyright: © 2021 Neural Information Processing Systems Foundation. All rights reserved. Approximate Bayesian inference estimates descriptors of an intractable target distribution; in essence, it is an optimization problem within a family of distributions. For example, Langevin dynamics (LD) extracts asymptotically exact samples from a diffusion process because the time evolution of its marginal distributions constitutes a curve that minimizes the KL divergence via steepest descent in the Wasserstein space. Parallel to LD, Stein variational gradient descent (SVGD) similarly minimizes the KL divergence, albeit endowed with a novel Stein-Wasserstein distance, by deterministically transporting a set of particle samples, thus de-randomizing the stochastic diffusion process. We propose de-randomized, kernel-based particle samplers for all diffusion-based samplers, known as MCMC dynamics. Following previous work on interpreting MCMC dynamics, we equip the Stein-Wasserstein space with a fiber-Riemannian Poisson structure capable of characterizing a fiber-gradient Hamiltonian flow that simulates MCMC dynamics. Such dynamics discretize into generalized SVGD (GSVGD), a Stein-type deterministic particle sampler whose particle updates coincide with applying the diffusion Stein operator to a kernel function. We demonstrate empirically that GSVGD can de-randomize complicated MCMC dynamics that combine the advantages of auxiliary momentum variables and Riemannian structure, while maintaining the high sample quality of an interacting particle system. Peer reviewed.
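The de-randomized particle transport at the heart of SVGD can be sketched in a few lines; the RBF kernel, bandwidth, step size, and Gaussian target below are illustrative assumptions, and GSVGD replaces this vanilla update with the diffusion Stein operator applied to the kernel.

```python
import numpy as np

def svgd_step(particles, grad_logp, h=1.0, step=0.1):
    """One vanilla SVGD update with an RBF kernel.
    particles: (n, d) array; grad_logp maps (n, d) -> (n, d)."""
    diff = particles[:, None, :] - particles[None, :, :]  # x_i - x_j
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h))     # kernel matrix
    # attraction (kernel-smoothed gradient) plus repulsion (kernel gradient)
    phi = (K @ grad_logp(particles)
           + np.sum(K[:, :, None] * diff, axis=1) / h) / len(particles)
    return particles + step * phi

# illustrative target: standard 1-D Gaussian, grad log p(x) = -x
x = np.linspace(1.0, 4.0, 10)[:, None]
for _ in range(1000):
    x = svgd_step(x, lambda p: -p)
```

The repulsion term is what keeps the deterministic particle system from collapsing onto the mode, playing the role that injected noise plays in LD.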
Transporting Higher-Order Quadrature Rules: Quasi-Monte Carlo Points and Sparse Grids for Mixture Distributions
Integration against, and hence sampling from, high-dimensional probability
distributions is of essential importance in many application areas and has been
an active research area for decades. One approach that has drawn increasing
attention in recent years has been the generation of samples from a target
distribution using transport maps: if $\pi = T_\sharp \rho$ is the pushforward
of an easily-sampled probability distribution $\rho$ under
the transport map $T$, then the application of $T$ to
$\rho$-distributed samples yields
$\pi$-distributed samples. This paper proposes the
application of transport maps not just to random samples, but also to
quasi-Monte Carlo points, higher-order nets, and sparse grids in order for the
transformed samples to inherit the original convergence rates that are often
better than $N^{-1/2}$, $N$ being the number of samples/quadrature nodes. Our
main result is the derivation of an explicit transport map for the case that
the target $\pi$ is a mixture of simple distributions, e.g.\ a
Gaussian mixture, in which case application of the transport map requires
the solution of an \emph{explicit} ODE with \emph{closed-form} right-hand side.
Mixture distributions are of particular applicability and interest since many
methods proceed by first approximating $\pi$ by a mixture
and then sampling from that mixture (often using importance reweighting).
Hence, this paper allows the sampling step to achieve a better convergence
rate than $N^{-1/2}$ for all such methods.
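In one dimension the transported-quadrature idea can be illustrated with the inverse-CDF map, the monotone transport from the uniform distribution onto a Gaussian mixture; the two-component mixture and the midpoint lattice below are illustrative assumptions, and the paper's explicit ODE-based map is not reproduced here.

```python
import math

def mixture_cdf(x, comps):
    """CDF of a 1-D Gaussian mixture; comps = [(weight, mean, std), ...]."""
    return sum(w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
               for w, m, s in comps)

def inverse_cdf_transport(u, comps, lo=-50.0, hi=50.0, tol=1e-10):
    """Transport u in (0, 1) onto the mixture via the inverse CDF
    (the 1-D monotone transport map), computed by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, comps) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# push a midpoint lattice (a simple quadrature rule) onto the mixture
comps = [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)]
n = 1024
points = [inverse_cdf_transport((i + 0.5) / n, comps) for i in range(n)]
```

Because the lattice is transported rather than sampled, moments of the mixture are typically recovered far more accurately than with the same number of i.i.d. draws.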
Accelerated gradient descent method for functionals of probability measures by new convexity and smoothness based on transport maps
We consider problems of minimizing functionals of probability
measures on the Euclidean space. To propose an accelerated gradient descent
algorithm for such problems, we consider gradient flow of transport maps that
give push-forward measures of an initial measure. Then we propose a
deterministic accelerated algorithm by extending Nesterov's acceleration
technique with momentum. This algorithm is not based on the Wasserstein
geometry. Furthermore, to estimate the convergence rate of the accelerated
algorithm, we introduce new notions of convexity and smoothness for such
functionals based on transport maps. As a result, we show that the accelerated
algorithm converges faster than a standard gradient descent algorithm.
Numerical experiments support this theoretical result.
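As a toy illustration of descending on transport maps rather than on the Wasserstein geometry, the sketch below parameterizes the push-forward by an affine map T(z) = a*z + b and runs Nesterov's method on (a, b); the potential functional, base samples, and step size are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(2000)          # fixed samples of the base measure

def grad_F(a, b):
    """Gradient of F(a, b) = E[(T(z) - 3)^2 / 2] for the affine
    transport map T(z) = a*z + b, estimated on the base samples."""
    y = a * z + b
    return float(np.mean((y - 3.0) * z)), float(np.mean(y - 3.0))

# Nesterov-accelerated descent on the transport-map parameters (a, b)
a, b = 1.0, 0.0
ya, yb = a, b
step = 0.5
for k in range(1, 2001):
    ga, gb = grad_F(ya, yb)
    a_next, b_next = ya - step * ga, yb - step * gb
    mom = (k - 1) / (k + 2)
    ya, yb = a_next + mom * (a_next - a), b_next + mom * (b_next - b)
    a, b = a_next, b_next
# for this potential the optimal map collapses mass onto 3: a -> 0, b -> 3
```

The iteration lives entirely in the parameters of the map; the measure itself is only ever touched through push-forwards of the fixed base samples.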
Scaling Limits of the Wasserstein information matrix on Gaussian Mixture Models
We consider the Wasserstein metric on Gaussian mixture models (GMMs),
defined as the pullback of the full Wasserstein metric on the space of
smooth probability distributions with finite second moment. This construction
induces a class of Wasserstein metrics on probability simplices over
one-dimensional bounded homogeneous lattices via a scaling limit of the
Wasserstein metric on GMMs.
Specifically, for a sequence of GMMs whose variances tend to zero, we prove
that the limit of the Wasserstein metric exists after certain renormalization.
Generalizations of this metric in general GMMs are established, including
inhomogeneous lattice models whose lattice gaps are not the same, extended GMMs
whose mean parameters of Gaussian components can also change, and the
second-order metric containing high-order information of the scaling limit. We
further study the Wasserstein gradient flows on GMMs for three typical
functionals: potential, internal, and interaction energies. Numerical examples
demonstrate the effectiveness of the proposed GMM models for approximating
Wasserstein gradient flows.
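A basic ingredient behind such pullback metrics is the closed-form Wasserstein-2 distance between one-dimensional Gaussians; the sketch below cross-checks it against a quadrature of the quantile coupling (the specific parameters are illustrative assumptions).

```python
import math
from statistics import NormalDist

def w2_gaussian(m1, s1, m2, s2):
    """Closed-form W2 distance between the 1-D Gaussians
    N(m1, s1^2) and N(m2, s2^2)."""
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

def w2_quantile(m1, s1, m2, s2, n=20000):
    """Numerical W2 via the quantile coupling:
    W2^2 = integral over u in (0, 1) of (F1^{-1}(u) - F2^{-1}(u))^2."""
    d1, d2 = NormalDist(m1, s1), NormalDist(m2, s2)
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        total += (d1.inv_cdf(u) - d2.inv_cdf(u)) ** 2
    return math.sqrt(total / n)

d_exact = w2_gaussian(0.0, 1.0, 2.0, 3.0)   # sqrt(2^2 + 2^2)
d_num = w2_quantile(0.0, 1.0, 2.0, 3.0)
```

The quantile coupling is exactly the one-dimensional optimal transport plan, which is why the quadrature reproduces the closed form.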
Towards Understanding the Dynamics of Gaussian-Stein Variational Gradient Descent
Stein Variational Gradient Descent (SVGD) is a nonparametric particle-based
deterministic sampling algorithm. Despite its wide usage, understanding the
theoretical properties of SVGD has remained a challenging problem. For sampling
from a Gaussian target, the SVGD dynamics with a bilinear kernel will remain
Gaussian as long as the initializer is Gaussian. Inspired by this fact, we
undertake a detailed theoretical study of the Gaussian-SVGD, i.e., SVGD
projected to the family of Gaussian distributions via the bilinear kernel, or
equivalently Gaussian variational inference (GVI) with SVGD. We present a
complete picture by considering both the mean-field PDE and discrete particle
systems. When the target is strongly log-concave, the mean-field Gaussian-SVGD
dynamics is proven to converge linearly to the Gaussian distribution closest to
the target in KL divergence. In the finite-particle setting, there is both
uniform in time convergence to the mean-field limit and linear convergence in
time to the equilibrium if the target is Gaussian. In the general case, we
propose a density-based and a particle-based implementation of the
Gaussian-SVGD, and show that several recent algorithms for GVI, proposed from
different perspectives, emerge as special cases of our unified framework.
Interestingly, one of the new particle-based instances of this framework
empirically outperforms existing approaches. Our results make concrete
contributions towards a deeper understanding of both SVGD and GVI.
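The bilinear-kernel projection admits a compact sketch: with k(x, y) = xy + 1 the SVGD update keeps a Gaussian particle cloud Gaussian, and its moments flow to those of a Gaussian target. The target N(2, 1), step size, and iteration count below are illustrative assumptions.

```python
import numpy as np

def gaussian_svgd_step(x, grad_logp, step=0.01):
    """One 1-D SVGD step with the bilinear kernel k(x, y) = x*y + 1,
    the projection under which Gaussian particle clouds stay Gaussian."""
    n = x.size
    g = grad_logp(x)
    K = np.outer(x, x) + 1.0          # kernel matrix k(x_j, x_i)
    # sum_j k(x_j, x_i) g_j  +  sum_j d/dx_j k(x_j, x_i)  (= n * x_i)
    phi = (K @ g + n * x) / n
    return x + step * phi

# illustrative Gaussian target N(2, 1): grad log p(x) = 2 - x
x = np.linspace(-1.0, 1.0, 50)
for _ in range(8000):
    x = gaussian_svgd_step(x, lambda p: 2.0 - p)
```

Because the update is affine in the particles, the whole dynamics reduces to an ODE on the mean and variance, which is the mean-field picture studied above.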
Particle-based Variational Inference with Preconditioned Functional Gradient Flow
Particle-based variational inference (VI) minimizes the KL divergence between
model samples and the target posterior with gradient flow estimates. With the
popularity of Stein variational gradient descent (SVGD), the focus of
particle-based VI algorithms has been on the properties of functions in
Reproducing Kernel Hilbert Space (RKHS) to approximate the gradient flow.
However, the requirement of RKHS restricts the function class and algorithmic
flexibility. This paper remedies the problem by proposing a general framework
to obtain tractable functional gradient flow estimates. The functional gradient
flow in our framework can be defined by a general functional regularization
term that includes the RKHS norm as a special case. We use our framework to
propose a new particle-based VI algorithm: preconditioned functional gradient
flow (PFG). Compared with SVGD, the proposed method has several advantages:
larger function class; greater scalability in large particle-size scenarios;
better adaptation to ill-conditioned distributions; provable continuous-time
convergence in KL divergence. Non-linear function classes such as neural
networks can be incorporated to estimate the gradient flow. Both theory and
experiments have shown the effectiveness of our framework.
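PFG itself learns the flow with a flexible function class such as a neural network; as a crude stand-in, the sketch below applies a fixed preconditioning matrix to a vanilla SVGD update on an ill-conditioned Gaussian target. The preconditioner P (here the target covariance), the kernel bandwidth, and all parameters are illustrative assumptions.

```python
import numpy as np

def precond_svgd_step(x, grad_logp, P, h=100.0, step=0.1):
    """Vanilla SVGD update with a fixed preconditioner P applied to the
    estimated flow (a stand-in for a learned, adaptive estimator)."""
    diff = x[:, None, :] - x[None, :, :]
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h))
    phi = (K @ grad_logp(x)
           + np.sum(K[:, :, None] * diff, axis=1) / h) / len(x)
    return x + step * phi @ P.T

# ill-conditioned Gaussian target N(0, diag(100, 1))
Sigma_inv = np.diag([0.01, 1.0])
P = np.diag([100.0, 1.0])            # precondition with the covariance
rng = np.random.default_rng(1)
x = np.array([50.0, 5.0]) + 0.5 * rng.standard_normal((20, 2))
for _ in range(500):
    x = precond_svgd_step(x, lambda p: -p @ Sigma_inv, P)
```

Rescaling the flow equalizes the contraction rates across the ill-conditioned directions, which is the adaptation benefit the abstract refers to.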
Wasserstein Consensus ADMM
We introduce Wasserstein consensus alternating direction method of
multipliers (ADMM) and its entropic-regularized version: Sinkhorn consensus
ADMM, to solve measure-valued optimization problems with convex additive
objectives. Several problems of interest in stochastic prediction and learning
can be cast in this form of measure-valued convex additive optimization. The
proposed algorithm generalizes a variant of the standard Euclidean ADMM to the
space of probability measures but departs significantly from its Euclidean
counterpart. In particular, we derive a two-layer ADMM algorithm wherein the
outer layer is a variant of consensus ADMM on the space of probability measures
while the inner layer is a variant of Euclidean ADMM. The resulting
computational framework is particularly suitable for solving Wasserstein
gradient flows via distributed computation. We demonstrate the proposed
framework using illustrative numerical examples.
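As a point of reference for the outer layer, here is a minimal Euclidean consensus ADMM on a sum of scalar quadratics; the measure-valued version replaces these proximal and averaging steps with their Wasserstein analogues, and the objective below is an illustrative assumption.

```python
import numpy as np

def consensus_admm(a, rho=1.0, iters=100):
    """Euclidean consensus ADMM for min_x sum_i (x - a_i)^2 / 2.
    Each agent holds a local copy x_i; z is the consensus variable."""
    n = len(a)
    x = np.zeros(n)
    u = np.zeros(n)                       # scaled dual variables
    z = 0.0
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)   # local proximal steps
        z = np.mean(x + u)                       # consensus (averaging) step
        u = u + x - z                            # dual updates
    return z

z = consensus_admm(np.array([1.0, 2.0, 6.0]))   # minimizer is the mean
```

The local proximal steps are independent across agents, which is what makes the scheme (and its Wasserstein analogue) amenable to distributed computation.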