
    Accelerated Information Gradient flow

    We present a framework for Nesterov's accelerated gradient flows in probability space. Four examples of information metrics are considered, including the Fisher-Rao metric, the Wasserstein-2 metric, the Kalman-Wasserstein metric, and the Stein metric. For both the Fisher-Rao and Wasserstein-2 metrics, we prove convergence properties of the accelerated gradient flows. For implementation, we propose a sampling-efficient discrete-time algorithm for the Wasserstein-2, Kalman-Wasserstein, and Stein accelerated gradient flows with a restart technique. We also formulate a kernel bandwidth selection method, which learns the gradient of the logarithm of the density from Brownian-motion samples. Numerical experiments, including Bayesian logistic regression and a Bayesian neural network, show the strength of the proposed methods compared with state-of-the-art algorithms. Comment: 33 pages.
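
    A minimal sketch of the kind of accelerated particle update described above, assuming a standard Gaussian target and an ad hoc kernel bandwidth; it is an illustrative discretization of an accelerated Wasserstein-2 flow with a restart heuristic, not the paper's exact algorithm.

        # Illustrative accelerated Wasserstein-2 particle flow with a restart heuristic.
        # grad_log_target, the bandwidth h, step size dt and damping gamma are
        # assumptions for this sketch, not values from the paper.
        import numpy as np

        def grad_log_target(x):            # standard Gaussian target: grad log pi(x) = -x
            return -x

        def kde_score(X, h=0.5):
            # crude kernel estimate of grad log rho_t at the particles
            # (the paper proposes a principled bandwidth selection; h here is a guess)
            d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
            W = np.exp(-d2 / (2 * h * h))
            num = (W[:, :, None] * (X[None, :, :] - X[:, None, :])).sum(1) / (h * h)
            return num / W.sum(1, keepdims=True)

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 2)) + 3.0    # particles, started off-target
        V = np.zeros_like(X)                   # momentum variables
        dt, gamma = 0.05, 1.0

        for _ in range(500):
            force = grad_log_target(X) - kde_score(X)   # minus the Wasserstein gradient of KL(rho || pi)
            V = (1 - gamma * dt) * V + dt * force
            if (V * force).sum() < 0:                   # restart heuristic: drop momentum
                V[:] = 0.0                              # when it opposes the drift
            X = X + dt * V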

    De-randomizing MCMC dynamics with the diffusion Stein operator

    Approximate Bayesian inference estimates descriptors of an intractable target distribution - in essence, an optimization problem within a family of distributions. For example, Langevin dynamics (LD) extracts asymptotically exact samples from a diffusion process because the time evolution of its marginal distributions constitutes a curve that minimizes the KL divergence via steepest descent in the Wasserstein space. Parallel to LD, Stein variational gradient descent (SVGD) similarly minimizes the KL, albeit endowed with a novel Stein-Wasserstein distance, by deterministically transporting a set of particle samples, thus de-randomizing the stochastic diffusion process. We propose de-randomized, kernel-based particle samplers for all diffusion-based samplers, known as MCMC dynamics. Following previous work on interpreting MCMC dynamics, we equip the Stein-Wasserstein space with a fiber-Riemannian Poisson structure, capable of characterizing a fiber-gradient Hamiltonian flow that simulates MCMC dynamics. Such dynamics discretizes into generalized SVGD (GSVGD), a Stein-type deterministic particle sampler, whose particle updates coincide with applying the diffusion Stein operator to a kernel function. We demonstrate empirically that GSVGD can de-randomize complicated MCMC dynamics that combine the advantages of auxiliary momentum variables and Riemannian structure, while maintaining the high sample quality of an interacting particle system.
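
    For orientation, the update that GSVGD-style samplers reduce to for plain overdamped Langevin dynamics is the standard SVGD step; the sketch below uses an RBF kernel, and the bandwidth and step size are illustrative choices rather than anything from the paper.

        # Standard SVGD update:
        # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
        import numpy as np

        def svgd_step(X, grad_log_p, h=0.5, eps=0.1):
            n = X.shape[0]
            d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
            K = np.exp(-d2 / (2 * h * h))                      # RBF kernel matrix
            drive = K @ grad_log_p(X) / n                      # kernel-smoothed score term
            repulse = (K[:, :, None] * (X[:, None, :] - X[None, :, :])).sum(1) / (n * h * h)
            return X + eps * (drive + repulse)                 # deterministic transport step

        # usage: push particles toward a standard Gaussian target
        X = np.random.default_rng(0).normal(size=(100, 2)) + 2.0
        for _ in range(200):
            X = svgd_step(X, lambda x: -x)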

    Transporting Higher-Order Quadrature Rules: Quasi-Monte Carlo Points and Sparse Grids for Mixture Distributions

    Integration against, and hence sampling from, high-dimensional probability distributions is of essential importance in many application areas and has been an active research area for decades. One approach that has drawn increasing attention in recent years is the generation of samples from a target distribution $\mathbb{P}_{\mathrm{tar}}$ using transport maps: if $\mathbb{P}_{\mathrm{tar}} = T_{\#} \mathbb{P}_{\mathrm{ref}}$ is the pushforward of an easily sampled probability distribution $\mathbb{P}_{\mathrm{ref}}$ under the transport map $T$, then applying $T$ to $\mathbb{P}_{\mathrm{ref}}$-distributed samples yields $\mathbb{P}_{\mathrm{tar}}$-distributed samples. This paper proposes the application of transport maps not just to random samples, but also to quasi-Monte Carlo points, higher-order nets, and sparse grids, so that the transformed samples inherit the original convergence rates, which are often better than $N^{-1/2}$, $N$ being the number of samples/quadrature nodes. Our main result is the derivation of an explicit transport map for the case that $\mathbb{P}_{\mathrm{tar}}$ is a mixture of simple distributions, e.g. a Gaussian mixture, in which case application of the transport map $T$ requires the solution of an explicit ODE with closed-form right-hand side. Mixture distributions are of particular applicability and interest since many methods proceed by first approximating $\mathbb{P}_{\mathrm{tar}}$ by a mixture and then sampling from that mixture (often using importance reweighting). Hence, this paper allows the sampling step in all such methods to achieve a convergence rate better than $N^{-1/2}$. Comment: 24 pages.
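
    A toy illustration of the transport idea in one dimension, assuming a two-component Gaussian mixture target with made-up weights and parameters: scrambled Sobol points are pushed through a numerically inverted mixture CDF. The paper's actual map is the explicit ODE mentioned above; this inverse-CDF map only shows the general pattern of transporting low-discrepancy points.

        import numpy as np
        from scipy.stats import norm, qmc

        weights = np.array([0.3, 0.7])      # illustrative mixture parameters
        means   = np.array([-2.0, 1.5])
        sigmas  = np.array([0.5, 1.0])

        def mix_cdf(x):
            return np.sum(weights * norm.cdf((x[..., None] - means) / sigmas), axis=-1)

        def mix_ppf(u, lo=-10.0, hi=10.0, iters=60):
            lo = np.full_like(u, lo); hi = np.full_like(u, hi)
            for _ in range(iters):          # bisection on the monotone mixture CDF
                mid = 0.5 * (lo + hi)
                below = mix_cdf(mid) < u
                lo = np.where(below, mid, lo)
                hi = np.where(below, hi, mid)
            return 0.5 * (lo + hi)

        u = qmc.Sobol(d=1, scramble=True, seed=0).random(256).ravel()  # QMC points in (0, 1)
        samples = mix_ppf(u)                # transported to the mixture target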

    Accelerated gradient descent method for functionals of probability measures by new convexity and smoothness based on transport maps

    We consider problems of minimizing functionals $\mathcal{F}$ of probability measures on Euclidean space. To propose an accelerated gradient descent algorithm for such problems, we consider the gradient flow of transport maps that give push-forward measures of an initial measure. We then propose a deterministic accelerated algorithm by extending Nesterov's acceleration technique with momentum. This algorithm is not based on the Wasserstein geometry. Furthermore, to estimate the convergence rate of the accelerated algorithm, we introduce new notions of convexity and smoothness for $\mathcal{F}$ based on transport maps. As a result, we show that the accelerated algorithm converges faster than a standard gradient descent algorithm. Numerical experiments support this theoretical result. Comment: 31 pages.
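
    Schematically, writing $F(T) := \mathcal{F}(T_{\#}\mu_0)$ for a fixed reference measure $\mu_0$, one standard form of Nesterov's momentum scheme over transport maps reads

        y_k = T_k + \frac{k-1}{k+2}\,(T_k - T_{k-1}), \qquad
        T_{k+1} = y_k - \eta\, \nabla F(y_k),

    where $\nabla F(y_k)$ denotes the $L^2(\mu_0)$ gradient of $F$ at the map $y_k$. This is only a generic template for how acceleration can be phrased in the transport-map parametrization; the paper's exact recursion and its new convexity and smoothness notions may differ in detail.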

    Scaling Limits of the Wasserstein information matrix on Gaussian Mixture Models

    We consider the Wasserstein metric on Gaussian mixture models (GMMs), defined as the pullback of the full Wasserstein metric on the space of smooth probability distributions with finite second moment. This yields a class of Wasserstein metrics on probability simplices over one-dimensional bounded homogeneous lattices via a scaling limit of the Wasserstein metric on GMMs. Specifically, for a sequence of GMMs whose variances tend to zero, we prove that the limit of the Wasserstein metric exists after a suitable renormalization. Generalizations of this metric to broader classes of GMMs are established, including inhomogeneous lattice models whose lattice gaps are not all equal, extended GMMs whose Gaussian components' mean parameters may also vary, and a second-order metric containing higher-order information of the scaling limit. We further study Wasserstein gradient flows on GMMs for three typical functionals: potential, internal, and interaction energies. Numerical examples demonstrate the effectiveness of the proposed GMM models for approximating Wasserstein gradient flows. Comment: 32 pages, 3 figures.
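
    For reference, the pullback metric underlying this construction is the Wasserstein information matrix of a parametric family $\rho_\theta$: with potentials $\Phi_i$ solving the continuity-type equations

        \partial_{\theta_i} \rho_\theta = -\nabla \cdot (\rho_\theta \nabla \Phi_i),
        \qquad
        G_W(\theta)_{ij} = \int \nabla \Phi_i \cdot \nabla \Phi_j \, \rho_\theta \, dx,

    the Wasserstein gradient flow of a functional $\mathcal{F}$ restricted to the family is $\dot{\theta} = -G_W(\theta)^{-1} \nabla_\theta \mathcal{F}(\rho_\theta)$. The GMM-specific scaling limits are the subject of the paper and are not reproduced here.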

    Towards Understanding the Dynamics of Gaussian-Stein Variational Gradient Descent

    Stein Variational Gradient Descent (SVGD) is a nonparametric, particle-based, deterministic sampling algorithm. Despite its wide usage, understanding the theoretical properties of SVGD has remained a challenging problem. For sampling from a Gaussian target, the SVGD dynamics with a bilinear kernel remains Gaussian as long as the initializer is Gaussian. Inspired by this fact, we undertake a detailed theoretical study of Gaussian-SVGD, i.e., SVGD projected onto the family of Gaussian distributions via the bilinear kernel, or equivalently Gaussian variational inference (GVI) with SVGD. We present a complete picture by considering both the mean-field PDE and discrete particle systems. When the target is strongly log-concave, the mean-field Gaussian-SVGD dynamics is proven to converge linearly to the Gaussian distribution closest to the target in KL divergence. In the finite-particle setting, there is both uniform-in-time convergence to the mean-field limit and linear convergence in time to the equilibrium if the target is Gaussian. In the general case, we propose a density-based and a particle-based implementation of Gaussian-SVGD, and show that several recent algorithms for GVI, proposed from different perspectives, emerge as special cases of our unified framework. Interestingly, one of the new particle-based instances of this framework empirically outperforms existing approaches. Our results make concrete contributions towards a deeper understanding of both SVGD and GVI. Comment: 59 pages, 7 figures.
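
    A minimal particle-based sketch of the setting described above: finite-particle SVGD with the bilinear kernel k(x, y) = x^T y + 1 on a Gaussian target, starting from a Gaussian initializer. The target parameters, particle count, and step size are illustrative; this is plain SVGD with that kernel, not any of the paper's refined GVI algorithms.

        import numpy as np

        m = np.array([1.0, -1.0])                          # illustrative target N(m, Sigma)
        Sigma_inv = np.linalg.inv(np.array([[2.0, 0.5],
                                            [0.5, 1.0]]))

        def grad_log_p(X):
            return -(X - m) @ Sigma_inv

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 2))                      # Gaussian initialization
        eps = 0.02
        for _ in range(2000):
            K = X @ X.T + 1.0                              # bilinear kernel matrix k(x_j, x_i)
            drive = K @ grad_log_p(X) / X.shape[0]         # kernel-smoothed score term
            repulse = X                                    # (1/n) sum_j grad_{x_j} k(x_j, x_i) = x_i
            X = X + eps * (drive + repulse)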

    Particle-based Variational Inference with Preconditioned Functional Gradient Flow

    Particle-based variational inference (VI) minimizes the KL divergence between model samples and the target posterior using gradient flow estimates. With the popularity of Stein variational gradient descent (SVGD), the focus of particle-based VI algorithms has been on the properties of functions in a Reproducing Kernel Hilbert Space (RKHS) used to approximate the gradient flow. However, the RKHS requirement restricts the function class and algorithmic flexibility. This paper remedies the problem by proposing a general framework for obtaining tractable functional gradient flow estimates. The functional gradient flow in our framework can be defined by a general functional regularization term that includes the RKHS norm as a special case. We use our framework to propose a new particle-based VI algorithm: preconditioned functional gradient flow (PFG). Compared with SVGD, the proposed method has several advantages: a larger function class, greater scalability for large particle sizes, better adaptation to ill-conditioned distributions, and provable continuous-time convergence in KL divergence. Nonlinear function classes such as neural networks can be incorporated to estimate the gradient flow. Both theory and experiments demonstrate the effectiveness of our framework. Comment: 34 pages, 8 figures.
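
    Schematically, a preconditioned gradient flow of the KL divergence to the posterior p takes the form

        \partial_t \rho_t = \nabla \cdot \big( \rho_t \, P \, \nabla \log (\rho_t / p) \big),

    where P is a positive-definite preconditioner: P = I recovers the Wasserstein gradient flow simulated by Langevin dynamics, while replacing the vector field \nabla \log(\rho_t / p) by its projection onto an RKHS recovers the SVGD flow. This is only a schematic picture consistent with the abstract; the paper's functional-regularization formulation and its parametrization of the flow (e.g. by neural networks) are more general.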

    Wasserstein Consensus ADMM

    We introduce the Wasserstein consensus alternating direction method of multipliers (ADMM) and its entropically regularized version, Sinkhorn consensus ADMM, to solve measure-valued optimization problems with convex additive objectives. Several problems of interest in stochastic prediction and learning can be cast in this form of measure-valued convex additive optimization. The proposed algorithm generalizes a variant of the standard Euclidean ADMM to the space of probability measures but departs significantly from its Euclidean counterpart. In particular, we derive a two-layer ADMM algorithm in which the outer layer is a variant of consensus ADMM on the space of probability measures while the inner layer is a variant of Euclidean ADMM. The resulting computational framework is particularly suitable for solving Wasserstein gradient flows via distributed computation. We demonstrate the proposed framework using illustrative numerical examples.
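
    For orientation, the Euclidean consensus ADMM that the method generalizes solves \min_x \sum_i f_i(x) via the iterations

        x_i^{k+1} = \arg\min_{x_i} \; f_i(x_i) + \tfrac{\rho}{2}\, \| x_i - z^k + u_i^k \|^2,
        \qquad
        z^{k+1} = \tfrac{1}{N} \sum_i \big( x_i^{k+1} + u_i^k \big),
        \qquad
        u_i^{k+1} = u_i^k + x_i^{k+1} - z^{k+1}.

    In the measure-valued setting of the abstract, the variables become probability measures and the quadratic proximal terms are replaced by squared Wasserstein (or entropically regularized Sinkhorn) distances; the precise two-layer scheme is developed in the paper.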