Improved Stein Variational Gradient Descent with Importance Weights
Stein Variational Gradient Descent (SVGD) is a popular sampling algorithm
used in various machine learning tasks. It is well known that SVGD arises from
a discretization of the kernelized gradient flow of the Kullback-Leibler
divergence $\mathrm{KL}(\cdot \mid \pi)$, where $\pi$ is the target
distribution. In this work, we propose to enhance SVGD via the introduction of
importance weights, which leads to a new method for which we coin the name
$\beta$-SVGD. In the continuous time and infinite particles regime, the time
for this flow to converge to the equilibrium distribution $\pi$, quantified by
the Stein Fisher information, depends on $\rho_0$ and $\pi$ very weakly. This
is very different from the kernelized gradient flow of the Kullback-Leibler
divergence, whose time complexity depends on $\mathrm{KL}(\rho_0 \mid \pi)$.
Under certain assumptions, we provide a descent lemma for the population limit
$\beta$-SVGD, which covers the descent lemma for the population limit SVGD when
$\beta \to 0$. We also illustrate the advantages of $\beta$-SVGD over SVGD by
simple experiments.
Comment: 24 pages
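The abstract leaves the update rule implicit. As a rough illustration only, the NumPy sketch below runs a standard SVGD update in which the per-particle contributions are reweighted by self-normalised importance weights of the form (rho/pi)^beta. The kernel density estimate for rho, the choice beta = 0.5, and the names `rbf_kernel` and `beta_svgd_step` are assumptions made for this sketch; this is not the paper's exact beta-SVGD construction.

```python
import numpy as np

def rbf_kernel(x, h):
    # Pairwise RBF kernel matrix with bandwidth h, plus its gradient:
    # gradK[a, b] = grad w.r.t. x_a of k(x_a, x_b).
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2 * h ** 2))
    gradK = -(x[:, None, :] - x[None, :, :]) / h ** 2 * K[:, :, None]
    return K, gradK

def beta_svgd_step(x, grad_log_pi, log_pi, beta=0.5, step=0.1, h=1.0):
    """One importance-weighted SVGD step (illustrative, not the paper's scheme).

    x           : (n, d) particle positions
    grad_log_pi : callable, score of the target, (n, d) -> (n, d)
    log_pi      : callable, unnormalised log-density of the target, (n, d) -> (n,)
    beta        : illustrative importance-weight exponent (assumption)
    """
    K, gradK = rbf_kernel(x, h)
    # Crude kernel density estimate of log rho (assumption; constants cancel below).
    log_rho = np.log(K.mean(axis=1) + 1e-12)
    # Self-normalised weights w_j proportional to (rho/pi)^beta at particle j.
    logw = beta * (log_rho - log_pi(x))
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Weighted SVGD direction at x_i:
    #   sum_j w_j [ k(x_j, x_i) score(x_j) + grad_{x_j} k(x_j, x_i) ]
    score = grad_log_pi(x)
    phi = (K * w[:, None]).T @ score + np.einsum('j,jid->id', w, gradK)
    return x + step * phi

# Toy usage: standard Gaussian target, particles initialised off-centre.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2)) + 3.0
for _ in range(300):
    x = beta_svgd_step(x, lambda z: -z, lambda z: -0.5 * np.sum(z ** 2, axis=1))
print(x.mean(axis=0), x.var(axis=0))
```

Setting beta to zero makes the weights uniform, so the sketch reduces to plain SVGD, mirroring the limiting behaviour described in the abstract.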
Particle-based Variational Inference with Preconditioned Functional Gradient Flow
Particle-based variational inference (VI) minimizes the KL divergence between
model samples and the target posterior with gradient flow estimates. With the
popularity of Stein variational gradient descent (SVGD), the focus of
particle-based VI algorithms has been on the properties of functions in
Reproducing Kernel Hilbert Space (RKHS) to approximate the gradient flow.
However, the requirement of RKHS restricts the function class and algorithmic
flexibility. This paper remedies the problem by proposing a general framework
to obtain tractable functional gradient flow estimates. The functional gradient
flow in our framework can be defined by a general functional regularization
term that includes the RKHS norm as a special case. We use our framework to
propose a new particle-based VI algorithm: preconditioned functional gradient
flow (PFG). Compared with SVGD, the proposed method has several advantages:
larger function class; greater scalability in large particle-size scenarios;
better adaptation to ill-conditioned distributions; provable continuous-time
convergence in KL divergence. Non-linear function classes such as neural
networks can be incorporated to estimate the gradient flow. Both theory and
experiments have shown the effectiveness of our framework.
Comment: 34 pages, 8 figures
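To make the idea of a functional gradient flow outside an RKHS concrete, the sketch below fits a velocity field by maximising a regularised Stein-type objective, E_rho[v(x)·score(x) + div v(x)] - (lam/2) E_rho[||v(x)||^2], and then moves the particles along the fitted field. For brevity the function class here is affine, v(x) = A x + b (so the divergence is trace(A) and the maximiser is available in closed form); the affine class, the plain L2(rho) regulariser, and the names `fit_affine_velocity` and `particle_vi_step` are assumptions for illustration, not the paper's PFG algorithm, which trains richer function classes such as neural networks.

```python
import numpy as np

def fit_affine_velocity(x, score, lam=1.0, eps=1e-6):
    """Closed-form maximiser of the regularised Stein-type objective
        J(v) = E_rho[ v(x).score(x) + div v(x) ] - (lam/2) E_rho[ ||v(x)||^2 ]
    over affine fields v(x) = A x + b, for which div v = trace(A).
    Affine fields and this simple regulariser are illustrative choices only.
    """
    n, d = x.shape
    xm, sm = x.mean(axis=0), score.mean(axis=0)
    cov_x = np.cov(x.T, bias=True) + eps * np.eye(d)   # particle covariance
    c_sx = score.T @ x / n                             # E[score x^T]
    A = (c_sx + np.eye(d) - np.outer(sm, xm)) @ np.linalg.inv(cov_x) / lam
    b = sm / lam - A @ xm
    return A, b

def particle_vi_step(x, grad_log_pi, step=0.5, lam=1.0):
    # Estimate a velocity field from the current particles, then move them along it.
    A, b = fit_affine_velocity(x, grad_log_pi(x), lam=lam)
    return x + step * (x @ A.T + b)

# Toy usage: ill-conditioned Gaussian target N(0, diag(1, 100)).
rng = np.random.default_rng(1)
prec = np.diag([1.0, 0.01])                            # inverse target covariance
x = rng.normal(size=(300, 2)) * 5.0 + 10.0
for _ in range(500):
    x = particle_vi_step(x, lambda z: -z @ prec)
print(x.mean(axis=0))
print(np.cov(x.T))                                     # roughly diag(1, 100)
```

The closed form exists only because the field is affine; the framework's point is that the same variational objective can be optimised over much larger classes, e.g. by training a network with automatic differentiation for the divergence term.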
Covariance-modulated optimal transport and gradient flows
We study a variant of the dynamical optimal transport problem in which the
energy to be minimised is modulated by the covariance matrix of the
distribution. Such transport metrics arise naturally in mean-field limits of
certain ensemble Kalman methods for solving inverse problems. We show that the
transport problem splits into two coupled minimization problems: one for the
evolution of mean and covariance of the interpolating curve and one for its
shape. The latter consists in minimising the usual Wasserstein length under the
constraint of maintaining fixed mean and covariance along the interpolation. We
analyse the geometry induced by this modulated transport distance on the space
of probabilities as well as the dynamics of the associated gradient flows.
These gradient flows exhibit better convergence properties than their classical
Wasserstein counterparts, with exponential convergence rates that are
independent of the Gaussian target. On the level of the gradient flows, a
similar splitting into the evolution of moments and of the shape of the
distribution can be observed.
Comment: 84 pages, 4 figures. Comments are welcome
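The covariance-modulated metric is stated abstractly here, but a familiar concrete relative of such flows is covariance-preconditioned (affine-invariant) Langevin dynamics, in which both the drift and the noise of each particle are scaled by the ensemble covariance. The NumPy sketch below implements that preconditioned sampler as an illustration of modulating the geometry by the covariance; it is not the paper's transport distance, the function name is hypothetical, and finite-particle correction terms used in some ensemble samplers are omitted.

```python
import numpy as np

def cov_preconditioned_langevin_step(x, grad_log_pi, step=0.05, rng=None):
    """One Euler-Maruyama step of covariance-preconditioned Langevin dynamics:
        dX = C(rho) grad log pi(X) dt + sqrt(2 C(rho)) dW,
    with C(rho) the empirical covariance of the ensemble (illustrative sketch;
    finite-particle corrections are omitted).
    """
    if rng is None:
        rng = np.random.default_rng()
    n, d = x.shape
    C = np.cov(x.T, bias=True) + 1e-8 * np.eye(d)   # ensemble covariance
    L = np.linalg.cholesky(2.0 * step * C)          # noise scale sqrt(2 dt C)
    drift = grad_log_pi(x) @ C.T                    # C grad log pi(x_i), row-wise
    return x + step * drift + rng.normal(size=(n, d)) @ L.T

# Toy usage: badly conditioned Gaussian target N(0, diag(1, 1000)).
rng = np.random.default_rng(2)
prec = np.diag([1.0, 1e-3])                         # inverse target covariance
x = rng.normal(size=(500, 2))
for _ in range(2000):
    x = cov_preconditioned_langevin_step(x, lambda z: -z @ prec, rng=rng)
print(np.cov(x.T))                                  # roughly diag(1, 1000)
```

The preconditioning by the ensemble covariance is what makes the contraction rate insensitive to the conditioning of a Gaussian target, which is the qualitative behaviour the abstract attributes to the covariance-modulated gradient flows.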