Improved Stein Variational Gradient Descent with Importance Weights
Stein Variational Gradient Descent (SVGD) is a popular sampling algorithm
used in various machine learning tasks. It is well known that SVGD arises from
a discretization of the kernelized gradient flow of the Kullback-Leibler
divergence $\mathrm{KL}(\cdot \mid \pi)$, where $\pi$ is the target
distribution. In this work, we propose to enhance SVGD via the introduction of
importance weights, which leads to a new method for which we coin the name
$\beta$-SVGD. In the continuous time and infinite particles regime, the time
for this flow to converge to the equilibrium distribution $\pi$, quantified by
the Stein Fisher information, depends on $\rho_0$ and $\pi$ very weakly. This
is very different from the kernelized gradient flow of the Kullback-Leibler
divergence, whose time complexity depends on $\mathrm{KL}(\rho_0 \mid \pi)$.
Under certain assumptions, we provide a descent lemma for the population limit
$\beta$-SVGD, which covers the descent lemma for the population limit SVGD when
$\beta \to 0$. We also illustrate the advantages of $\beta$-SVGD over SVGD by
simple experiments.
Comment: 24 pages
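The abstract leaves the update rule implicit. As a rough illustration only, the NumPy sketch below runs a standard SVGD update in which the per-particle contributions are reweighted by self-normalised importance weights of the form (rho/pi)^beta. The kernel density estimate for rho, the choice beta = 0.5, and the names `rbf_kernel` and `beta_svgd_step` are assumptions made for this sketch; this is not the paper's exact beta-SVGD construction.

```python
import numpy as np

def rbf_kernel(x, h):
    # Pairwise RBF kernel matrix with bandwidth h, plus its gradient:
    # gradK[a, b] = grad w.r.t. x_a of k(x_a, x_b).
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2 * h ** 2))
    gradK = -(x[:, None, :] - x[None, :, :]) / h ** 2 * K[:, :, None]
    return K, gradK

def beta_svgd_step(x, grad_log_pi, log_pi, beta=0.5, step=0.1, h=1.0):
    """One importance-weighted SVGD step (illustrative, not the paper's scheme).

    x           : (n, d) particle positions
    grad_log_pi : callable, score of the target, (n, d) -> (n, d)
    log_pi      : callable, unnormalised log-density of the target, (n, d) -> (n,)
    beta        : illustrative importance-weight exponent (assumption)
    """
    K, gradK = rbf_kernel(x, h)
    # Crude kernel density estimate of log rho (assumption; constants cancel below).
    log_rho = np.log(K.mean(axis=1) + 1e-12)
    # Self-normalised weights w_j proportional to (rho/pi)^beta at particle j.
    logw = beta * (log_rho - log_pi(x))
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Weighted SVGD direction at x_i:
    #   sum_j w_j [ k(x_j, x_i) score(x_j) + grad_{x_j} k(x_j, x_i) ]
    score = grad_log_pi(x)
    phi = (K * w[:, None]).T @ score + np.einsum('j,jid->id', w, gradK)
    return x + step * phi

# Toy usage: standard Gaussian target, particles initialised off-centre.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2)) + 3.0
for _ in range(300):
    x = beta_svgd_step(x, lambda z: -z, lambda z: -0.5 * np.sum(z ** 2, axis=1))
print(x.mean(axis=0), x.var(axis=0))
```

Setting beta to zero makes the weights uniform, so the sketch reduces to plain SVGD, mirroring the limiting behaviour described in the abstract.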
Particle-based Variational Inference with Preconditioned Functional Gradient Flow
Particle-based variational inference (VI) minimizes the KL divergence between
model samples and the target posterior with gradient flow estimates. With the
popularity of Stein variational gradient descent (SVGD), the focus of
particle-based VI algorithms has been on the properties of functions in
Reproducing Kernel Hilbert Space (RKHS) to approximate the gradient flow.
However, the requirement of RKHS restricts the function class and algorithmic
flexibility. This paper remedies the problem by proposing a general framework
to obtain tractable functional gradient flow estimates. The functional gradient
flow in our framework can be defined by a general functional regularization
term that includes the RKHS norm as a special case. We use our framework to
propose a new particle-based VI algorithm: preconditioned functional gradient
flow (PFG). Compared with SVGD, the proposed method has several advantages:
larger function class; greater scalability in large particle-size scenarios;
better adaptation to ill-conditioned distributions; provable continuous-time
convergence in KL divergence. Non-linear function classes such as neural
networks can be incorporated to estimate the gradient flow. Both theory and
experiments have shown the effectiveness of our framework.
Comment: 34 pages, 8 figures
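To make the idea of a functional gradient flow outside an RKHS concrete, the sketch below fits a velocity field by maximising a regularised Stein-type objective, E_rho[v(x)·score(x) + div v(x)] - (lam/2) E_rho[||v(x)||^2], and then moves the particles along the fitted field. For brevity the function class here is affine, v(x) = A x + b (so the divergence is trace(A) and the maximiser is available in closed form); the affine class, the plain L2(rho) regulariser, and the names `fit_affine_velocity` and `particle_vi_step` are assumptions for illustration, not the paper's PFG algorithm, which trains richer function classes such as neural networks.

```python
import numpy as np

def fit_affine_velocity(x, score, lam=1.0, eps=1e-6):
    """Closed-form maximiser of the regularised Stein-type objective
        J(v) = E_rho[ v(x).score(x) + div v(x) ] - (lam/2) E_rho[ ||v(x)||^2 ]
    over affine fields v(x) = A x + b, for which div v = trace(A).
    Affine fields and this simple regulariser are illustrative choices only.
    """
    n, d = x.shape
    xm, sm = x.mean(axis=0), score.mean(axis=0)
    cov_x = np.cov(x.T, bias=True) + eps * np.eye(d)   # particle covariance
    c_sx = score.T @ x / n                             # E[score x^T]
    A = (c_sx + np.eye(d) - np.outer(sm, xm)) @ np.linalg.inv(cov_x) / lam
    b = sm / lam - A @ xm
    return A, b

def particle_vi_step(x, grad_log_pi, step=0.5, lam=1.0):
    # Estimate a velocity field from the current particles, then move them along it.
    A, b = fit_affine_velocity(x, grad_log_pi(x), lam=lam)
    return x + step * (x @ A.T + b)

# Toy usage: ill-conditioned Gaussian target N(0, diag(1, 100)).
rng = np.random.default_rng(1)
prec = np.diag([1.0, 0.01])                            # inverse target covariance
x = rng.normal(size=(300, 2)) * 5.0 + 10.0
for _ in range(500):
    x = particle_vi_step(x, lambda z: -z @ prec)
print(x.mean(axis=0))
print(np.cov(x.T))                                     # roughly diag(1, 100)
```

The closed form exists only because the field is affine; the framework's point is that the same variational objective can be optimised over much larger classes, e.g. by training a network with automatic differentiation for the divergence term.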
Covariance-modulated optimal transport and gradient flows
We study a variant of the dynamical optimal transport problem in which the
energy to be minimised is modulated by the covariance matrix of the
distribution. Such transport metrics arise naturally in mean-field limits of
certain ensemble Kalman methods for solving inverse problems. We show that the
transport problem splits into two coupled minimization problems: one for the
evolution of mean and covariance of the interpolating curve and one for its
shape. The latter consists in minimising the usual Wasserstein length under the
constraint of maintaining fixed mean and covariance along the interpolation. We
analyse the geometry induced by this modulated transport distance on the space
of probabilities as well as the dynamics of the associated gradient flows.
These gradient flows exhibit better convergence properties than their classical
Wasserstein counterparts, with exponential convergence rates that are
independent of the Gaussian target. On the level of the gradient flows, a
similar splitting into the evolution of moments and of the shape of the
distribution can be observed.
Comment: 84 pages, 4 figures. Comments are welcome
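The covariance-modulated metric is stated abstractly here, but a familiar concrete relative of such flows is covariance-preconditioned (affine-invariant) Langevin dynamics, in which both the drift and the noise of each particle are scaled by the ensemble covariance. The NumPy sketch below implements that preconditioned sampler as an illustration of modulating the geometry by the covariance; it is not the paper's transport distance, the function name is hypothetical, and finite-particle correction terms used in some ensemble samplers are omitted.

```python
import numpy as np

def cov_preconditioned_langevin_step(x, grad_log_pi, step=0.05, rng=None):
    """One Euler-Maruyama step of covariance-preconditioned Langevin dynamics:
        dX = C(rho) grad log pi(X) dt + sqrt(2 C(rho)) dW,
    with C(rho) the empirical covariance of the ensemble (illustrative sketch;
    finite-particle corrections are omitted).
    """
    if rng is None:
        rng = np.random.default_rng()
    n, d = x.shape
    C = np.cov(x.T, bias=True) + 1e-8 * np.eye(d)   # ensemble covariance
    L = np.linalg.cholesky(2.0 * step * C)          # noise scale sqrt(2 dt C)
    drift = grad_log_pi(x) @ C.T                    # C grad log pi(x_i), row-wise
    return x + step * drift + rng.normal(size=(n, d)) @ L.T

# Toy usage: badly conditioned Gaussian target N(0, diag(1, 1000)).
rng = np.random.default_rng(2)
prec = np.diag([1.0, 1e-3])                         # inverse target covariance
x = rng.normal(size=(500, 2))
for _ in range(2000):
    x = cov_preconditioned_langevin_step(x, lambda z: -z @ prec, rng=rng)
print(np.cov(x.T))                                  # roughly diag(1, 1000)
```

The preconditioning by the ensemble covariance is what makes the contraction rate insensitive to the conditioning of a Gaussian target, which is the qualitative behaviour the abstract attributes to the covariance-modulated gradient flows.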