6 research outputs found

    Improved Stein Variational Gradient Descent with Importance Weights

    Stein Variational Gradient Descent (SVGD) is a popular sampling algorithm used in various machine learning tasks. It is well known that SVGD arises from a discretization of the kernelized gradient flow of the Kullback-Leibler divergence $D_{\mathrm{KL}}(\cdot\mid\pi)$, where $\pi$ is the target distribution. In this work, we propose to enhance SVGD through the introduction of importance weights, which leads to a new method we call $\beta$-SVGD. In the continuous-time and infinite-particle regime, the time for this flow to converge to the equilibrium distribution $\pi$, quantified by the Stein Fisher information, depends only very weakly on $\rho_0$ and $\pi$. This is very different from the kernelized gradient flow of the Kullback-Leibler divergence, whose time complexity depends on $D_{\mathrm{KL}}(\rho_0\mid\pi)$. Under certain assumptions, we provide a descent lemma for the population-limit $\beta$-SVGD, which covers the descent lemma for the population-limit SVGD when $\beta\to 0$. We also illustrate the advantages of $\beta$-SVGD over SVGD with simple experiments. Comment: 24 pages
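
    The $\beta$-SVGD update rule is not reproduced in the abstract. As background, here is a minimal NumPy sketch of the standard SVGD step that $\beta$-SVGD reweights; the function names (rbf_kernel, svgd_step), bandwidth, and step size are illustrative assumptions, and the importance-weight factor (roughly a power $(\rho/\pi)^\beta$ of the density ratio) is indicated only in a comment, since its precise form is the paper's contribution.

```python
import numpy as np

def rbf_kernel(X, h=1.0):
    # Pairwise RBF kernel k(x_i, x_j) = exp(-|x_i - x_j|^2 / (2 h^2))
    # and its gradient with respect to the first argument.
    diffs = X[:, None, :] - X[None, :, :]            # (n, n, d): x_i - x_j
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * h ** 2))
    gradK = -diffs / h ** 2 * K[:, :, None]          # grad_{x_i} k(x_i, x_j)
    return K, gradK

def svgd_step(X, grad_log_pi, step=1e-1, h=1.0):
    # One standard SVGD update:
    #   x_i <- x_i + step * (1/n) sum_j [ k(x_j, x_i) grad log pi(x_j) + grad_{x_j} k(x_j, x_i) ].
    # beta-SVGD would additionally weight the j-th summand by an importance weight
    # built from (rho/pi)^beta; that reweighting is not implemented here.
    n = X.shape[0]
    K, gradK = rbf_kernel(X, h)
    score = grad_log_pi(X)                           # (n, d): target score at each particle
    phi = (K @ score + gradK.sum(axis=0)) / n        # attraction + repulsion terms
    return X + step * phi

# Toy usage: sample a standard 2-D Gaussian, for which grad log pi(x) = -x.
X = np.random.randn(200, 2) + 5.0
for _ in range(500):
    X = svgd_step(X, lambda X: -X, step=0.1)
```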

    Particle-based Variational Inference with Preconditioned Functional Gradient Flow

    Particle-based variational inference (VI) minimizes the KL divergence between model samples and the target posterior with gradient flow estimates. With the popularity of Stein variational gradient descent (SVGD), the focus of particle-based VI algorithms has been on the properties of functions in a Reproducing Kernel Hilbert Space (RKHS) used to approximate the gradient flow. However, the RKHS requirement restricts the function class and algorithmic flexibility. This paper remedies the problem by proposing a general framework for obtaining tractable functional gradient flow estimates. The functional gradient flow in our framework can be defined by a general functional regularization term that includes the RKHS norm as a special case. We use our framework to propose a new particle-based VI algorithm: preconditioned functional gradient flow (PFG). Compared with SVGD, the proposed method has several advantages: a larger function class; greater scalability with large particle sizes; better adaptation to ill-conditioned distributions; and provable continuous-time convergence in KL divergence. Nonlinear function classes such as neural networks can be incorporated to estimate the gradient flow. Both theory and experiments demonstrate the effectiveness of our framework. Comment: 34 pages, 8 figures
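
    The abstract does not spell out PFG's training objective or its preconditioner. The sketch below is a schematic PyTorch rendering of the generic recipe described above (fit a parametric vector field by maximizing a Stein-type functional-gradient objective, then move the particles along it), with a plain squared-norm penalty standing in for the paper's more general functional regularization; the names train_vector_field and particle_step, the optimizer, and all hyperparameters are assumptions.

```python
import torch

def train_vector_field(f, X, score, n_iters=100, lam=1e-2, lr=1e-2):
    # Fit a parametric vector field f (e.g. a small MLP) so that moving particles along f
    # decreases KL(rho || pi): maximize  E[ score(x) . f(x) + div f(x) ] - lam * E[ |f(x)|^2 ].
    opt = torch.optim.Adam(f.parameters(), lr=lr)
    for _ in range(n_iters):
        x = X.clone().requires_grad_(True)
        fx = f(x)                                                    # (n, d)
        # Exact divergence via autograd, one coordinate at a time (fine for small d).
        div = sum(torch.autograd.grad(fx[:, i].sum(), x, create_graph=True)[0][:, i]
                  for i in range(x.shape[1]))
        obj = (score(x.detach()) * fx).sum(dim=1) + div - lam * (fx ** 2).sum(dim=1)
        loss = -obj.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return f

def particle_step(X, f, step=1e-1):
    # Move the particles along the fitted vector field.
    with torch.no_grad():
        return X + step * f(X)
```

    Here f could be, for instance, torch.nn.Sequential(torch.nn.Linear(d, 64), torch.nn.Tanh(), torch.nn.Linear(64, d)), and score(x) returns the target score $\nabla \log \pi(x)$; per the abstract, choosing an RKHS norm as the regularizer instead of the squared-norm penalty recovers the kernelized (SVGD-style) special case of the framework.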

    Covariance-modulated optimal transport and gradient flows

    We study a variant of the dynamical optimal transport problem in which the energy to be minimised is modulated by the covariance matrix of the distribution. Such transport metrics arise naturally in mean-field limits of certain ensemble Kalman methods for solving inverse problems. We show that the transport problem splits into two coupled minimization problems: one for the evolution of the mean and covariance of the interpolating curve, and one for its shape. The latter consists in minimising the usual Wasserstein length under the constraint of keeping the mean and covariance fixed along the interpolation. We analyse the geometry induced by this modulated transport distance on the space of probability measures, as well as the dynamics of the associated gradient flows. These exhibit better convergence properties than their classical Wasserstein counterparts, with exponential convergence rates that are independent of the Gaussian target. On the level of the gradient flows, a similar splitting into the evolution of the moments and of the shape of the distribution can be observed. Comment: 84 pages, 4 figures. Comments are welcome.
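
    The paper's object is a transport metric rather than an algorithm, but the modulation it studies can be illustrated with a toy covariance-preconditioned gradient step on an ensemble of particles, in the spirit of the ensemble Kalman methods mentioned above. This is only an illustrative sketch of modulating the velocity field by the ensemble covariance, not the paper's construction (as written, the flow simply drives the ensemble toward the mode); the function name, regularization, and step size are assumptions.

```python
import numpy as np

def covariance_modulated_step(X, grad_log_pi, step=1e-1, eps=1e-6):
    # One explicit Euler step of a gradient flow whose velocity is preconditioned by the
    # ensemble covariance:  x_i <- x_i + step * C(X) grad log pi(x_i).
    n, d = X.shape
    centered = X - X.mean(axis=0)
    C = centered.T @ centered / n + eps * np.eye(d)   # empirical covariance (regularized)
    return X + step * grad_log_pi(X) @ C              # C is symmetric, so right-multiplying works

# Toy usage: ill-conditioned Gaussian target N(0, Sigma), with grad log pi(x) = -Sigma^{-1} x.
Sigma_inv = np.diag([1.0, 100.0])
X = np.random.randn(300, 2) + 3.0
for _ in range(200):
    X = covariance_modulated_step(X, lambda X: -X @ Sigma_inv, step=0.05)
```

    Roughly speaking, for such a Gaussian target the covariance preconditioning makes the contraction rate essentially independent of the conditioning of the target covariance, which reflects the exponential rates "independent of the Gaussian target" claimed in the abstract.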