Search CORE

17 research outputs found

Non-reversible Parallel Tempering for Deep Posterior Approximation

Author: Deng Wei
Feng Qi
Liang Faming
Lin Guang
Zhang Qian
Publication venue
Publication date: 19/11/2022
Field of study

Parallel tempering (PT), also known as replica exchange, is the go-to workhorse for simulations of multi-modal distributions. The key to the success of PT is to adopt efficient swap schemes. The popular deterministic even-odd (DEO) scheme exploits the non-reversibility property and has successfully reduced the communication cost from

O(P^2)

O(P)

given sufficiently many

P

chains. However, such an innovation largely disappears in big data due to the limited chains and few bias-corrected swaps. To handle this issue, we generalize the DEO scheme to promote non-reversibility and propose a few solutions to tackle the underlying bias caused by the geometric stopping time. Notably, in big data scenarios, we obtain an appealing communication cost

O(P\log P)

based on the optimal window size. In addition, we also adopt stochastic gradient descent (SGD) with large and constant learning rates as exploration kernels. Such a user-friendly nature enables us to conduct approximation tasks for complex posteriors without much tuning costs.Comment: Accepted by AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Subsampling Error in Stochastic Gradient Langevin Diffusions

Author: Jin Kexin
Latz Jonas
Liu Chenguang
Publication venue
Publication date: 23/05/2023
Field of study

The Stochastic Gradient Langevin Dynamics (SGLD) are popularly used to approximate Bayesian posterior distributions in statistical learning procedures with large-scale data. As opposed to many usual Markov chain Monte Carlo (MCMC) algorithms, SGLD is not stationary with respect to the posterior distribution; two sources of error appear: The first error is introduced by an Euler--Maruyama discretisation of a Langevin diffusion process, the second error comes from the data subsampling that enables its use in large-scale data settings. In this work, we consider an idealised version of SGLD to analyse the method's pure subsampling error that we then see as a best-case error for diffusion-based subsampling MCMC methods. Indeed, we introduce and study the Stochastic Gradient Langevin Diffusion (SGLDiff), a continuous-time Markov process that follows the Langevin diffusion corresponding to a data subset and switches this data subset after exponential waiting times. There, we show that the Wasserstein distance between the posterior and the limiting distribution of SGLDiff is bounded above by a fractional power of the mean waiting time. Importantly, this fractional power does not depend on the dimension of the state space. We bring our results into context with other analyses of SGLD

arXiv.org e-Print Archive

Birth-death dynamics for sampling: Global convergence, approximations and their asymptotics

Author: Lu Yulong
Slepčev Dejan
Wang Lihan
Publication venue
Publication date: 23/05/2023
Field of study

Motivated by the challenge of sampling Gibbs measures with nonconvex potentials, we study a continuum birth-death dynamics. We improve results in previous works [51,57] and provide weaker hypotheses under which the probability density of the birth-death governed by Kullback-Leibler divergence or by

\chi^2

divergence converge exponentially fast to the Gibbs equilibrium measure, with a universal rate that is independent of the potential barrier. To build a practical numerical sampler based on the pure birth-death dynamics, we consider an interacting particle system, which is inspired by the gradient flow structure and the classical Fokker-Planck equation and relies on kernel-based approximations of the measure. Using the technique of

\Gamma

-convergence of gradient flows, we show that on the torus, smooth and bounded positive solutions of the kernelized dynamics converge on finite time intervals, to the pure birth-death dynamics as the kernel bandwidth shrinks to zero. Moreover we provide quantitative estimates on the bias of minimizers of the energy corresponding to the kernelized dynamics. Finally we prove the long-time asymptotic results on the convergence of the asymptotic states of the kernelized dynamics towards the Gibbs measure.Comment: significant mathematical changes with more rigor on gradient flow

arXiv.org e-Print Archive

MCMC-driven learning

Author: Bouchard-Côté Alexandre
Campbell Trevor
Pleiss Geoff
Surjanovic Nikola
Publication venue
Publication date: 14/02/2024
Field of study

This paper is intended to appear as a chapter for the Handbook of Markov Chain Monte Carlo. The goal of this chapter is to unify various problems at the intersection of Markov chain Monte Carlo (MCMC) and machine learning\unicode{x2014}which includes black-box variational inference, adaptive MCMC, normalizing flow construction and transport-assisted MCMC, surrogate-likelihood MCMC, coreset construction for MCMC with big data, Markov chain gradient descent, Markovian score climbing, and more\unicode{x2014}within one common framework. By doing so, the theory and methods developed for each may be translated and generalized

arXiv.org e-Print Archive