Wasserstein Variational Inference
This paper introduces Wasserstein variational inference, a new form of
approximate Bayesian inference based on optimal transport theory. Wasserstein
variational inference uses a new family of divergences that includes both
f-divergences and the Wasserstein distance as special cases. The gradients of
the Wasserstein variational loss are obtained by backpropagating through the
Sinkhorn iterations. This technique results in a very stable likelihood-free
training method that can be used with implicit distributions and probabilistic
programs. Using the Wasserstein variational inference framework, we introduce
several new forms of autoencoders and test their robustness and performance
against existing variational autoencoding techniques.
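The key computational idea here, differentiating through the Sinkhorn iterations, can be illustrated with a short sketch. The code below is not the authors' implementation; it is a minimal log-domain Sinkhorn divergence in PyTorch, with the regularization strength `eps` and the iteration count chosen purely for illustration. Because every step is an ordinary tensor operation, autograd can backpropagate the resulting loss to the samples and hence to the parameters of an implicit model.

```python
import math
import torch

def sinkhorn_loss(x, y, eps=0.05, n_iters=50):
    """Entropy-regularized optimal-transport cost between two empirical
    measures, computed with log-domain Sinkhorn iterations.

    Every update is a differentiable tensor operation, so autograd
    backpropagates through the iterations to the samples x (and from
    there to whatever parameters produced them).
    """
    # Pairwise squared-Euclidean ground cost.
    cost = torch.cdist(x, y, p=2) ** 2
    n, m = cost.shape
    log_a = torch.full((n,), -math.log(n))   # uniform weights on x
    log_b = torch.full((m,), -math.log(m))   # uniform weights on y
    f = torch.zeros(n)
    g = torch.zeros(m)
    for _ in range(n_iters):
        # Log-domain Sinkhorn updates of the dual potentials (numerically stable).
        f = -eps * torch.logsumexp((g[None, :] - cost) / eps + log_b[None, :], dim=1)
        g = -eps * torch.logsumexp((f[:, None] - cost) / eps + log_a[:, None], dim=0)
    # Transport plan in log space, then the regularized cost <P, C>.
    log_plan = (f[:, None] + g[None, :] - cost) / eps + log_a[:, None] + log_b[None, :]
    return (log_plan.exp() * cost).sum()

# Illustrative use inside a likelihood-free training loop:
#   x = generator(z)                     # samples from an implicit model
#   loss = sinkhorn_loss(x, data_batch)
#   loss.backward()
```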
Practical bounds on the error of Bayesian posterior approximations: A nonasymptotic approach
Bayesian inference typically requires the computation of an approximation to
the posterior distribution. An important requirement for an approximate
Bayesian inference algorithm is to output high-accuracy posterior mean and
uncertainty estimates. Classical Monte Carlo methods, particularly Markov Chain
Monte Carlo, remain the gold standard for approximate Bayesian inference
because they have a robust finite-sample theory and reliable convergence
diagnostics. However, alternative methods, which are more scalable or apply to
problems where Markov Chain Monte Carlo cannot be used, lack the same
finite-data approximation theory and tools for evaluating their accuracy. In
this work, we develop a flexible new approach to bounding the error of mean and
uncertainty estimates of scalable inference algorithms. Our strategy is to
control the estimation errors in terms of Wasserstein distance, then bound the
Wasserstein distance via a generalized notion of Fisher distance. Unlike
computing the Wasserstein distance, which requires access to the normalized
posterior distribution, the Fisher distance is tractable to compute because it
requires access only to the gradient of the log posterior density. We
demonstrate the usefulness of our Fisher distance approach by deriving bounds
on the Wasserstein error of the Laplace approximation and Hilbert coresets. We
anticipate that our approach will be applicable to many other approximate
inference methods such as the integrated Laplace approximation, variational
inference, and approximate Bayesian computation.
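To make the role of the Fisher distance concrete, here is a hedged sketch of the underlying idea: a Fisher-divergence-style discrepancy between an approximation q and the posterior can be estimated using only the gradient (score) of the log posterior, so the unknown normalizing constant never appears. This is the plain Fisher divergence rather than the paper's generalized Fisher distance, and the `log_post_grad` helper and toy Gaussian target are assumptions made for illustration.

```python
import torch

def fisher_divergence_estimate(q_dist, log_post_grad, n_samples=1000):
    """Monte Carlo estimate of E_q ||grad log q(x) - grad log p(x | data)||^2.

    Only the score of the (possibly unnormalized) log posterior is needed,
    so the intractable normalizing constant never enters the computation.
    """
    x = q_dist.sample((n_samples,)).requires_grad_(True)
    # Score of the approximation q, obtained by autograd.
    grad_log_q, = torch.autograd.grad(q_dist.log_prob(x).sum(), x)
    # Score of the posterior, supplied by the user.
    grad_log_p = log_post_grad(x)
    return ((grad_log_q - grad_log_p) ** 2).sum(dim=-1).mean()

# Illustrative check against a toy unnormalized Gaussian posterior N(mu, I):
mu = torch.tensor([1.0, -2.0])
posterior_score = lambda x: -(x - mu)   # gradient of -0.5 * ||x - mu||^2
q = torch.distributions.MultivariateNormal(torch.zeros(2), torch.eye(2))
print(float(fisher_divergence_estimate(q, posterior_score)))
```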
Bridging the Gap Between Variational Inference and Wasserstein Gradient Flows
Variational inference is a technique that approximates a target distribution
by optimizing within the parameter space of variational families. On the other
hand, Wasserstein gradient flows describe optimization within the space of
probability measures where they do not necessarily admit a parametric density
function. In this paper, we bridge the gap between these two methods. We
demonstrate that, under certain conditions, the Bures-Wasserstein gradient flow
can be recast as a Euclidean gradient flow whose forward Euler scheme is
the standard black-box variational inference algorithm. Specifically, the
vector field of the gradient flow is generated via the path-derivative gradient
estimator. We also offer an alternative perspective on the path-derivative
gradient, framing it as a distillation procedure to the Wasserstein gradient
flow. Distillations can be extended to encompass f-divergences and
non-Gaussian variational families. This extension yields a new gradient
estimator for f-divergences, readily implementable using contemporary machine
learning libraries like PyTorch or TensorFlow.
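As a concrete illustration of the path-derivative gradient estimator mentioned above, the sketch below implements the standard "sticking the landing" construction for a diagonal Gaussian variational family in PyTorch, rather than the paper's own code: log q is evaluated with the variational parameters detached, so the score-function term drops out and gradients flow only through the reparameterized sample path. The function name, sample count, and toy Gaussian target are assumptions made for illustration.

```python
import math
import torch

def path_derivative_grad(mu, log_sigma, log_p, n_samples=64):
    """Stochastic gradient of the negative ELBO for a diagonal Gaussian
    variational family, using the path-derivative estimator.

    log q is evaluated with the variational parameters detached
    (stop-gradient), so only the path term of the gradient survives.
    """
    eps = torch.randn(n_samples, mu.shape[0])
    z = mu + log_sigma.exp() * eps                       # reparameterized samples
    mu_d, log_sigma_d = mu.detach(), log_sigma.detach()  # stop-gradient copies
    log_q = (-0.5 * ((z - mu_d) / log_sigma_d.exp()) ** 2
             - log_sigma_d - 0.5 * math.log(2 * math.pi)).sum(-1)
    loss = (log_q - log_p(z)).mean()                     # negative ELBO estimate
    return torch.autograd.grad(loss, (mu, log_sigma))

# Illustrative target: an (unnormalized) standard Gaussian log-density.
mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)
log_p = lambda z: -0.5 * (z ** 2).sum(-1)
grad_mu, grad_log_sigma = path_derivative_grad(mu, log_sigma, log_p)
```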