A Smoothed Dual Approach for Variational Wasserstein Problems
Variational problems that involve Wasserstein distances have been recently
proposed to summarize and learn from probability measures. Despite being
conceptually simple, such problems are computationally challenging because they
involve minimizing over quantities (Wasserstein distances) that are themselves
hard to compute. We show that the dual formulation of Wasserstein variational
problems introduced recently by Carlier et al. (2014) can be regularized using
an entropic smoothing, which leads to smooth, differentiable, convex
optimization problems that are simpler to implement and numerically more
stable. We illustrate the versatility of this approach by applying it to the
computation of Wasserstein barycenters and gradient flows of spatial
regularization functionals.
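For orientation, one common way to write the entropic smoothing mentioned in this abstract is sketched below. The notation is an assumption made for illustration and may differ from the paper's conventions: $p, q$ are histograms, $M$ is a ground cost matrix, $\gamma > 0$ is the smoothing strength, and $u, v$ are dual variables.

```latex
\begin{align*}
W_\gamma(p, q)
  &= \min_{T \ge 0,\; T\mathbf{1} = p,\; T^\top \mathbf{1} = q}
     \sum_{i,j} T_{ij} M_{ij}
     + \gamma \sum_{i,j} T_{ij}\,\bigl(\log T_{ij} - 1\bigr) \\
  &= \max_{u, v}\; \langle u, p \rangle + \langle v, q \rangle
     - \gamma \sum_{i,j} \exp\!\left(\frac{u_i + v_j - M_{ij}}{\gamma}\right)
\end{align*}
```

In this form the dual objective is unconstrained, smooth, and concave in $(u, v)$, which is the kind of structure the abstract describes as simpler to implement and numerically more stable.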
Fast Optimal Transport Averaging of Neuroimaging Data
Knowing how the human brain is anatomically and functionally organized at the
level of a group of healthy individuals or patients is the primary goal of
neuroimaging research. Yet computing an average of brain imaging data defined
over a voxel grid or a triangulation remains a challenge. Data are large, the
geometry of the brain is complex and the between-subject variability leads to
spatially or temporally non-overlapping effects of interest. To address the
problem of variability, data are commonly smoothed before group linear
averaging. In this work we build on ideas originally introduced by Kantorovich
to propose a new algorithm that can efficiently average non-normalized data
defined over arbitrary discrete domains using transportation metrics. We show
how Kantorovich means can be linked to Wasserstein barycenters in order to take
advantage of an entropic smoothing approach. It leads to a smooth convex
optimization problem and an algorithm with strong convergence guarantees. We
illustrate the versatility of this tool and its empirical behavior on
functional neuroimaging data, functional MRI and magnetoencephalography (MEG)
source estimates, defined on voxel grids and triangulations of the folded
cortical surface.
Comment: Information Processing in Medical Imaging (IPMI), Jun 2015, Isle of
Skye, United Kingdom. Springer, 201
Variational Approaches for Image Labeling on the Assignment Manifold
The image labeling problem refers to the task of assigning to each pixel a single element from a finite predefined set of labels. In classical approaches the labeling task is formulated as a minimization problem of specifically structured objective functions.
Assignment flows for contextual image labeling are a recently proposed alternative formulation via spatially coupled replicator equations.
In this work, the classical and dynamical viewpoints of image labeling are combined into a variational formulation. This is accomplished by following the induced Riemannian gradient descent flow on an elementary statistical manifold with respect to the underlying information geometry.
Convergence and stability behavior of this approach are investigated using the log-barrier method. A novel parameterization of the assignment flow by its dominant component is derived, revealing a Riemannian gradient flow structure that clearly identifies the two governing processes of the flow: spatial regularization of assignments and gradual enforcement of unambiguous label decisions. Also, a continuous-domain formulation of the corresponding potential is presented and well-posedness of the related optimization problem is established. Furthermore, an alternative smooth variational approach to maximum a posteriori inference based on discrete graphical models is derived by utilizing local Wasserstein distances. Following the resulting Riemannian gradient flow leads to an inference process which always satisfies the local marginalization constraints and incorporates a smooth rounding mechanism towards unambiguous assignments.
Inference and Model Parameter Learning for Image Labeling by Geometric Assignment
Image labeling is a fundamental problem in the area of low-level image analysis. In this work, we present novel approaches to maximum a posteriori (MAP) inference and to model parameter learning for image labeling. Both approaches are formulated in a smooth geometric setting in which the respective solution space is a simple Riemannian manifold. Optimization consists of multiplicative updates that geometrically integrate the resulting Riemannian gradient flow.
Our novel approach to MAP inference is based on discrete graphical models. By utilizing local Wasserstein distances for coupling assignment measures across edges of the underlying graph, we smoothly approximate a given discrete objective function and restrict it to the assignment manifold. A corresponding update scheme combines geometric integration of the resulting gradient flow with rounding to integral solutions that represent valid labelings. This formulation constitutes an inner relaxation of the discrete labeling problem, i.e. throughout this process the local marginalization constraints known from the established linear programming relaxation are satisfied.
Furthermore, we study the inverse problem of model parameter learning using the linear assignment flow and training data with ground truth. This is accomplished by a Riemannian gradient flow on the manifold of parameters that determine the regularization properties of the assignment flow. This smooth formulation enables us to tackle the model parameter learning problem from the perspective of parameter estimation of dynamical systems. By using symplectic partitioned Runge--Kutta methods for numerical integration, we show that deriving the sensitivity conditions of the parameter learning problem and its discretization commute. A favorable property of our approach is that learning is based on exact inference.
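The multiplicative updates mentioned in this abstract can be illustrated with a generic sketch on the probability simplex. This is not the authors' scheme: it is a plain exponentiated-gradient step applied to a single pixel with a linear objective, and the names `multiplicative_step`, `costs`, and `step` are hypothetical choices made for the example.

```python
import math


def multiplicative_step(w, grad, step):
    """One multiplicative update on the open probability simplex.

    Each entry is rescaled by exp(-step * gradient) and the result is
    renormalized, so the iterate stays strictly positive and sums to one.
    """
    scaled = [wi * math.exp(-step * gi) for wi, gi in zip(w, grad)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Hypothetical per-label costs for one pixel (assumed values).
costs = [0.8, 0.3, 0.5]

# Start at the barycenter of the simplex (uninformative assignment).
w = [1.0 / len(costs)] * len(costs)

# For the linear objective <costs, w>, the gradient is simply `costs`.
for _ in range(200):
    w = multiplicative_step(w, costs, step=0.5)

# The assignment concentrates on the label with the smallest cost.
label = max(range(len(w)), key=lambda k: w[k])
```

The iterate never leaves the simplex, and its mass moves gradually toward one vertex, which mirrors in miniature the "smooth rounding" towards unambiguous label decisions described in these abstracts.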
Learning Generative Models with Sinkhorn Divergences
The ability to compare two degenerate probability distributions (i.e. two
probability distributions supported on two distinct low-dimensional manifolds
living in a much higher-dimensional space) is a crucial problem arising in the
estimation of generative models for high-dimensional observations such as those
arising in computer vision or natural language. It is known that optimal
transport metrics can represent a cure for this problem, since they were
specifically designed as an alternative to information divergences to handle
such problematic scenarios. Unfortunately, training generative machines using
OT raises formidable computational and statistical challenges, because of (i)
the computational burden of evaluating OT losses, (ii) the instability and lack
of smoothness of these losses, (iii) the difficulty of robustly estimating these
losses and their gradients in high dimension. This paper presents the first
tractable computational method to train large scale generative models using an
optimal transport loss, and tackles these three issues by relying on two key
ideas: (a) entropic smoothing, which turns the original OT loss into one that
can be computed using Sinkhorn fixed point iterations; (b) algorithmic
(automatic) differentiation of these iterations. These two approximations
result in a robust and differentiable approximation of the OT loss with
streamlined GPU execution. Entropic smoothing generates a family of losses
interpolating between Wasserstein (OT) and Maximum Mean Discrepancy (MMD), thus
making it possible to find a sweet spot leveraging the geometry of OT and the favorable
high-dimensional sample complexity of MMD which comes with unbiased gradient
estimates. The resulting computational architecture complements nicely standard
deep network generative models by a stack of extra layers implementing the loss
function.
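The Sinkhorn fixed-point iterations named in idea (a) can be sketched as follows for two small discrete histograms. This is a minimal illustration in pure Python under assumed inputs, not the paper's implementation: it omits the automatic differentiation step and the debiasing that defines a Sinkhorn divergence, and the function name `sinkhorn` and the parameters `eps` and `n_iters` are hypothetical.

```python
import math


def sinkhorn(a, b, cost, eps=1.0, n_iters=500):
    """Entropy-regularized optimal transport between histograms a and b.

    Alternately rescales the rows and columns of the Gibbs kernel
    K = exp(-cost / eps) until the plan's marginals match a and b.
    Returns the transport plan and its transport cost <plan, cost>.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Row scaling: enforce the first marginal.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column scaling: enforce the second marginal.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    value = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, value


# Assumed toy data: three points on a line with squared-distance cost.
x = [0.0, 1.0, 2.0]
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
cost = [[(xi - xj) ** 2 for xj in x] for xi in x]

plan, value = sinkhorn(a, b, cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(len(a))) for j in range(len(b))]
```

Every operation in the loop is a differentiable elementwise product, division, or sum, which is what makes it feasible to backpropagate through a fixed number of iterations as the abstract describes. Smaller values of `eps` approximate unregularized OT more closely but converge more slowly and can underflow in this naive form.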
Practical bounds on the error of Bayesian posterior approximations: A nonasymptotic approach
Bayesian inference typically requires the computation of an approximation to
the posterior distribution. An important requirement for an approximate
Bayesian inference algorithm is to output high-accuracy posterior mean and
uncertainty estimates. Classical Monte Carlo methods, particularly Markov Chain
Monte Carlo, remain the gold standard for approximate Bayesian inference
because they have a robust finite-sample theory and reliable convergence
diagnostics. However, alternative methods, which are more scalable or apply to
problems where Markov Chain Monte Carlo cannot be used, lack the same
finite-data approximation theory and tools for evaluating their accuracy. In
this work, we develop a flexible new approach to bounding the error of mean and
uncertainty estimates of scalable inference algorithms. Our strategy is to
control the estimation errors in terms of Wasserstein distance, then bound the
Wasserstein distance via a generalized notion of Fisher distance. Unlike
computing the Wasserstein distance, which requires access to the normalized
posterior distribution, the Fisher distance is tractable to compute because it
requires access only to the gradient of the log posterior density. We
demonstrate the usefulness of our Fisher distance approach by deriving bounds
on the Wasserstein error of the Laplace approximation and Hilbert coresets. We
anticipate that our approach will be applicable to many other approximate
inference methods such as the integrated Laplace approximation, variational
inference, and approximate Bayesian computation.
Comment: 22 pages, 2 figures
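The point that a gradient-based distance needs only the gradient of the log density, and hence no normalizing constant, can be illustrated with a simplified sketch. The quantity below is a plain root-mean-square difference of score functions over a set of sample points; it is an assumption made for illustration and is not the paper's generalized Fisher distance. The names `score` and `score_l2_distance` are hypothetical.

```python
import math
import random


def score(log_density, x, h=1e-5):
    """Derivative of a (possibly unnormalized) log density by central differences."""
    return (log_density(x + h) - log_density(x - h)) / (2.0 * h)


def score_l2_distance(log_p, log_q, samples):
    """Root-mean-square difference between two score functions over samples."""
    squared = [(score(log_p, x) - score(log_q, x)) ** 2 for x in samples]
    return math.sqrt(sum(squared) / len(squared))


def log_p(x):
    # Unnormalized log density of a Gaussian with mean 1 and unit variance.
    return -0.5 * (x - 1.0) ** 2


def log_q(x):
    # Unnormalized log density of a standard Gaussian; the additive
    # constant 7.0 is arbitrary and cancels when differentiating.
    return -0.5 * x ** 2 + 7.0


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]
distance = score_l2_distance(log_p, log_q, samples)
```

For these two unit-variance Gaussians the scores differ by the constant 1 everywhere, so the computed value is close to 1 regardless of which sample points are used and regardless of the additive constant in `log_q`.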