143 research outputs found
Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm
We present a novel algorithm to estimate the barycenter of arbitrary
probability distributions with respect to the Sinkhorn divergence. Based on a
Frank-Wolfe optimization strategy, our approach proceeds by populating the
support of the barycenter incrementally, without requiring any pre-allocation.
We consider discrete as well as continuous distributions, proving convergence
rates of the proposed algorithm in both settings. Key elements of our analysis
are a new result showing that the Sinkhorn divergence on compact domains has
Lipschitz continuous gradient with respect to the Total Variation and a
characterization of the sample complexity of Sinkhorn potentials. Experiments
validate the effectiveness of our method in practice.
Comment: 46 pages, 8 figures
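The debiased Sinkhorn divergence that the barycenter objective is built on can be sketched numerically. The snippet below is an illustrative implementation for discrete measures on the line, not the authors' code; the function names and the choices eps=0.1 and 200 iterations are ours:

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps=0.1, n_iter=200):
    """Entropic OT transport cost <P, C> between histograms a, b."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):        # Sinkhorn matrix-scaling iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return np.sum(P * C)

def sinkhorn_divergence(a, b, X, Y, eps=0.1):
    """Debiased Sinkhorn divergence S(a,b) = OT(a,b) - (OT(a,a) + OT(b,b))/2
    for 1D supports X, Y with squared-distance cost."""
    Cxy = (X[:, None] - Y[None, :]) ** 2
    Cxx = (X[:, None] - X[None, :]) ** 2
    Cyy = (Y[:, None] - Y[None, :]) ** 2
    return (sinkhorn_cost(a, b, Cxy, eps)
            - 0.5 * sinkhorn_cost(a, a, Cxx, eps)
            - 0.5 * sinkhorn_cost(b, b, Cyy, eps))
```

The debiasing terms make the divergence vanish on identical measures, which is what makes it usable as a barycenter objective.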
Averaging on the Bures-Wasserstein manifold: dimension-free convergence of gradient descent
We study first-order optimization algorithms for computing the barycenter of
Gaussian distributions with respect to the optimal transport metric. Although
the objective is geodesically non-convex, Riemannian GD empirically converges
rapidly, in fact faster than off-the-shelf methods such as Euclidean GD and SDP
solvers. This stands in stark contrast to the best-known theoretical results
for Riemannian GD, which depend exponentially on the dimension. In this work,
we prove new geodesic convexity results which provide stronger control of the
iterates, yielding a dimension-free convergence rate. Our techniques also
enable the analysis of two related notions of averaging, the
entropically-regularized barycenter and the geometric median, providing the
first convergence guarantees for Riemannian GD for these problems.
Comment: 48 pages, 8 figures
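For intuition, the iteration under analysis can be sketched for centered Gaussians: with unit step size, Riemannian GD on the Bures-Wasserstein manifold reduces to the classical fixed-point map that averages the optimal transport maps to the inputs. This is a sketch under that unit-step assumption, with our own naming:

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_barycenter(covs, weights, n_iter=50):
    """W2 barycenter of centered Gaussians via the fixed-point map
    (unit-step Riemannian GD on the Bures-Wasserstein manifold)."""
    S = np.mean(covs, axis=0)  # initialize at the Euclidean mean
    for _ in range(n_iter):
        root = sqrtm(S).real
        inv_root = np.linalg.inv(root)
        # average of the OT maps from S to each input covariance
        T = sum(w * inv_root @ sqrtm(root @ C @ root).real @ inv_root
                for w, C in zip(weights, covs))
        S = T @ S @ T              # push S forward through the averaged map
    return S
```

For commuting inputs the map converges in one step to the square of the averaged matrix square roots, which gives a quick sanity check.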
Non-Parametric and Regularized Dynamical Wasserstein Barycenters for Time-Series Analysis
We consider probabilistic time-series models for systems that gradually
transition among a finite number of states. We are particularly motivated by
applications such as human activity analysis where the observed time-series
contains segments representing distinct activities such as running or walking
as well as segments characterized by continuous transition among these states.
Accordingly, the dynamical Wasserstein barycenter (DWB) model introduced by
Cheng et al. (2021) [1] associates each state, which we call a pure
state, with its own probability distribution, and models these continuous
transitions with the dynamics of the barycentric weights that combine the pure
state distributions via the Wasserstein barycenter. Here, focusing on the
univariate case where Wasserstein distances and barycenters can be computed in
closed form, we extend [1] by discussing two challenges associated with
learning a DWB model and two improvements. First, we highlight the issue of
uniqueness in identifying the model parameters. Second, we discuss the
challenge of estimating a dynamically evolving distribution given a limited
number of samples. The uncertainty associated with this estimation may cause a
model's learned dynamics to not reflect the gradual transitions characteristic
of the system. The first improvement introduces a regularization framework that
addresses this uncertainty by imposing temporal smoothness on the dynamics of
the barycentric weights while leveraging the understanding of the
non-uniqueness of the problem. This is done without defining an entire
stochastic model for the dynamics of the system as in [1]. Our second
improvement lifts the Gaussian assumption on the pure-state distributions in
[1] by proposing a quantile-based non-parametric representation. We pose model
estimation in a variational framework and propose a finite approximation to the
infinite-dimensional problem.
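The closed-form univariate computation the model relies on is easy to sketch: the W2 barycenter's quantile function is the weighted average of the input quantile functions. The code below is illustrative, not the authors' implementation; the quantile grid is our choice:

```python
import numpy as np

def barycenter_1d(samples_list, weights, qs=None):
    """Closed-form univariate W2 barycenter of empirical measures,
    represented by its values on a grid of quantile levels qs."""
    if qs is None:
        qs = np.linspace(0.01, 0.99, 99)  # quantile levels (our choice)
    # barycenter quantile function = weighted average of input quantiles
    quantiles = [np.quantile(x, qs) for x in samples_list]
    return sum(w * q for w, q in zip(weights, quantiles))
```

This quantile representation is also what makes the quantile-based non-parametric pure-state representation natural in the univariate setting.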
A New Family of Dual-norm Regularized p-Wasserstein Metrics
We develop a novel family of metrics over measures, using a p-Wasserstein-style
optimal transport (OT) formulation with dual-norm-based regularized
marginal constraints. Our study is motivated by the observation that existing
works have only explored φ-divergence-regularized Wasserstein metrics like
the Generalized Wasserstein metrics or the Gaussian-Hellinger-Kantorovich
metrics. It is an open question whether Wasserstein-style metrics can be defined
using regularizers that are not φ-divergence based. Our work provides an
affirmative answer by proving that the proposed formulation, under mild
conditions, indeed induces valid metrics for any dual norm. The proposed
regularized metrics seem to achieve the best of both worlds by inheriting
useful properties from the parent metrics, viz., the p-Wasserstein metric and
the dual norm involved. For example, when the dual norm is the Maximum Mean Discrepancy
(MMD), we prove that the proposed regularized metrics inherit the
dimension-free sample complexity from the MMD regularizer, while
preserving or enhancing other useful properties of the p-Wasserstein metric.
Further, when p = 1, we derive a Fenchel dual, which enables us to prove that the
proposed metrics actually induce novel norms over measures. Also, in this case,
we show that the mixture geodesic, which is a common geodesic for the parent
metrics, remains a geodesic. We empirically study various properties of the
proposed metrics and show their utility in diverse applications.
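As a concrete instance of the dual norms involved, a Gaussian-kernel MMD between two samples can be sketched as follows. This is illustrative only; the kernel choice and bandwidth are ours, not the paper's:

```python
import numpy as np

def mmd_squared(X, Y, sigma=1.0):
    """Squared MMD between samples X, Y under a Gaussian kernel,
    one example of a dual norm the regularized metrics can use."""
    def k(A, B):
        # pairwise squared distances, then Gaussian kernel values
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()
```

The dimension-free sample complexity cited in the abstract is a known property of MMD estimators of this form.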
Sliced Multi-Marginal Optimal Transport
Multi-marginal optimal transport enables one to compare multiple probability
measures, which increasingly finds application in multi-task learning problems.
One practical limitation of multi-marginal transport is computational
scalability in the number of measures, samples and dimensionality. In this
work, we propose a multi-marginal optimal transport paradigm based on random
one-dimensional projections, whose (generalized) distance we term the sliced
multi-marginal Wasserstein distance. To construct this distance, we introduce a
characterization of the one-dimensional multi-marginal Kantorovich problem and
use it to highlight a number of properties of the sliced multi-marginal
Wasserstein distance. In particular, we show that (i) the sliced multi-marginal
Wasserstein distance is a (generalized) metric that induces the same topology
as the standard Wasserstein distance, (ii) it admits a dimension-free sample
complexity, (iii) it is tightly connected with the problem of barycentric
averaging under the sliced-Wasserstein metric. We conclude by illustrating the
sliced multi-marginal Wasserstein distance on multi-task density estimation and
multi-dynamics reinforcement learning problems.
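A Monte-Carlo sketch of the construction: project every measure onto random directions, take the sorted projections (which realize the optimal one-dimensional coupling), and average a barycentric ground cost Σ_p w_p (x_p − mean)² over quantile positions. This is our illustrative reading of the idea, assuming equal sample sizes, not the paper's code:

```python
import numpy as np

def sliced_mm_wasserstein(measures, weights, n_proj=100, seed=0):
    """Sliced multi-marginal OT cost with a barycentric ground cost,
    estimated over random 1D projections (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    d = measures[0].shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)        # random unit direction
        # sorted projections give the optimal 1D multi-marginal coupling
        proj = np.stack([np.sort(X @ theta) for X in measures])  # (P, n)
        bary = weights @ proj                 # per-quantile barycenter
        cost = np.sum(weights[:, None] * (proj - bary) ** 2, axis=0)
        total += cost.mean()
    return total / n_proj
```

The per-slice barycenter step makes the connection to sliced-Wasserstein barycentric averaging (property (iii) above) explicit.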
On the Complexity of Approximating Wasserstein Barycenter
We study the complexity of approximating the Wasserstein barycenter of
discrete measures, or histograms of a given size, by contrasting two alternative
approaches, both using entropic regularization. The first approach is based on
the Iterative Bregman Projections (IBP) algorithm, for which our novel analysis
gives a complexity bound for approximating the original non-regularized
barycenter. Using an alternative accelerated-gradient-descent-based approach, we
obtain a second complexity bound and compare it with the bound for IBP. As a
byproduct, we show that
the regularization parameter in both approaches has to be proportional to
ε, which causes instability of both algorithms when the desired
accuracy is high. To overcome this issue, we propose a novel proximal-IBP
algorithm, which can be seen as a proximal gradient method that uses IBP at
each iteration to make the proximal step. We also consider the question of
scalability of these algorithms using approaches from distributed optimization
and show that the first algorithm can be implemented in a centralized
distributed setting (master/slave), while the second one is amenable to a more
general decentralized distributed setting with an arbitrary network topology.
Comment: Corrected misprints. Added a reference to accelerated Iterative
Bregman Projections introduced in arXiv:1906.0362
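The fixed-support IBP scheme discussed above can be sketched in a few lines, following the scaling-form iteration of Benamou et al. for entropic barycenters. This is a minimal sketch; variable names and the parameters eps and n_iter are our choices:

```python
import numpy as np

def ibp_barycenter(hists, C, weights, eps=0.01, n_iter=500):
    """Entropic Wasserstein barycenter of histograms (rows of hists, shape
    (P, n)) on a common grid with cost matrix C, via Iterative Bregman
    Projections in Sinkhorn scaling form."""
    K = np.exp(-C / eps)               # Gibbs kernel
    weights = np.asarray(weights, dtype=float)
    v = np.ones_like(hists)
    for _ in range(n_iter):
        u = hists / (v @ K.T)          # project onto each input marginal
        Ku = u @ K
        # barycenter marginal: weighted geometric mean across measures
        b = np.exp(weights @ np.log(Ku))
        v = b[None, :] / Ku            # project onto the common marginal
    return b
```

The abstract's instability observation is visible here: the kernel K = exp(-C/ε) degenerates as the regularization (tied to the target accuracy) shrinks, which is what the proposed proximal-IBP variant is designed to avoid.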