
    Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm

    We present a novel algorithm to estimate the barycenter of arbitrary probability distributions with respect to the Sinkhorn divergence. Based on a Frank-Wolfe optimization strategy, our approach proceeds by populating the support of the barycenter incrementally, without requiring any pre-allocation. We consider discrete as well as continuous distributions, proving convergence rates of the proposed algorithm in both settings. Key elements of our analysis are a new result showing that the Sinkhorn divergence on compact domains has a Lipschitz continuous gradient with respect to the Total Variation, and a characterization of the sample complexity of Sinkhorn potentials. Experiments validate the effectiveness of our method in practice. Comment: 46 pages, 8 figures
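The objective minimized above is the Sinkhorn divergence. As a minimal sketch of that quantity only (not of the Frank-Wolfe barycenter algorithm itself), the following computes it between two discrete measures with NumPy. It assumes a squared Euclidean cost, an entropic penalty measured relative to the product of the marginals, and a log-domain Sinkhorn loop; the function names, `eps`, and the iteration count are illustrative assumptions.

```python
import numpy as np

def _logsumexp(z, axis):
    # Numerically stable log-sum-exp along one axis.
    zmax = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(zmax, axis=axis) + np.log(np.sum(np.exp(z - zmax), axis=axis))

def entropic_ot(x, a, y, b, eps, n_iter=500):
    """Entropic OT cost between sum_i a_i delta_{x_i} and sum_j b_j delta_{y_j}.

    Assumptions: squared Euclidean cost, KL penalty relative to a (x) b,
    fixed iteration count instead of a stopping rule.
    """
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    f, g = np.zeros(len(a)), np.zeros(len(b))
    log_a, log_b = np.log(a), np.log(b)
    for _ in range(n_iter):
        # Alternate exact maximization of the dual in f, then in g (Sinkhorn).
        f = -eps * _logsumexp(log_b[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * _logsumexp(log_a[:, None] + (f[:, None] - C) / eps, axis=0)
    # After a g-update the penalty term vanishes, so the dual value is <a, f> + <b, g>.
    return a @ f + b @ g

def sinkhorn_divergence(x, a, y, b, eps=1.0):
    # Debiased entropic cost: OT_eps(a, b) - OT_eps(a, a)/2 - OT_eps(b, b)/2.
    return (entropic_ot(x, a, y, b, eps)
            - 0.5 * entropic_ot(x, a, x, a, eps)
            - 0.5 * entropic_ot(y, b, y, b, eps))
```

Subtracting the two self-transport terms removes the entropic bias, so the divergence of a measure with itself is exactly zero, unlike the raw entropic OT cost.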

    Averaging on the Bures-Wasserstein manifold: dimension-free convergence of gradient descent

    We study first-order optimization algorithms for computing the barycenter of Gaussian distributions with respect to the optimal transport metric. Although the objective is geodesically non-convex, Riemannian GD empirically converges rapidly, in fact faster than off-the-shelf methods such as Euclidean GD and SDP solvers. This stands in stark contrast to the best-known theoretical results for Riemannian GD, which depend exponentially on the dimension. In this work, we prove new geodesic convexity results which provide stronger control of the iterates, yielding a dimension-free convergence rate. Our techniques also enable the analysis of two related notions of averaging, the entropically-regularized barycenter and the geometric median, providing the first convergence guarantees for Riemannian GD for these problems. Comment: 48 pages, 8 figures

    Non-Parametric and Regularized Dynamical Wasserstein Barycenters for Time-Series Analysis

    We consider probabilistic time-series models for systems that gradually transition among a finite number of states. We are particularly motivated by applications such as human activity analysis, where the observed time series contains segments representing distinct activities, such as running or walking, as well as segments characterized by continuous transitions among these states. Accordingly, the dynamical Wasserstein barycenter (DWB) model introduced by Cheng et al. in 2021 [1] associates with each state, which we call a pure state, its own probability distribution, and models these continuous transitions through the dynamics of the barycentric weights that combine the pure-state distributions via the Wasserstein barycenter. Here, focusing on the univariate case, where Wasserstein distances and barycenters can be computed in closed form, we extend [1] by discussing two challenges associated with learning a DWB model and proposing two improvements. First, we highlight the issue of uniqueness in identifying the model parameters. Second, we discuss the challenge of estimating a dynamically evolving distribution from a limited number of samples. The uncertainty associated with this estimation may cause the learned dynamics to fail to reflect the gradual transitions characteristic of the system. The first improvement introduces a regularization framework that addresses this uncertainty by imposing temporal smoothness on the dynamics of the barycentric weights while leveraging our understanding of the non-uniqueness of the problem. This is done without defining an entire stochastic model for the dynamics of the system, as in [1]. Our second improvement lifts the Gaussian assumption on the pure-state distributions in [1] by proposing a quantile-based non-parametric representation. We pose model estimation in a variational framework and propose a finite approximation to the infinite-dimensional problem.
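In the univariate case, the closed form mentioned above is quantile averaging: the W2 barycenter of distributions with quantile functions Q_k and weights w_k has quantile function Σ_k w_k Q_k, and W2 between two distributions is the L2 distance between their quantile functions. The following is a minimal NumPy sketch on empirical samples, evaluated on a grid of quantile levels. It is not the DWB estimation procedure, and the function names and the level grid are illustrative assumptions.

```python
import numpy as np

def quantile_barycenter(samples, weights, levels):
    """Quantile function of the 1D W2 barycenter, evaluated at `levels` in (0, 1).

    Q_bary(u) = sum_k w_k Q_k(u), with each Q_k estimated from samples.
    """
    Q = np.stack([np.quantile(np.asarray(s, dtype=float), levels) for s in samples])
    return np.asarray(weights, dtype=float) @ Q

def w2_1d(s1, s2, levels):
    # W2^2 = integral over u in (0, 1) of (Q1(u) - Q2(u))^2, approximated on the grid.
    d = np.quantile(s1, levels) - np.quantile(s2, levels)
    return np.sqrt(np.mean(d ** 2))
```

With a quantile-based representation, changing the barycentric weights over time only changes which convex combination of the same pure-state quantile curves is taken at each time step.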

    A New Family of Dual-norm regularized p-Wasserstein Metrics

    We develop a novel family of metrics over measures, using a p-Wasserstein style optimal transport (OT) formulation with dual-norm based regularized marginal constraints. Our study is motivated by the observation that existing works have only explored φ-divergence regularized Wasserstein metrics, such as the Generalized Wasserstein metrics or the Gaussian-Hellinger-Kantorovich metrics. It is an open question whether Wasserstein style metrics can be defined using regularizers that are not φ-divergence based. Our work provides an affirmative answer by proving that the proposed formulation, under mild conditions, indeed induces valid metrics for any dual norm. The proposed regularized metrics seem to achieve the best of both worlds by inheriting useful properties from the parent metrics, viz., the p-Wasserstein metric and the dual norm involved. For example, when the dual norm is Maximum Mean Discrepancy (MMD), we prove that the proposed regularized metrics inherit the dimension-free sample complexity of the MMD regularizer, while preserving or enhancing other useful properties of the p-Wasserstein metric. Further, when p = 1, we derive a Fenchel dual, which enables proving that the proposed metrics actually induce novel norms over measures. Also, in this case, we show that the mixture geodesic, which is a common geodesic for the parent metrics, remains a geodesic. We empirically study various properties of the proposed metrics and show their utility in diverse applications.
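When the dual norm is MMD, the regularizer compares kernel mean embeddings. The following is a minimal NumPy sketch of the standard plug-in (biased, V-statistic) estimate of squared MMD between two samples with a Gaussian kernel. It illustrates only the MMD term, not the proposed regularized p-Wasserstein metric, and the kernel choice and bandwidth `sigma` are assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    # k(x, y) = exp(-|x - y|^2 / (2 sigma^2)) for every pair of rows of a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Squared RKHS distance between the empirical mean embeddings of x and y.
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())
```

For a bounded kernel such as this one, the estimate converges at a rate of order n^{-1/2} in the sample size regardless of the ambient dimension, which is the dimension-free behavior the abstract refers to.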

    Sliced Multi-Marginal Optimal Transport

    Multi-marginal optimal transport enables one to compare multiple probability measures, which increasingly finds application in multi-task learning problems. One practical limitation of multi-marginal transport is computational scalability in the number of measures, samples and dimensionality. In this work, we propose a multi-marginal optimal transport paradigm based on random one-dimensional projections, whose (generalized) distance we term the sliced multi-marginal Wasserstein distance. To construct this distance, we introduce a characterization of the one-dimensional multi-marginal Kantorovich problem and use it to highlight a number of properties of the sliced multi-marginal Wasserstein distance. In particular, we show that (i) the sliced multi-marginal Wasserstein distance is a (generalized) metric that induces the same topology as the standard Wasserstein distance, (ii) it admits a dimension-free sample complexity, and (iii) it is tightly connected with the problem of barycentric averaging under the sliced-Wasserstein metric. We conclude by illustrating the sliced multi-marginal Wasserstein distance on multi-task density estimation and multi-dynamics reinforcement learning problems.
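The following is a minimal NumPy sketch of the slicing idea: project every measure onto random directions, solve each one-dimensional multi-marginal problem, and average. It assumes equal-size uniform samples and a barycentric cost (weighted squared distance to the weighted mean), for which, in one dimension, matching the sorted projections gives an optimal coupling. The cost choice, the function name, and the projection count are assumptions rather than the paper's exact definition.

```python
import numpy as np

def sliced_mm_barycentric(samples, weights, n_proj=100, seed=0):
    """Monte Carlo sliced multi-marginal cost under an assumed barycentric cost."""
    samples = [np.asarray(s, dtype=float) for s in samples]
    n, d = samples[0].shape
    assert all(s.shape == (n, d) for s in samples), "equal-size samples assumed"
    w = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)  # uniformly random unit direction
        # Sorting each projection yields the monotone (comonotone) coupling.
        P = np.stack([np.sort(s @ theta) for s in samples])  # shape (m, n)
        bary = w @ P  # 1D weighted mean of each matched tuple
        total += np.mean(w @ (P - bary) ** 2)  # average barycentric cost per tuple
    return total / n_proj
```

Each projection costs one matrix-vector product and one sort per measure, which is the source of the scalability in the number of measures, samples, and dimensions.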

    On the complexity of approximating Wasserstein barycenter

    We study the complexity of approximating the Wasserstein barycenter of discrete measures, or histograms, by contrasting two alternative approaches, both using entropic regularization. We provide a novel analysis for our approach based on the Iterative Bregman Projections (IBP) algorithm to approximate the original non-regularized barycenter. We also obtain a complexity bound for an alternative accelerated-gradient-descent-based approach and compare it with the bound obtained for IBP. As a byproduct, we show that the regularization parameter in both approaches has to be proportional to ε, which causes instability of both algorithms when the desired accuracy is high. To overcome this issue, we propose a novel proximal-IBP algorithm, which can be seen as a proximal gradient method that uses IBP at each iteration to make a proximal step. We also consider the scalability of these algorithms using approaches from distributed optimization and show that the first algorithm can be implemented in a centralized distributed setting (master/slave), while the second one is amenable to a more general decentralized distributed setting with an arbitrary network topology.
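The IBP approach alternates simple diagonal scalings of the Gibbs kernel K = exp(-C/γ). The following is a minimal NumPy sketch of the classic IBP iteration for the entropic barycenter of histograms on a shared grid. The names, the cost matrix, γ, and the fixed iteration count are assumptions, and no stopping rule or complexity bound from the paper is implemented. It works in the plain (non-log) domain, so a very small γ makes K underflow, which is one concrete way the instability for small regularization can show up.

```python
import numpy as np

def ibp_barycenter(hists, weights, C, gamma, n_iter=500):
    """Entropic W barycenter of histograms via Iterative Bregman Projections (sketch)."""
    hists = np.asarray(hists, dtype=float)      # shape (m, n), each row sums to 1
    weights = np.asarray(weights, dtype=float)  # shape (m,), sums to 1
    K = np.exp(-C / gamma)                      # Gibbs kernel; underflows for tiny gamma
    v = np.ones_like(hists)
    for _ in range(n_iter):
        u = hists / (v @ K.T)              # row k: p_k / (K v_k), fixes first marginals
        KTu = u @ K                        # row k: K^T u_k
        b = np.exp(weights @ np.log(KTu))  # weighted geometric mean = barycenter estimate
        v = b[None, :] / KTu               # row k: b / (K^T u_k), fixes second marginals
    return b
```

For example, the barycenter with equal weights of two Dirac masses at the ends of the grid 0, 1, ..., 10 (squared-distance cost, γ = 1) is a symmetric histogram peaked at 5.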

    On the Complexity of Approximating Wasserstein Barycenter

    We study the complexity of approximating the Wasserstein barycenter of m discrete measures, or histograms of size n, by contrasting two alternative approaches, both using entropic regularization. The first approach is based on the Iterative Bregman Projections (IBP) algorithm, for which our novel analysis gives a complexity bound proportional to mn^2/ε^2 to approximate the original non-regularized barycenter. Using an alternative accelerated-gradient-descent-based approach, we obtain a complexity proportional to mn^2.5/ε. As a byproduct, we show that the regularization parameter in both approaches has to be proportional to ε, which causes instability of both algorithms when the desired accuracy is high. To overcome this issue, we propose a novel proximal-IBP algorithm, which can be seen as a proximal gradient method that uses IBP at each iteration to make a proximal step. We also consider the scalability of these algorithms using approaches from distributed optimization and show that the first algorithm can be implemented in a centralized distributed setting (master/slave), while the second one is amenable to a more general decentralized distributed setting with an arbitrary network topology. Comment: Corrected misprints. Added a reference to accelerated Iterative Bregman Projections introduced in arXiv:1906.0362
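Taking the two bounds above at face value and ignoring the constants hidden in "proportional to", a quick back-of-the-envelope comparison (not a statement from the paper) shows when the accelerated approach has the smaller bound:

```latex
\frac{m n^{2} / \varepsilon^{2}}{m n^{2.5} / \varepsilon}
  = \frac{1}{\varepsilon \sqrt{n}} > 1
  \quad\Longleftrightarrow\quad
  \varepsilon < \frac{1}{\sqrt{n}} .
```

For example, with histograms of size n = 10^4, the accelerated bound is smaller only once ε < 10^{-2}.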