
    Statistical inference for Bures-Wasserstein barycenters

    In this work we introduce the concept of the Bures-Wasserstein barycenter Q_*, which is essentially a Fréchet mean of some distribution \mathbb{P} supported on a subspace of positive semi-definite Hermitian operators \mathbb{H}_{+}(d). We allow the barycenter to be restricted to some affine subspace of \mathbb{H}_{+}(d) and provide conditions ensuring its existence and uniqueness. We also investigate convergence and concentration properties of an empirical counterpart of Q_* in both the Frobenius norm and the Bures-Wasserstein distance, and explain how the obtained results are connected to optimal transportation theory and can be applied to statistical inference in quantum mechanics.
    Comment: 37 pages, 5 figures
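For positive-definite covariance matrices, the barycenter described in the abstract can be computed by a well-known matrix fixed-point iteration. The sketch below is an illustration of that generic iteration, not the paper's estimator; the function name `bw_barycenter` and its defaults are made up, and NumPy/SciPy are assumed:

```python
import numpy as np
from scipy.linalg import sqrtm

def bw_barycenter(covs, weights=None, iters=100, tol=1e-10):
    """Fixed-point iteration for the Bures-Wasserstein barycenter of
    positive-definite matrices C_1, ..., C_n:

        S <- S^{-1/2} ( sum_i w_i (S^{1/2} C_i S^{1/2})^{1/2} )^2 S^{-1/2}
    """
    covs = [np.asarray(C, dtype=float) for C in covs]
    n = len(covs)
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    S = np.mean(covs, axis=0)                     # warm start at the linear mean
    for _ in range(iters):
        root = np.real(sqrtm(S))
        inv_root = np.linalg.inv(root)
        T = sum(wi * np.real(sqrtm(root @ C @ root)) for wi, C in zip(w, covs))
        S_new = inv_root @ T @ T @ inv_root
        S_new = (S_new + S_new.T) / 2             # enforce symmetry numerically
        if np.linalg.norm(S_new - S) < tol:
            break
        S = S_new
    return S_new
```

For commuting (e.g. diagonal) inputs the iteration recovers the closed form (sum_i w_i C_i^{1/2})^2 in a single step.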

    Statistical Aspects of Wasserstein Distances

    Wasserstein distances are metrics on probability distributions inspired by the problem of optimal mass transportation. Roughly speaking, they measure the minimal effort required to reconfigure the probability mass of one distribution in order to recover the other distribution. They are ubiquitous in mathematics, with a long history that has seen them catalyse core developments in analysis, optimization, and probability. Beyond their intrinsic mathematical richness, they possess attractive features that make them a versatile tool for the statistician: they can be used to derive weak convergence and convergence of moments, and can be easily bounded; they are well-adapted to quantify a natural notion of perturbation of a probability distribution; and they seamlessly incorporate the geometry of the domain of the distributions in question, thus being useful for contrasting complex objects. Consequently, they frequently appear in the development of statistical theory and inferential methodology, and have recently become an object of inference in themselves. In this review, we provide a snapshot of the main concepts involved in Wasserstein distances and optimal transportation, and a succinct overview of some of their many statistical aspects.
    Comment: Official version available at https://www.annualreviews.org/doi/full/10.1146/annurev-statistics-030718-10493
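On the real line, the "minimal effort" interpretation becomes concrete: the optimal transport map matches sorted samples, so the 1-Wasserstein distance between two equal-size empirical distributions reduces to a sort. A minimal sketch (the helper name `w1_empirical` is mine; `scipy.stats.wasserstein_distance` offers a general version):

```python
import numpy as np

def w1_empirical(x, y):
    """1-Wasserstein distance between two equal-size empirical samples
    on the real line: optimal transport matches sorted order statistics,
    so W1 is the mean absolute gap between sorted samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))
```

Note how a shift of one sample by a constant c changes the distance by exactly |c|, illustrating the metric's sensitivity to perturbations of the distribution.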

    Recent Advances in Optimal Transport for Machine Learning

    Recently, Optimal Transport has been proposed as a probabilistic framework in Machine Learning for comparing and manipulating probability distributions. This is rooted in its rich history and theory, and it has offered new solutions to different problems in machine learning, such as generative modeling and transfer learning. In this survey we explore the contributions of Optimal Transport to Machine Learning over the period 2012--2022, focusing on four sub-fields of Machine Learning: supervised, unsupervised, transfer, and reinforcement learning. We further highlight recent developments in computational Optimal Transport and its interplay with Machine Learning practice.
    Comment: 20 pages, 5 figures, under review
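Much of the computational Optimal Transport used in machine learning revolves around entropic regularization solved by Sinkhorn iterations. A minimal NumPy sketch of that generic scheme, assuming discrete histograms `a`, `b` and a cost matrix `C` (the function name and default parameters are illustrative):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=500):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    a, b : source / target histograms (nonnegative, summing to 1)
    C    : cost matrix, C[i, j] = cost of moving mass from i to j
    eps  : regularization strength (smaller -> closer to exact OT)
    """
    a, b, C = (np.asarray(x, dtype=float) for x in (a, b, C))
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):            # alternate matching of the two marginals
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # regularized transport plan
    return P, float(np.sum(P * C))    # plan and its transport cost
```

The returned plan has (approximately) the prescribed marginals, and as `eps` shrinks the cost approaches the unregularized OT cost, at the price of slower, less stable iterations.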

    Statistical distances for model validation and clustering. Applications to flow cytometry and fair learning.

    This thesis has been developed at the University of Valladolid and IMUVA within the framework of the project Sampling, trimming, and probabilistic metric techniques. Statistical applications, whose main researchers are Carlos Matrán Bea and Eustasio del Barrio Tellado. Among the lines of research associated with the project are: model validation, Wasserstein distances, and robust cluster analysis. It is precisely the work carried out in these fields that gives rise to Chapters 1, 2, and 4 of this report. The work done in the field of fair learning with Professor Jean-Michel Loubes, a frequent collaborator of Valladolid's team, during the international stay at the Paul Sabatier University of Toulouse, is the basis of Chapter 3 of this report. Therefore, this thesis is an exposition of the problems and results obtained in the different fields previously mentioned. Due to the diversity of topics, we have decided to base the chapters on the works published or submitted to the present date, and therefore each chapter has a structure relatively independent of the others. In this way, Chapter 1 is based on the works [del Barrio et al., 2019e, del Barrio et al., 2019d], Chapter 2 is based on the work [del Barrio et al., 2019c], Chapter 3 on the work [del Barrio et al., 2019b], and Chapter 4 shows results of a work in progress. In this introduction our objective is to present the main challenges we have faced, as well as to briefly present our most relevant results. On the other hand, each chapter has its own introduction, where we delve into the topics discussed below. With this in mind, our intention is that readers will have a general idea of what they will find in each chapter, and in this way will have the necessary information to face the more technical discussions found there. Due to the diversity of topics dealt with in this report, we propose a non-linear reading.
We suggest that the reader, after reading a section of the Introduction, move to the corresponding chapter. In this way the reader will have the relevant information at hand and will be able to better follow the exposition in each chapter. If, on the other hand, the document is read sequentially, we apologize in advance for some repetitions and reiterations, which nevertheless seem to us to contribute positively to the understanding of this work.
Departamento de Estadística e Investigación Operativa. Doctorado en Matemática

    Robust clustering tools based on optimal transportation
