10 research outputs found
Statistical inference for Bures-Wasserstein barycenters
In this work we introduce the concept of the Bures-Wasserstein barycenter, which is essentially a Fréchet mean of a distribution supported on a subspace of positive semi-definite Hermitian operators. We allow the barycenter to be restricted to some affine subspace and provide conditions ensuring its existence and uniqueness. We also investigate convergence and concentration properties of an empirical counterpart of the barycenter in both the Frobenius norm and the Bures-Wasserstein distance, and explain how the obtained results are connected to optimal transportation theory and can be applied to statistical inference in quantum mechanics.
Comment: 37 pages, 5 figures
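The Bures-Wasserstein distance mentioned above has a well-known closed form for positive semi-definite matrices A and B: d(A, B)^2 = tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2}). A minimal numerical sketch of this formula (not code from the paper; the function name is illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_wasserstein(A, B):
    """Closed-form Bures-Wasserstein distance between PSD matrices A and B."""
    root_A = sqrtm(A)
    cross = sqrtm(root_A @ B @ root_A)
    d2 = np.trace(A) + np.trace(B) - 2.0 * np.trace(cross)
    # sqrtm may return a tiny imaginary part; clip round-off before the sqrt
    return np.sqrt(max(np.real(d2), 0.0))

# For diagonal matrices the distance reduces to the Euclidean distance
# between the vectors of eigenvalue square roots: here sqrt(2).
A = np.diag([1.0, 4.0])
B = np.diag([4.0, 1.0])
print(bures_wasserstein(A, B))  # → 1.4142...
```

For covariance matrices of centered Gaussians, this is exactly the 2-Wasserstein distance between the corresponding distributions, which is what links the barycenter above to optimal transport.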
Statistical Aspects of Wasserstein Distances
Wasserstein distances are metrics on probability distributions inspired by
the problem of optimal mass transportation. Roughly speaking, they measure the
minimal effort required to reconfigure the probability mass of one distribution
in order to recover the other distribution. They are ubiquitous in mathematics,
with a long history that has seen them catalyse core developments in analysis,
optimization, and probability. Beyond their intrinsic mathematical richness,
they possess attractive features that make them a versatile tool for the
statistician: they can be used to derive weak convergence and convergence of
moments, and can be easily bounded; they are well-adapted to quantify a natural
notion of perturbation of a probability distribution; and they seamlessly
incorporate the geometry of the domain of the distributions in question, thus
being useful for contrasting complex objects. Consequently, they frequently
appear in the development of statistical theory and inferential methodology,
and have recently become an object of inference in themselves. In this review,
we provide a snapshot of the main concepts involved in Wasserstein distances
and optimal transportation, and a succinct overview of some of their many
statistical aspects.
Comment: Official version available at
https://www.annualreviews.org/doi/full/10.1146/annurev-statistics-030718-10493
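The "minimal effort to reconfigure probability mass" intuition can be made concrete in one dimension, where the 1-Wasserstein distance between empirical measures of equal size is the mean absolute difference of the sorted samples. A small sketch using SciPy (illustrative, not part of the review):

```python
import numpy as np
from scipy.stats import wasserstein_distance

u = np.array([0.0, 1.0, 3.0])
v = u + 5.0  # the same sample shifted by 5

# Translating all mass by 5 costs exactly 5 per unit of mass.
print(wasserstein_distance(u, v))                 # → 5.0
# Equivalent closed form for equal-size samples: match sorted order.
print(np.mean(np.abs(np.sort(u) - np.sort(v))))   # → 5.0
```

The shift example also illustrates why these distances metrize weak convergence plus convergence of first moments: they react to translations, which total-variation-type distances ignore entirely for disjoint supports.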
Recent Advances in Optimal Transport for Machine Learning
Recently, Optimal Transport has been proposed as a probabilistic framework in
Machine Learning for comparing and manipulating probability distributions. This
is rooted in its rich history and theory, and has offered new solutions to
different problems in machine learning, such as generative modeling and
transfer learning. In this survey we explore contributions of Optimal Transport
for Machine Learning over the period 2012 -- 2022, focusing on four sub-fields
of Machine Learning: supervised, unsupervised, transfer and reinforcement
learning. We further highlight the recent development in computational Optimal
Transport, and its interplay with Machine Learning practice.
Comment: 20 pages, 5 figures, under review
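A central tool in the computational Optimal Transport developments this survey covers is entropic regularization solved by Sinkhorn iterations: alternately rescale a Gibbs kernel so the transport plan matches both marginal histograms. A self-contained sketch under standard assumptions (discrete histograms a, b and a cost matrix C; not code from the survey):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=1000):
    """Entropy-regularized OT plan between histograms a and b with cost C."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # scale columns to match marginal b
        u = a / (K @ v)                  # scale rows to match marginal a
    return u[:, None] * K * v[None, :]   # plan P = diag(u) K diag(v)

# Two Gaussian-shaped histograms on a grid, squared-distance cost.
x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2
a = np.exp(-((x - 0.3) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.01); b /= b.sum()

P = sinkhorn(a, b, C)
# The plan's marginals recover a and b up to the iteration tolerance.
print(np.abs(P.sum(axis=1) - a).max(), np.abs(P.sum(axis=0) - b).max())
```

The regularization parameter eps trades accuracy against speed: smaller values approximate the unregularized plan better but slow convergence and can underflow the kernel, which is why practical implementations work in the log domain.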
Statistical distances for model validation and clustering. Applications to flow cytometry and fair learning.
This thesis has been developed at the University of Valladolid and IMUVA within the framework of the project Sampling, trimming, and probabilistic metric techniques. Statistical applications, whose main researchers are Carlos Matrán Bea and Eustasio del Barrio Tellado. Among the lines of research associated with the project are: model validation, Wasserstein distances and robust cluster analysis. It is precisely the work carried out in these fields that gives rise to Chapters 1, 2 and 4 of this report.
The work done in the field of fair learning with Professor Jean-Michel Loubes, a frequent collaborator of the Valladolid team, during the international stay at the Paul Sabatier University of Toulouse, is the basis of Chapter 3 of this report.
Therefore, this thesis is an exposition of the problems and results obtained in the different fields previously mentioned. Due to the diversity of topics, we have decided to base the chapters on the works published or submitted to the present date, and therefore each chapter has a structure relatively independent of the others. In this way, Chapter 1 is based on the works [del Barrio et al., 2019e, del Barrio et al., 2019d], Chapter 2 is based on the work [del Barrio et al., 2019c], Chapter 3 on the work [del Barrio et al., 2019b], and Chapter 4 shows results of a work in progress.
In this introduction our objective is to present the main challenges we have faced, as well as to briefly present our most relevant results. Each chapter will also have its own introduction, where we delve into the topics discussed below. With this in mind, our intention is that the reader will have a general idea of what he or she will find in each chapter, and will thus have the necessary information to face the more technical discussions found there.
Due to the diversity of topics dealt with in this report, we propose a non-linear reading. We suggest that the reader, after reading a section of the Introduction, move to the corresponding chapter. In this way the reader will have the relevant information at hand and will be better able to follow the exposition in each chapter. If, on the other hand, the document is read sequentially, we apologize in advance for some repetitions and reiterations, which nevertheless seem to us to contribute positively to the understanding of this work.
Departamento de Estadística e Investigación Operativa
Doctorado en Matemática