Entropic Optimal Transport in Machine Learning: applications to distributional regression, barycentric estimation and probability matching
Regularised optimal transport theory has been gaining increasing interest in machine learning as a versatile tool to handle and compare probability measures. Entropy-based regularisations, known as Sinkhorn divergences, have proved successful in a wide range of applications: as a metric for clustering and barycenter estimation, as a tool to transfer information in domain adaptation, and as a fitting loss for generative models, to name a few. Given this success, it is crucial to investigate the statistical and optimization properties of such models. These aspects are instrumental in designing new and principled paradigms that further advance the field. Nonetheless, questions on the asymptotic guarantees of estimators based on Entropic Optimal Transport have received less attention. In this thesis we target such questions, focusing on three major settings where Entropic Optimal Transport has been used: learning histograms in supervised frameworks, barycenter estimation and probability matching. We present the first consistent estimator for learning with the Sinkhorn loss in supervised settings, with explicit excess risk bounds. We propose a novel algorithm for Sinkhorn barycenters that handles arbitrary probability distributions with provable global convergence guarantees. Finally, we address generative models with the Sinkhorn divergence as loss function: we analyse the role of the latent distribution and the generator from a modelling and statistical perspective. We propose a method that learns the latent distribution and the generator jointly, and we characterize the generalization properties of this estimator. Overall, the tools developed in this work contribute to the understanding of the theoretical properties of Entropic Optimal Transport and its versatility in machine learning.
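For concreteness, the fixed-point iteration at the heart of these methods, Sinkhorn's matrix-scaling algorithm for entropy-regularised optimal transport, can be sketched in a few lines of NumPy. This is the textbook algorithm, not any of the estimators developed in the thesis.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=200):
    """Entropic optimal transport between discrete measures a and b with
    cost matrix C, via Sinkhorn's alternating-scaling iterations."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # rescale so column marginals match b
        u = a / (K @ v)                  # rescale so row marginals match a
    P = u[:, None] * K * v[None, :]      # entropic transport plan
    return P, np.sum(P * C)              # plan and its transport cost <P, C>
```

Sinkhorn divergences are then obtained by debiasing this regularised cost with the two self-transport terms.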
Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction
We study the interplay between surrogate methods for structured prediction
and techniques from multitask learning designed to leverage relationships
between surrogate outputs. We propose an efficient algorithm based on trace
norm regularization which, differently from previous methods, does not require
explicit knowledge of the coding/decoding functions of the surrogate framework.
As a result, our algorithm can be applied to the broad class of problems in
which the surrogate space is large or even infinite dimensional. We study
excess risk bounds for trace norm regularized structured prediction, which imply
consistency and learning rates for our estimator. We also identify relevant
regimes in which our approach can enjoy better generalization performance than
previous methods. Numerical experiments on ranking problems indicate that
enforcing low-rank relations among surrogate outputs may indeed provide a
significant advantage in practice.
Comment: 42 pages, 1 table
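As a rough illustration of the optimization machinery involved, trace norm (nuclear norm) regularization is commonly handled by proximal gradient descent, whose proximal step is soft-thresholding of singular values. The sketch below applies this to a generic multitask least-squares problem; the setting and function names are illustrative, and this is not the surrogate-space algorithm of the paper.

```python
import numpy as np

def svt(W, tau):
    """Singular-value soft-thresholding: prox of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def trace_norm_regression(X, Y, lam=0.1, n_iters=300):
    """Proximal gradient descent on ||X W - Y||_F^2 / (2n) + lam * ||W||_*."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2   # 1/L for the smooth term
    W = np.zeros((d, Y.shape[1]))
    for _ in range(n_iters):
        grad = X.T @ (X @ W - Y) / n       # gradient of the smooth term
        W = svt(W - step * grad, step * lam)
    return W
```

Shrinking singular values rather than entries is what encourages low-rank relations among the output columns (tasks).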
Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm
We present a novel algorithm to estimate the barycenter of arbitrary
probability distributions with respect to the Sinkhorn divergence. Based on a
Frank-Wolfe optimization strategy, our approach proceeds by populating the
support of the barycenter incrementally, without requiring any pre-allocation.
We consider discrete as well as continuous distributions, proving convergence
rates of the proposed algorithm in both settings. Key elements of our analysis
are a new result showing that the Sinkhorn divergence on compact domains has
Lipschitz continuous gradient with respect to the Total Variation and a
characterization of the sample complexity of Sinkhorn potentials. Experiments
validate the effectiveness of our method in practice.
Comment: 46 pages, 8 figures
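The key mechanism here is that a Frank-Wolfe step moves toward a single extreme point, so iterates stay sparse and the support can grow one atom at a time. This can be illustrated on the probability simplex with a toy objective; the sketch below is generic Frank-Wolfe, not the Sinkhorn-barycenter algorithm itself.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_iters=5000):
    """Frank-Wolfe over the probability simplex: each step moves toward a
    single vertex (the linear minimization oracle), so iterates remain
    sparse convex combinations of a few vertices."""
    x = x0.copy()
    for t in range(n_iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0        # best vertex for the linearized objective
        gamma = 2.0 / (t + 2.0)      # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * s
    return x
```

In the barycenter setting the "vertices" are Dirac atoms, which is why no pre-allocated support is needed.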
Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance
Applications of optimal transport have recently gained remarkable attention
thanks to the computational advantages of entropic regularization. However, in
most situations the Sinkhorn approximation of the Wasserstein distance is
replaced by a regularized version that is less accurate but easy to
differentiate. In this work we characterize the differential properties of the
original Sinkhorn distance, proving that it enjoys the same smoothness as its
regularized version, and we explicitly provide an efficient algorithm to compute
its gradient. We show that this result benefits both theory and applications:
on one hand, high order smoothness confers statistical guarantees to learning
with Wasserstein approximations. On the other hand, the gradient formula allows
us to efficiently solve learning and optimization problems in practice.
Promising preliminary experiments complement our analysis.
Comment: 26 pages, 4 figures
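Gradient formulas of this kind are built from the dual (Sinkhorn) potentials. Below is a minimal log-domain computation of those potentials; under the standard entropic-OT duality, the potential f is, up to an additive constant, the gradient of the entropic cost with respect to the weights a. This is a generic sketch, not the paper's gradient algorithm for the sharp Sinkhorn distance.

```python
import numpy as np

def _lse(M, axis):
    """Numerically stable log-sum-exp along an axis."""
    m = np.max(M, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(M - m), axis=axis))

def sinkhorn_potentials(a, b, C, eps=0.1, n_iters=300):
    """Dual potentials (f, g) of entropic OT via log-domain Sinkhorn,
    together with the induced transport plan P."""
    f = np.zeros_like(a)
    g = np.zeros_like(b)
    log_a = np.log(a)[:, None]
    log_b = np.log(b)[None, :]
    for _ in range(n_iters):
        g = -eps * _lse((f[:, None] - C) / eps + log_a, axis=0)
        f = -eps * _lse((g[None, :] - C) / eps + log_b, axis=1)
    P = np.exp((f[:, None] + g[None, :] - C) / eps + log_a + log_b)
    return f, g, P
```

The log-domain updates avoid the under/overflow of the plain scaling iterations when eps is small.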
Web 2.0, Language Learning and Intercultural Competence
Whenever a new form of communication appears on the scene, it immediately becomes an object of discussion. This has been the case since the first penny-press edition in 1834; today the discussion concerns the Internet. The resilience with which the mass media have withstood criticism can be understood through functionalist analysis, which views the media as a social system operating within an external system made up of cultural and social conditions. Despite its complexity, any set of repetitive actions contributes to maintaining or weakening the stability of the system. Globalization would not have been possible without the media, and Web 2.0 is of remarkable interest for its role in shaping cultural identity. Past technologies, from electric light to the airplane, took a whole generation to gain ground; the Internet has required far less time. The difficulty of absorbing the new modes of communication offered by the net creates the risk of unexpected contamination: geographical magazines often show pictures of native Amazonians in traditional dress using computers and mobile phones. Educational uses of Web 2.0 and mobile-learning tools have expanded rapidly over the last few years, and a great number of projects have been planned for language teaching. Mobile learning spans many devices: handheld computers, MP3 players, notebooks and mobile phones. In this paper we outline the methodology, including the selection of web tools, task design, implementation and intercultural communication. The study carried out at the University of Florence shows that learners develop their communicative competence while performing entertaining activities that enable them to achieve the desired goals.
Aligning Time Series on Incomparable Spaces
Dynamic time warping (DTW) is a useful method for aligning, comparing and
combining time series, but it requires them to live in comparable spaces. In
this work, we consider a setting in which time series live on different spaces
without a sensible ground metric, causing DTW to become ill-defined. To
alleviate this, we propose Gromov dynamic time warping (GDTW), a distance
between time series on potentially incomparable spaces that avoids the
comparability requirement by instead considering intra-relational geometry. We
demonstrate its effectiveness at aligning, combining and comparing time series
living on incomparable spaces. We further propose a smoothed version of GDTW as
a differentiable loss and assess its properties in a variety of settings,
including barycentric averaging, generative modeling and imitation learning.
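As background for the comparability requirement, classic DTW aligns two sequences by dynamic programming over a ground metric between samples; it is precisely this pairwise metric that becomes unavailable across incomparable spaces. A minimal sketch of plain DTW (not GDTW):

```python
import numpy as np

def dtw(x, y, dist=lambda u, v: abs(u - v)):
    """Classic dynamic-programming DTW; requires a ground metric `dist`
    between samples -- exactly the ingredient that is missing when the
    two series live on incomparable spaces."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(x[i - 1], y[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

GDTW replaces the cross-space cost c with a Gromov-style comparison of intra-sequence distances, so only each series' own geometry is needed.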