
    A Note on Optimizing Distributions using Kernel Mean Embeddings

    Kernel mean embeddings are a popular tool for representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. When the kernel is characteristic, mean embeddings can be used to define a distance between probability measures, known as the maximum mean discrepancy (MMD). A well-known advantage of mean embeddings and MMD is their low computational cost and low sample complexity. However, kernel mean embeddings have had limited application to problems that involve optimizing over distributions, due to the difficulty of characterizing which Hilbert space vectors correspond to probability distributions. In this note, we propose to leverage the kernel sums-of-squares parameterization of positive functions of Marteau-Ferey et al. [2020] to fit distributions in the MMD geometry. First, we show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense. Then, we provide algorithms to optimize such distributions in the finite-sample setting, which we illustrate in a density-fitting numerical experiment.
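    For reference, the MMD mentioned above has a standard closed-form expression in terms of the kernel (a textbook identity, reproduced here for context rather than taken from the note itself): with mean embeddings \mu_P = \mathbb{E}_{x \sim P}[k(x, \cdot)],

        \mathrm{MMD}^2(P, Q) = \|\mu_P - \mu_Q\|_{\mathcal{H}}^2
            = \mathbb{E}_{x, x' \sim P}[k(x, x')] - 2\,\mathbb{E}_{x \sim P,\, y \sim Q}[k(x, y)] + \mathbb{E}_{y, y' \sim Q}[k(y, y')].

    In the finite-sample setting each expectation is replaced by an empirical average over samples.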

    Missing Data Imputation using Optimal Transport

    Missing data is a crucial issue when applying machine learning algorithms to real-world datasets. Starting from the simple assumption that two batches extracted randomly from the same dataset should share the same distribution, we leverage optimal transport distances to quantify that criterion and turn it into a loss function to impute missing data values. We propose practical methods to minimize these losses using end-to-end learning, which may or may not rely on parametric assumptions on the underlying distributions of values. We evaluate our methods on datasets from the UCI repository, in MCAR, MAR and MNAR settings. These experiments show that OT-based methods match or outperform state-of-the-art imputation methods, even for high percentages of missing values.
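    A minimal sketch of the batch-matching idea described above, assuming a PyTorch implementation with a hand-rolled log-domain Sinkhorn loss (function and variable names are illustrative, not the authors' code; the paper uses debiased Sinkhorn divergences, simplified here to a plain entropic cost for brevity):

        import math
        import torch

        def sinkhorn_loss(x, y, eps=0.1, n_iter=100):
            # Entropic OT cost between two uniformly weighted point clouds x (n x d) and y (m x d).
            cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost matrix
            n, m = cost.shape
            log_a = torch.full((n,), -math.log(n))                  # log of uniform weights
            log_b = torch.full((m,), -math.log(m))
            f, g = torch.zeros(n), torch.zeros(m)
            for _ in range(n_iter):                                  # log-domain Sinkhorn iterations
                f = -eps * torch.logsumexp((g[None, :] - cost) / eps + log_b[None, :], dim=1)
                g = -eps * torch.logsumexp((f[:, None] - cost) / eps + log_a[:, None], dim=0)
            plan = torch.exp((f[:, None] + g[None, :] - cost) / eps + log_a[:, None] + log_b[None, :])
            return (plan * cost).sum()                               # transport cost under the entropic plan

        def ot_impute(X, n_epochs=300, batch_size=64, lr=1e-2, eps=0.1):
            # X: (N, d) array with NaN at missing entries; requires N >= 2 * batch_size.
            X = torch.as_tensor(X, dtype=torch.float32)
            mask = torch.isnan(X)
            init = torch.nanmean(X, dim=0).repeat(X.shape[0], 1)     # column-mean initialization
            fills = init[mask].clone().requires_grad_(True)          # learnable imputed values
            opt = torch.optim.Adam([fills], lr=lr)
            for _ in range(n_epochs):
                X_hat = X.clone()
                X_hat[mask] = fills                                  # plug current imputations into the data
                idx = torch.randperm(X.shape[0])
                b1, b2 = idx[:batch_size], idx[batch_size:2 * batch_size]
                loss = sinkhorn_loss(X_hat[b1], X_hat[b2], eps=eps)  # two random batches should match
                opt.zero_grad()
                loss.backward()
                opt.step()
            X_hat = X.clone()
            X_hat[mask] = fills.detach()
            return X_hat.numpy()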

    Entropic Optimal Transport between Unbalanced Gaussian Measures has a Closed Form

    Although optimal transport (OT) problems admit closed-form solutions in very few notable cases, e.g. in 1D or between Gaussians, these closed forms have proved extremely fruitful for practitioners seeking to define tools inspired by the OT geometry. On the other hand, the numerical resolution of OT problems using entropic regularization has given rise to many applications, but because there are no known closed-form solutions for entropy-regularized OT problems, these approaches are mostly algorithmic, not informed by elegant closed forms. In this paper, we propose to fill the void at the intersection between these two schools of thought in OT by proving that the entropy-regularized optimal transport problem between two Gaussian measures admits a closed form. Contrary to the unregularized case, for which the explicit form is given by the Wasserstein-Bures distance, the closed form we obtain is differentiable everywhere, even for Gaussians with degenerate covariance matrices. We obtain this closed-form solution by solving the fixed-point equation behind Sinkhorn's algorithm, the default method for computing entropic regularized OT. Remarkably, this approach extends to the generalized unbalanced case, where Gaussian measures are scaled by positive constants. This extension leads to a closed-form expression for unbalanced Gaussians as well, and highlights the mass transportation/destruction trade-off seen in unbalanced optimal transport. Moreover, in both settings, we show that the optimal transportation plans are (scaled) Gaussians and provide analytical formulas for their parameters. These formulas constitute the first non-trivial closed forms for entropy-regularized optimal transport, thus providing a ground truth for the analysis of entropic OT and Sinkhorn's algorithm.
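    For context, the unregularized closed form referred to above is the standard Wasserstein-Bures identity (a well-known formula reproduced here for reference; the entropic counterpart derived in the paper is a smoothed, everywhere-differentiable analogue whose exact expression is given in the paper):

        W_2^2\big(\mathcal{N}(m_a, \Sigma_a), \mathcal{N}(m_b, \Sigma_b)\big)
            = \|m_a - m_b\|^2
            + \operatorname{Tr}\!\Big(\Sigma_a + \Sigma_b - 2\big(\Sigma_a^{1/2} \Sigma_b\, \Sigma_a^{1/2}\big)^{1/2}\Big).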

    Régularisation, projections et distributions elliptiques pour le transport optimal (Regularization, projections and elliptical distributions for optimal transport)

    Comparing and matching probability distributions is a crucial task in numerous machine learning (ML) algorithms. Optimal transport (OT) defines divergences between distributions that are grounded in geometry: starting from a cost function on the underlying space, OT consists in finding a mapping or coupling between both measures that is optimal with respect to that cost. The fact that OT is deeply grounded in geometry makes it particularly well suited to ML. Further, OT is the object of a rich mathematical theory. Despite these advantages, the applications of OT in data science have long been hindered by the mathematical and computational complexity of the underlying optimization problem. To circumvent these issues, one approach consists in focusing on particular cases that admit closed-form solutions or that can be solved efficiently. In particular, OT between elliptical distributions is one of the very few instances for which OT is available in closed form, defining the so-called Bures-Wasserstein (BW) geometry. This thesis builds extensively on the BW geometry, with the aim of using it as a basic tool in data science applications. To do so, we consider settings in which it is alternatively employed as a basic tool for representation learning, enhanced using subspace projections, and smoothed further using entropic regularization. In a first contribution, the BW geometry is used to define embeddings as elliptical probability distributions, extending the classical representation of data as vectors in R^d. In a second contribution, we prove the existence of transportation maps and plans that extrapolate maps restricted to lower-dimensional projections, and show that subspace-optimal plans admit closed forms in the case of Gaussian measures. Our third contribution consists in deriving closed forms for entropic OT between Gaussian measures scaled with a varying total mass, which constitute the first non-trivial closed forms for entropic OT and provide the first continuous test case for the study of entropic OT. Finally, in a last contribution, entropic OT is leveraged to tackle missing data imputation in a non-parametric and distribution-preserving way.
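    As a concrete illustration of the Bures-Wasserstein geometry the thesis builds on, here is a minimal NumPy/SciPy sketch of the closed-form 2-Wasserstein distance between two Gaussians (illustrative code based on the standard formula, not taken from the thesis):

        import numpy as np
        from scipy.linalg import sqrtm

        def bures_wasserstein(mean_a, cov_a, mean_b, cov_b):
            # Closed-form W2 distance between N(mean_a, cov_a) and N(mean_b, cov_b).
            root_a = sqrtm(cov_a)                                     # Sigma_a^{1/2}
            cross = np.real(sqrtm(root_a @ cov_b @ root_a))           # (Sigma_a^{1/2} Sigma_b Sigma_a^{1/2})^{1/2}
            bures_sq = np.trace(cov_a + cov_b - 2 * cross)            # squared Bures distance between covariances
            return np.sqrt(np.sum((mean_a - mean_b) ** 2) + bures_sq)

        # Example with two 2-D Gaussians
        A = np.array([[2.0, 0.5], [0.5, 1.0]])
        B = np.array([[1.0, -0.3], [-0.3, 1.5]])
        print(bures_wasserstein(np.zeros(2), A, np.ones(2), B))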

    Learning PSD-valued functions using kernel sums-of-squares

    Shape constraints such as positive semi-definiteness (PSD) for matrices or convexity for functions play a central role in many applications in machine learning and sciences, including metric learning, optimal transport, and economics. Yet, very few function models exist that enforce PSD-ness or convexity with good empirical performance and theoretical guarantees. In this paper, we introduce a kernel sum-of-squares model for functions that take values in the PSD cone, which extends kernel sums-of-squares models that were recently proposed to encode non-negative scalar functions. We provide a representer theorem for this class of PSD functions, show that it constitutes a universal approximator of PSD functions, and derive eigenvalue bounds in the case of subsampled equality constraints. We then apply our results to modeling convex functions, by enforcing a kernel sum-of-squares representation of their Hessian, and show that any smooth and strongly convex function may thus be represented. Finally, we illustrate our methods on a PSD matrix-valued regression task and on scalar-valued convex regression.
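    For concreteness, the scalar kernel sum-of-squares model that this paper extends represents a non-negative function through a PSD operator acting on kernel features, f(x) = \langle \varphi(x), A\, \varphi(x) \rangle with A \succeq 0; by the representer theorem, its finite-sample form is (a standard formulation, not copied from the paper):

        f(x) = \Phi_n(x)^\top B\, \Phi_n(x) = \sum_{i,j=1}^{n} B_{ij}\, k(x, x_i)\, k(x, x_j), \qquad B \succeq 0,

    where \Phi_n(x) = (k(x, x_1), \dots, k(x, x_n))^\top. The paper's extension produces outputs constrained to the PSD cone rather than non-negative scalars; see the paper for the precise parameterization.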

    Dimension-free convergence rates for gradient Langevin dynamics in RKHS

    Gradient Langevin dynamics (GLD) and stochastic GLD (SGLD) have attracted considerable attention lately as a way to provide convergence guarantees in a non-convex setting. However, the known rates grow exponentially with the dimension of the space under the dissipative condition. In this work, we provide a convergence analysis of GLD and SGLD when the optimization space is an infinite-dimensional Hilbert space. More precisely, we derive non-asymptotic, dimension-free convergence rates for GLD/SGLD when performing regularized non-convex optimization in a reproducing kernel Hilbert space. Among other ingredients, the convergence analysis relies on the properties of a stochastic differential equation, its discrete-time Galerkin approximation, and the geometric ergodicity of the associated Markov chains.
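    As background for the dynamics analyzed above, the standard finite-dimensional gradient Langevin iterate is (a textbook formulation; the paper studies its Hilbert-space and Galerkin-discretized counterparts):

        \theta_{k+1} = \theta_k - \eta\, \nabla F(\theta_k) + \sqrt{2\eta/\beta}\; \xi_k, \qquad \xi_k \sim \mathcal{N}(0, I),

    where \eta is the step size and \beta the inverse temperature; SGLD replaces \nabla F(\theta_k) with an unbiased stochastic estimate.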