Parametric information geometry with the package Geomstats
We introduce the information geometry module of the Python package Geomstats.
The module first implements Fisher-Rao Riemannian manifolds of widely used
parametric families of probability distributions, such as the normal, gamma,
beta, and Dirichlet distributions. The module further provides the Fisher-Rao
Riemannian geometry of any parametric family of distributions of interest,
given a parameterized probability density function as input. The implemented
Riemannian geometry tools allow users to compare, average, and interpolate
between distributions within a given family. Importantly, such capabilities
open the door to statistics and machine learning on probability distributions.
We present the object-oriented implementation of the module along with
illustrative examples, and show how it can be used to perform learning on
manifolds of parametric probability distributions.
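As an illustration of the kind of computation such a module enables, here is a minimal sketch (plain Python with a function name of our own choosing, not the Geomstats API) of the closed-form Fisher-Rao distance between two univariate normal distributions, which follows from the isometry between the (mu, sigma) half-plane and hyperbolic geometry:

```python
import math

def fisher_rao_normal(mu1, sigma1, mu2, sigma2):
    """Closed-form Fisher-Rao distance between N(mu1, sigma1^2) and
    N(mu2, sigma2^2), via the hyperbolic half-plane isometry."""
    num = (mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2
    return math.sqrt(2.0) * math.acosh(1.0 + num / (4.0 * sigma1 * sigma2))

# Two Gaussians with equal means that differ only in scale:
d = fisher_rao_normal(0.0, 1.0, 0.0, math.e)
# for equal means the distance reduces to sqrt(2) * |log(sigma1 / sigma2)|
```

This closed form makes the geometry concrete: distances between distributions depend on the ratio of scales, not their difference, which is exactly the kind of behavior the module's geodesics and means inherit.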
Robust Geometric Metric Learning
This paper proposes new algorithms for the metric learning problem. We start
by noticing that several classical metric learning formulations from the
literature can be viewed as modified covariance matrix estimation problems.
Leveraging this point of view, a general approach, called Robust Geometric
Metric Learning (RGML), is then studied. This method aims at simultaneously
estimating the covariance matrix of each class while shrinking them towards
their (unknown) barycenter. We focus on two specific cost functions: one
associated with the Gaussian likelihood (RGML Gaussian), and one with Tyler's
M-estimator (RGML Tyler). In both, the barycenter is defined with the
Riemannian distance, which enjoys nice properties of geodesic convexity and
affine invariance. The optimization is performed using the Riemannian geometry
of symmetric positive definite matrices and its submanifold of unit-determinant
matrices. Finally, the performance of RGML is assessed on real datasets: the
method exhibits strong performance while remaining robust to mislabeled data.
Comment: Published in EUSIPCO 2022. Best student paper award.
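The affine-invariant Riemannian distance on symmetric positive definite matrices that underlies such barycenters can be sketched as follows (a plain NumPy illustration with our own naming, not the paper's RGML implementation):

```python
import numpy as np

def airm_distance(A, B):
    """Affine-invariant Riemannian distance between SPD matrices:
    ||log(A^{-1/2} B A^{-1/2})||_F, computed via eigendecompositions."""
    w, V = np.linalg.eigh(A)
    A_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    M = A_inv_sqrt @ B @ A_inv_sqrt  # symmetric, positive definite
    return np.sqrt(np.sum(np.log(np.linalg.eigvalsh(M)) ** 2))

# Scaling a 2x2 identity by e shifts both log-eigenvalues by 1,
# so the distance is sqrt(2)
d = airm_distance(np.eye(2), np.e * np.eye(2))
```

Affine invariance means d(A, B) = d(C A C^T, C B C^T) for any invertible C, which is why class covariances can be shrunk toward a barycenter defined with this distance without the result depending on the coordinate system.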
Riemannian optimization for non-centered mixture of scaled Gaussian distributions
This paper studies the statistical model of the non-centered mixture of
scaled Gaussian distributions (NC-MSG). Using the Fisher-Rao information
geometry associated with this distribution, we derive a Riemannian gradient
descent algorithm. This algorithm is leveraged for two minimization problems.
The first is the minimization of a regularized negative log-likelihood (NLL),
whose regularization trades off between a white Gaussian distribution and
the NC-MSG. Conditions on the regularization are given so that the existence of
a minimum of this problem is guaranteed without assumptions on the samples.
Then, the Kullback-Leibler (KL) divergence between two NC-MSGs is derived. This
divergence enables us to define a minimization problem to compute centers of
mass of several NC-MSGs. The proposed Riemannian gradient descent algorithm is
leveraged to solve this second minimization problem. Numerical experiments show
the good performance and speed of the Riemannian gradient descent on the two
problems. Finally, a nearest centroid classifier is implemented leveraging the
KL divergence and its associated center of mass. Applied to the large-scale
dataset Breizhcrops, this classifier shows good accuracy as well as robustness
to rigid transformations of the test set.
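As a simpler analogue of the divergence-based tools used here (this is the standard zero-mean Gaussian KL divergence, not the NC-MSG divergence derived in the paper), the closed-form KL between two centered Gaussians can be computed as:

```python
import math
import numpy as np

def kl_gaussian_zero_mean(S1, S2):
    """KL(N(0, S1) || N(0, S2)) for SPD covariance matrices.
    Note the KL divergence is asymmetric in its arguments."""
    d = S1.shape[0]
    M = np.linalg.solve(S2, S1)            # S2^{-1} S1
    _, logdet = np.linalg.slogdet(M)       # log det(S2^{-1} S1)
    return 0.5 * (np.trace(M) - d - logdet)

kl = kl_gaussian_zero_mean(2.0 * np.eye(2), np.eye(2))
# equals 1 - log(2) for these inputs
```

A nearest centroid classifier of the kind described then assigns each sample's estimated covariance to the class whose divergence-based center of mass is closest.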
Riemannian geometry for statistical estimation and learning : application to remote sensing
Remote sensing systems offer an increased opportunity to record multi-temporal and multidimensional images of the earth's surface. This opportunity greatly increases the interest in data processing tools based on multivariate image time series. In this thesis, we propose a clustering-classification pipeline to segment these data. To do so, robust statistics are estimated and then clustered or classified to obtain a segmentation of the original multivariate image time series. A large part of the thesis is devoted to the theory of Riemannian geometry and its subfield, information geometry, which studies Riemannian manifolds whose points are probability distributions. It makes it possible to estimate robust statistics very quickly, even on large-scale problems, and also to compute Riemannian centers of mass. Indeed, divergences are developed to measure the proximities between the estimated statistics. Then, groups of statistics are averaged by computing their Riemannian centers of mass associated with these divergences. Thus, we adapt classical machine learning algorithms such as K-means++ or the nearest centroid classifier to Riemannian manifolds. These algorithms have been implemented for many different combinations of statistics, divergences, and Riemannian centers of mass, and tested on real datasets such as the Indian Pines image and the large crop type mapping dataset Breizhcrops.
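One concrete choice of Riemannian center of mass for a group of covariance matrices is the log-Euclidean mean; the following is a minimal NumPy sketch of our own (an illustration of the idea, not the thesis code):

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.log(w)) @ V.T

def spd_exp(S):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.exp(w)) @ V.T

def log_euclidean_mean(mats):
    """Log-Euclidean center of mass: exp of the arithmetic mean of the logs."""
    return spd_exp(np.mean([spd_log(S) for S in mats], axis=0))

# The mean of I and e^2 * I is e * I: averaging happens in log coordinates
center = log_euclidean_mean([np.eye(2), np.e**2 * np.eye(2)])
```

Averaging in log coordinates keeps the result positive definite, which the plain arithmetic mean of inverses or square roots would not guarantee for all divergence choices; K-means++-style algorithms on the manifold then alternate between such mean computations and divergence-based assignments.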
Weakly supervised covariance matrices alignment through Stiefel matrices estimation for MEG applications
This paper introduces a novel domain adaptation technique for time series data, called Mixing model Stiefel Adaptation (MSA), specifically addressing the challenge of limited labeled signals in the target dataset. Leveraging a domain-dependent mixing model and the optimal transport domain adaptation assumption, we exploit abundant unlabeled data in the target domain to ensure effective prediction by establishing pairwise correspondence with equivalent signal variances between domains. Theoretical foundations are laid for identifying crucial Stiefel matrices, essential for recovering underlying signal variances from a Riemannian representation of observed signal covariances. We propose an integrated cost function that simultaneously learns these matrices, pairwise domain relationships, and a predictor, classifier, or regressor, depending on the task. Applied to neuroscience problems, MSA outperforms recent methods in brain-age regression with task variations using magnetoencephalography (MEG) signals from the Cam-CAN dataset.
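A basic building block for estimating Stiefel matrices is the projection onto the manifold of matrices with orthonormal columns, obtained from the polar factor of a thin SVD. A minimal NumPy sketch, with names of our own choosing (not the MSA implementation), is:

```python
import numpy as np

def stiefel_project(M):
    """Project a tall matrix onto the Stiefel manifold (orthonormal columns)
    by keeping the polar factor U @ Vt of a thin SVD."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 3))
Q = stiefel_project(M)
# Q.T @ Q is the 3x3 identity up to floating point error
```

Gradient-based methods on the Stiefel manifold typically take a Euclidean gradient step and then apply such a projection (a retraction) to stay on the manifold.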
Entropic Wasserstein component analysis
Dimension reduction (DR) methods provide systematic approaches for analyzing high-dimensional data. A key requirement for DR is to incorporate global dependencies among original and embedded samples while preserving clusters in the embedding space. To achieve this, we combine the principles of optimal transport (OT) and principal component analysis (PCA). Our method seeks the best linear subspace that minimizes reconstruction error using entropic OT, which naturally encodes the neighborhood information of the samples. From an algorithmic standpoint, we propose an efficient block-majorization-minimization solver over the Stiefel manifold. Our experimental results demonstrate that our approach can effectively preserve high-dimensional clusters, leading to more interpretable and effective embeddings. Python code of the algorithms and experiments is available online.