
8 research outputs found

Recent Advances in Optimal Transport for Machine Learning

Author: Mboula Fred Ngolè
Montesuma Eduardo Fernandes
Souloumiac Antoine
Publication date: 28/06/2023

Recently, Optimal Transport has been proposed as a probabilistic framework in Machine Learning for comparing and manipulating probability distributions. This is rooted in its rich history and theory, and has offered new solutions to different problems in machine learning, such as generative modeling and transfer learning. In this survey we explore contributions of Optimal Transport for Machine Learning over the period 2012–2022, focusing on four sub-fields of Machine Learning: supervised, unsupervised, transfer and reinforcement learning. We further highlight recent developments in computational Optimal Transport and their interplay with Machine Learning practice.
Comment: 20 pages, 5 figures, under review

arXiv.org e-Print Archive

Multi-Source Domain Adaptation through Dataset Dictionary Learning in Wasserstein Space

Author: Mboula Fred Ngolè
Montesuma Eduardo Fernandes
Souloumiac Antoine
Publication date: 22/08/2023

This paper seeks to solve Multi-Source Domain Adaptation (MSDA), which aims to mitigate data distribution shifts when transferring knowledge from multiple labeled source domains to an unlabeled target domain. We propose a novel MSDA framework based on dictionary learning and optimal transport. We interpret each domain in MSDA as an empirical distribution. As such, we express each domain as a Wasserstein barycenter of dictionary atoms, which are empirical distributions. We propose a novel algorithm, DaDiL, for learning via mini-batches: (i) atom distributions; (ii) a matrix of barycentric coordinates. Based on our dictionary, we propose two novel methods for MSDA: DaDiL-R, based on the reconstruction of labeled samples in the target domain, and DaDiL-E, based on the ensembling of classifiers learned on atom distributions. We evaluate our methods on 3 benchmarks: Caltech-Office, Office 31, and CWRU, where we improved previous state-of-the-art by 3.15%, 2.29%, and 7.71% in classification performance. Finally, we show that interpolations in the Wasserstein hull of learned atoms provide data that can generalize to the target domain.
Comment: 13 pages, 8 figures, accepted as a conference paper at the 26th European Conference on Artificial Intelligence

arXiv.org e-Print Archive

HAL-CEA
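
The abstract above expresses each domain as a Wasserstein barycenter of dictionary atoms, weighted by barycentric coordinates. The following is a minimal sketch of that single idea, not the DaDiL algorithm itself: it assumes one-dimensional, unlabeled atoms with the same number of uniformly weighted samples, in which case the 2-Wasserstein barycenter is obtained by averaging the sorted samples. The function name and the toy atoms are hypothetical.

```python
def wasserstein_barycenter_1d(atoms, weights):
    """W2 barycenter of 1-D empirical distributions with equal sample counts.

    Assumption: every atom has n uniformly weighted samples. In 1-D, the i-th
    smallest barycenter point is then the weighted mean of the atoms' i-th
    smallest points (an average of quantile functions).
    """
    if not atoms or len(atoms) != len(weights):
        raise ValueError("need one weight per atom")
    n = len(atoms[0])
    if any(len(atom) != n for atom in atoms):
        raise ValueError("all atoms must have the same number of samples")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    sorted_atoms = [sorted(atom) for atom in atoms]
    return [
        sum(w * atom[i] for w, atom in zip(weights, sorted_atoms))
        for i in range(n)
    ]


# Two hypothetical atoms and one row of barycentric coordinates.
atoms = [[2.0, 0.0, 1.0], [10.0, 12.0, 11.0]]
domain = wasserstein_barycenter_1d(atoms, [0.5, 0.5])
print(domain)
```

Varying the weights over the simplex traces out interpolations between the atoms, which is the one-dimensional analogue of the Wasserstein hull mentioned in the abstract. In higher dimensions the barycenter has no such closed form and is typically computed iteratively.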

Multi-Source Domain Adaptation meets Dataset Distillation through Dataset Dictionary Learning

Author: Mboula Fred Ngolè
Montesuma Eduardo Fernandes
Souloumiac Antoine
Publication date: 14/09/2023

In this paper, we consider the intersection of two problems in machine learning: Multi-Source Domain Adaptation (MSDA) and Dataset Distillation (DD). On the one hand, the first considers adapting multiple heterogeneous labeled source domains to an unlabeled target domain. On the other hand, the second attacks the problem of synthesizing a small summary containing all the information about the datasets. We thus consider a new problem called MSDA-DD. To solve it, we adapt previous works in the MSDA literature, such as Wasserstein Barycenter Transport and Dataset Dictionary Learning, as well as the DD method Distribution Matching. We thoroughly experiment with this novel problem on four benchmarks (Caltech-Office 10, Tennessee-Eastman Process, Continuous Stirred Tank Reactor, and Case Western Reserve University), where we show that, even with as little as 1 sample per class, one achieves state-of-the-art adaptation performance.
Comment: 7 pages, 4 figures

arXiv.org e-Print Archive

Multi-Source Domain Adaptation for Cross-Domain Fault Diagnosis of Chemical Processes

Author: Corona Francesco
Mboula Fred Ngolè
Montesuma Eduardo Fernandes
Mulas Michela
Souloumiac Antoine
Publication date: 22/08/2023

Fault diagnosis is an essential component in process supervision. Indeed, it determines which kind of fault has occurred, given that it has been previously detected, allowing for appropriate intervention. Automatic fault diagnosis systems use machine learning for predicting the fault type from sensor readings. Nonetheless, these models are sensitive to changes in the data distributions, which may be caused by changes in the monitored process, such as changes in the mode of operation. This scenario is known as Cross-Domain Fault Diagnosis (CDFD). We provide an extensive comparison of single and multi-source unsupervised domain adaptation (SSDA and MSDA respectively) algorithms for CDFD. We study these methods in the context of the Tennessee-Eastman Process, a widely used benchmark in the chemical industry. We show that using multiple domains during training has a positive effect, even when no adaptation is employed. As such, the MSDA baseline improves over the SSDA baseline classification accuracy by 23% on average. In addition, under the multiple-sources scenario, we improve classification accuracy of the no adaptation setting by 8.4% on average.
Comment: 18 pages, 15 figures

arXiv.org e-Print Archive

Federated Dataset Dictionary Learning for Multi-Source Domain Adaptation

Author: Castellon Fabiola Espinosa
Gouy-Pallier Cédric
Mayoue Aurélien
Mboula Fred Ngolè
Montesuma Eduardo Fernandes
Souloumiac Antoine
Publication date: 14/09/2023

In this article, we propose an approach for federated domain adaptation, a setting where distributional shift exists among clients and some have unlabeled data. The proposed framework, FedDaDiL, tackles the resulting challenge through dictionary learning of empirical distributions. In our setting, clients' distributions represent particular domains, and FedDaDiL collectively trains a federated dictionary of empirical distributions. In particular, we build upon the Dataset Dictionary Learning framework by designing collaborative communication protocols and aggregation operations. The chosen protocols keep clients' data private, thus enhancing overall privacy compared to its centralized counterpart. We empirically demonstrate that our approach successfully generates labeled data on the target domain with extensive experiments on (i) Caltech-Office, (ii) TEP, and (iii) CWRU benchmarks. Furthermore, we compare our method to its centralized counterpart and other benchmarks in federated domain adaptation.
Comment: 7 pages, 2 figures

arXiv.org e-Print Archive

OPENDENOISING: AN EXTENSIBLE BENCHMARK FOR BUILDING COMPARATIVE STUDIES OF IMAGE DENOISERS

Author: Fernandes Montesuma Eduardo
Lemarchand Florian
Nogues Erwan
Pelcat Maxime
Publication venue: HAL CCSD
Publication date: 18/10/2019


HAL-CentraleSupelec

arXiv.org e-Print Archive

Crossref

HAL-Rennes 1

Multi-source domain adaptation through dataset dictionary learning in Wasserstein space

Author: Fernandes Montesuma Eduardo
Ngole Mboula Fred Maurice
Souloumiac Antoine
Publication venue: HAL CCSD
Publication date: 30/09/2023

This paper seeks to solve Multi-Source Domain Adaptation (MSDA), which aims to mitigate data distribution shifts when transferring knowledge from multiple labeled source domains to an unlabeled target domain. We propose a novel MSDA framework based on dictionary learning and optimal transport. We interpret each domain in MSDA as an empirical distribution. As such, we express each domain as a Wasserstein barycenter of dictionary atoms, which are empirical distributions. We propose a novel algorithm, DaDiL, for learning via mini-batches: (i) atom distributions; (ii) a matrix of barycentric coordinates. Based on our dictionary, we propose two novel methods for MSDA: DaDiL-R, based on the reconstruction of labeled samples in the target domain, and DaDiL-E, based on the ensembling of classifiers learned on atom distributions. We evaluate our methods on 3 benchmarks: Caltech-Office, Office 31, and CWRU, where we improved previous state-of-the-art by 3.15%, 2.29%, and 7.71% in classification performance. Finally, we show that interpolations in the Wasserstein hull of learned atoms provide data that can generalize to the target domain.

HAL-CEA

Cross-domain fault diagnosis through optimal transport for a CSTR process

Author: Corona Francesco
Mboula Fred Maurice Ngole
Montesuma Eduardo Fernandes
Mulas Michela
Publication venue: Elsevier BV
Publication date: 01/01/2022

Fault diagnosis is a key task for developing safer control systems, especially in chemical plants. Nonetheless, acquiring good labeled fault data involves sampling from dangerous system conditions. A possible workaround to this limitation is to use simulation data for training data-driven fault diagnosis systems. However, due to modelling errors or unknown factors, simulation data may differ in distribution from real-world data. This setting is known as cross-domain fault diagnosis (CDFD). We use optimal transport for: (i) exploring how modelling errors relate to the distance between simulation (source) and real-world (target) data distributions, and (ii) matching source and target distributions through the framework of optimal transport for domain adaptation (OTDA), resulting in new training data that follows the target distribution. Comparisons show that OTDA outperforms other CDFD methods.
Peer reviewed

Aaltodoc Publication Archive

HAL-CEA
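
The last abstract describes matching source and target distributions with optimal transport so that source samples are turned into training data that follows the target distribution. A minimal sketch of one common way to do this is given below, assuming an entropic (Sinkhorn) transport plan followed by a barycentric mapping. It is an illustration under those assumptions, not the paper's implementation; the function names, the regularization value, and the toy samples are hypothetical.

```python
import math


def squared_euclidean(xs, ys):
    """Pairwise squared Euclidean costs between two lists of points."""
    return [[sum((p - q) ** 2 for p, q in zip(x, y)) for y in ys] for x in xs]


def sinkhorn_plan(cost, a, b, reg=1.0, n_iters=200):
    """Entropic-regularized transport plan between histograms a and b."""
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    n, m = len(a), len(b)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternate scalings so rows match a, then columns match b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def barycentric_map(plan, target):
    """Send each source sample to the plan-weighted mean of target samples."""
    dim = len(target[0])
    mapped = []
    for row in plan:
        mass = sum(row)
        mapped.append(
            [sum(p * y[d] for p, y in zip(row, target)) / mass for d in range(dim)]
        )
    return mapped


# Hypothetical simulation (source) and real-world (target) samples.
source = [[0.0], [1.0]]
target = [[10.0], [11.0]]
uniform = [0.5, 0.5]
plan = sinkhorn_plan(squared_euclidean(source, target), uniform, uniform, reg=1.0)
adapted = barycentric_map(plan, target)
print(adapted)
```

The adapted samples keep the order of the source samples while landing inside the range of the target samples. In a labeled setting each adapted sample would keep its source label, so a classifier can then be trained on data located in the target domain. A smaller `reg` gives a sharper plan but can underflow in this naive implementation.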