
    Beyond L1: Faster and Better Sparse Models with skglm

    Full text link
    We propose a new fast algorithm to estimate any sparse generalized linear model with convex or non-convex separable penalties. Our algorithm solves problems with millions of samples and features in seconds by relying on coordinate descent, working sets, and Anderson acceleration. It handles previously unaddressed models and is shown in extensive experiments to improve on state-of-the-art algorithms. We provide a flexible, scikit-learn compatible package which easily handles customized datafits and penalties.
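
    As a hedged illustration of the package interface, the sketch below fits a sparse model with a non-convex penalty through skglm's estimator API; the class names (GeneralizedLinearEstimator, Quadratic, MCPenalty) follow the skglm documentation at the time of writing, and the data is synthetic.

        import numpy as np
        from skglm import GeneralizedLinearEstimator
        from skglm.datafits import Quadratic
        from skglm.penalties import MCPenalty

        # Synthetic sparse regression problem.
        rng = np.random.default_rng(0)
        X = rng.standard_normal((100, 1000))
        w_true = np.zeros(1000)
        w_true[:10] = 1.0
        y = X @ w_true + 0.1 * rng.standard_normal(100)

        # Combine a quadratic datafit with a non-convex MCP penalty;
        # the solver relies on coordinate descent and working sets internally.
        model = GeneralizedLinearEstimator(
            datafit=Quadratic(),
            penalty=MCPenalty(alpha=0.1, gamma=3.0),
        )
        model.fit(X, y)
        print("nonzero coefficients:", np.sum(model.coef_ != 0))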

    FedECA: A Federated External Control Arm Method for Causal Inference with Time-To-Event Data in Distributed Settings

    Full text link
    External control arms (ECA) can inform the early clinical development of experimental drugs and provide efficacy evidence for regulatory approval in non-randomized settings. However, the main challenge of implementing ECA lies in accessing real-world data or historical clinical trials. Indeed, data sharing is often not feasible due to privacy considerations related to data leaving the original collection centers, along with pharmaceutical companies' competitive motives. In this paper, we leverage a privacy-enhancing technology called federated learning (FL) to remove some of the barriers to data sharing. We introduce a federated learning inverse probability of treatment weighted (IPTW) method for time-to-event outcomes, called FedECA, which eases the implementation of ECA by limiting patients' data exposure. We show with extensive experiments that FedECA outperforms its closest competitor, matching-adjusted indirect comparison (MAIC), in terms of statistical power and ability to balance the treatment and control groups. To encourage the use of such methods, we publicly release our code, which relies on Substra, an open-source FL software with proven experience in privacy-sensitive contexts.
    Comment: code available at https://github.com/owkin/fedeca
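
    FedECA itself runs IPTW across centers via Substra; the sketch below is only a centralized, non-federated illustration of the underlying IPTW idea for time-to-event data, with hypothetical column names, using scikit-learn for the propensity model and lifelines for the weighted Cox fit.

        from sklearn.linear_model import LogisticRegression
        from lifelines import CoxPHFitter

        def iptw_hazard_ratio(df, covariates):
            """Centralized IPTW sketch: df has hypothetical 'treated', 'time'
            and 'event' columns, plus baseline covariate columns."""
            # 1. Propensity scores: probability of treatment given covariates.
            ps_model = LogisticRegression().fit(df[covariates], df["treated"])
            ps = ps_model.predict_proba(df[covariates])[:, 1]
            # 2. Inverse probability of treatment weights.
            df = df.assign(w=df["treated"] / ps + (1 - df["treated"]) / (1 - ps))
            # 3. Weighted Cox model; robust variance accounts for the weighting.
            cph = CoxPHFitter()
            cph.fit(df[["time", "event", "treated", "w"]],
                    duration_col="time", event_col="event",
                    weights_col="w", robust=True)
            return cph.hazard_ratios_["treated"]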

    Benchopt: Reproducible, efficient and collaborative optimization benchmarks

    Full text link
    Numerical validation is at the core of machine learning research, as it makes it possible to assess the actual impact of new methods and to confirm the agreement between theory and practice. Yet the rapid development of the field poses several challenges: researchers are confronted with a profusion of methods to compare, limited transparency and consensus on best practices, as well as tedious re-implementation work. As a result, validation is often very partial, which can lead to wrong conclusions that slow down the progress of research. We propose Benchopt, a collaborative framework to automate, reproduce, and publish optimization benchmarks in machine learning across programming languages and hardware architectures. Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing, and extending experiments. To demonstrate its broad usability, we showcase benchmarks on three standard learning tasks: ℓ2-regularized logistic regression, Lasso, and ResNet18 training for image classification. These benchmarks highlight key practical findings that give a more nuanced view of the state of the art for these problems, showing that for practical evaluation, the devil is in the details. We hope that Benchopt will foster collaborative work in the community and thereby improve the reproducibility of research findings.
    Comment: accepted in the proceedings of NeurIPS 2022; Benchopt library documentation is available at https://benchopt.github.io
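
    To give a concrete sense of how a benchmark is extended, the sketch below follows the solver template from the Benchopt documentation; the exact hook names and the problem quantities passed in (here X, y, lmbd for a Lasso benchmark) are assumptions based on that template, not a verbatim excerpt.

        import numpy as np
        from benchopt import BaseSolver

        class Solver(BaseSolver):
            """Proximal gradient (ISTA) solver for a Lasso benchmark."""
            name = "ista"

            def set_objective(self, X, y, lmbd):
                # Benchopt passes the problem data defined by the objective.
                self.X, self.y, self.lmbd = X, y, lmbd

            def run(self, n_iter):
                X, y, lmbd = self.X, self.y, self.lmbd
                L = np.linalg.norm(X, ord=2) ** 2  # Lipschitz constant of the gradient
                w = np.zeros(X.shape[1])
                for _ in range(n_iter):
                    grad = X.T @ (X @ w - y)
                    w = w - grad / L
                    w = np.sign(w) * np.maximum(np.abs(w) - lmbd / L, 0)  # soft-threshold
                self.w = w

            def get_result(self):
                return dict(beta=self.w)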

    Age at onset as stratifier in idiopathic Parkinson’s disease – effect of ageing and polygenic risk score on clinical phenotypes

    Get PDF
    Several phenotypic differences observed in Parkinson's disease (PD) patients have been linked to age at onset (AAO). We endeavoured to find out whether these differences are due to the ageing process itself by using a combined dataset of idiopathic PD (n = 430) and healthy controls (HC; n = 556), excluding carriers of known PD-linked genetic mutations in both groups. We found several significant effects of AAO on motor and non-motor symptoms in PD, but when comparing the effects of age on these symptoms with HC (using age at assessment, AAA), only the positive associations of AAA with burden of motor symptoms and cognitive impairment were significantly different between PD and HC. Furthermore, we explored a potential effect of polygenic risk score (PRS) on clinical phenotype and identified a significant inverse correlation of AAO and PRS in PD. No significant association between PRS and severity of clinical symptoms was found. We conclude that the observed non-motor phenotypic differences in PD based on AAO are largely driven by the ageing process itself and not by a specific profile of neurodegeneration linked to AAO in idiopathic PD patients.

    Nonsmooth optimization for the estimation of cellular immune components in a tumor environment

    No full text
    In this PhD proposal we will investigate new regularization methods for inverse problems that provide an absolute quantification of immune cell subpopulations. The mathematical aspect of this proposal is two-fold. The first goal is to enhance the underlying linear model through a more refined construction of the expression matrix. The second goal is, given this linear model, to derive the best possible estimator. These two issues can be treated in a decoupled way, which is the standard for existing methods such as Cibersort, or as a coupled optimization problem (known as blind deconvolution in signal processing).
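
    To make the decoupled estimation step concrete, the sketch below solves the standard non-negative least-squares deconvolution given a fixed expression matrix and then renormalizes to proportions; this mirrors the decoupled approach mentioned above and is only an illustration, not the method developed in the thesis.

        import numpy as np
        from scipy.optimize import nnls

        def estimate_proportions(A, b):
            """Given an expression matrix A (genes x cell types) and a bulk
            expression profile b (genes,), estimate cell-type proportions by
            non-negative least squares, then renormalize to the simplex."""
            x, _ = nnls(A, b)  # solve min ||A x - b||_2  s.t.  x >= 0
            s = x.sum()
            return x / s if s > 0 else x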

    Linear support vector regression with linear constraints

    No full text
    This paper studies the addition of linear constraints to Support Vector Regression when the kernel is linear. Adding such constraints to the problem makes it possible to incorporate prior knowledge about the estimator, such as requiring a positive vector, a probability vector, or monotone data. We prove that the resulting optimization problem remains a semi-definite quadratic problem. We also propose a generalization of the Sequential Minimal Optimization algorithm for solving the optimization problem with linear constraints and prove its convergence. We show that an efficient generalization of this iterative algorithm with closed-form updates can be used to obtain the solution of the underlying optimization problem. The practical performance of this estimator is then demonstrated on simulated and real datasets in different settings: non-negative regression, regression onto the simplex for biomedical data, and isotonic regression for weather forecasting. These experiments show the usefulness of this estimator in comparison to more classical approaches.
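
    The paper solves the constrained problem with a generalized SMO algorithm; as a hedged point of comparison, the sketch below states the same kind of primal problem (epsilon-insensitive linear SVR with simplex constraints, as in the regression-onto-the-simplex experiment) and solves it with the generic cvxpy modeling package instead of SMO.

        import cvxpy as cp

        def simplex_svr(X, y, C=1.0, eps=0.1):
            """Linear epsilon-insensitive SVR with w constrained to the
            probability simplex, solved as a generic convex QP."""
            n, p = X.shape
            w = cp.Variable(p)
            b = cp.Variable()
            residual = X @ w + b - y
            # Epsilon-insensitive loss plus ridge term on w.
            loss = C * cp.sum(cp.pos(cp.abs(residual) - eps)) + 0.5 * cp.sum_squares(w)
            constraints = [w >= 0, cp.sum(w) == 1]  # simplex constraints on w
            cp.Problem(cp.Minimize(loss), constraints).solve()
            return w.value, b.value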

    Linear Support Vector Regression with Linear Constraints

    Get PDF
    This paper studies the addition of linear constraints to Support Vector Regression (SVR) when the kernel is linear. Adding such constraints to the problem makes it possible to incorporate prior knowledge about the estimator, such as requiring a probability vector or monotone data. We propose a generalization of the Sequential Minimal Optimization (SMO) algorithm for solving the optimization problem with linear constraints and prove its convergence. The practical performance of this estimator is then demonstrated on simulated and real datasets in different settings: non-negative regression, regression onto the simplex for biomedical data, and isotonic regression for weather forecasting.

    Local linear convergence of proximal coordinate descent algorithm

    No full text
    For composite nonsmooth optimization problems that are "regular enough", proximal gradient descent achieves model identification after a finite number of iterations. For instance, for the Lasso, this implies that the iterates of proximal gradient descent identify the non-zero coefficients after a finite number of steps. The identification property has been shown for various optimization algorithms, such as accelerated gradient descent, Douglas-Rachford, and variance-reduced algorithms; however, results concerning coordinate descent are scarcer. Identification properties often rely on the framework of "partial smoothness", which is a powerful but technical tool. In this work, we show that partially smooth functions have a simple characterization when the nonsmooth penalty is separable. In this simplified framework, we prove that cyclic coordinate descent achieves model identification in finite time, which leads to explicit local linear convergence rates for coordinate descent. Extensive experiments on various estimators and on real datasets demonstrate that these rates match empirical results well.
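
    To illustrate the identification phenomenon on the Lasso example, the sketch below runs cyclic proximal coordinate descent with soft-thresholding and records the last epoch at which the support changed; it is an illustration of the setting studied, not the paper's code.

        import numpy as np

        def lasso_cd(X, y, lmbd, n_epochs=100):
            """Cyclic proximal coordinate descent for the Lasso; returns the
            solution and the last epoch at which the support changed."""
            n, p = X.shape
            w = np.zeros(p)
            lipschitz = (X ** 2).sum(axis=0)  # coordinate-wise Lipschitz constants
            residual = y - X @ w
            support, identified_at = frozenset(), None
            for epoch in range(n_epochs):
                for j in range(p):
                    old = w[j]
                    # Coordinate gradient step followed by soft-thresholding.
                    z = old + X[:, j] @ residual / lipschitz[j]
                    w[j] = np.sign(z) * max(abs(z) - lmbd / lipschitz[j], 0)
                    if w[j] != old:
                        residual += X[:, j] * (old - w[j])
                new_support = frozenset(np.flatnonzero(w))
                if new_support != support:
                    support, identified_at = new_support, epoch
            return w, identified_at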

    Model identification and local linear convergence of coordinate descent

    Get PDF
    For composite nonsmooth optimization problems, the Forward-Backward algorithm achieves model identification (e.g., support identification for the Lasso) after a finite number of iterations, provided the objective function is regular enough. Results concerning coordinate descent are scarcer, and model identification has only been shown for specific estimators, such as the support vector machine. In this work, we show that cyclic coordinate descent achieves model identification in finite time for a wide class of functions. In addition, we prove explicit local linear convergence rates for coordinate descent. Extensive experiments on various estimators and on real datasets demonstrate that these rates match empirical results well.

    Beyond L1: Faster and Better Sparse Models with skglm

    No full text
    We propose a new fast algorithm to estimate any sparse generalized linear model with convex or non-convex separable penalties. Our algorithm solves problems with millions of samples and features in seconds by relying on coordinate descent, working sets, and Anderson acceleration. It handles previously unaddressed models and is shown in extensive experiments to improve on state-of-the-art algorithms. We release skglm, a flexible, scikit-learn compatible package which easily handles customized datafits and penalties.