7 research outputs found

    Arbres de décision et forêts aléatoires pour variables groupées

    No full text
    In many supervised learning problems, the inputs have a known and/or obvious group structure. In this context, building a prediction rule that takes the group structure into account can be more relevant, both for prediction accuracy and for interpretation, than an approach based only on the individual variables. The goal of this thesis is to develop tree-based methods adapted to grouped variables. We propose two new tree-based approaches that use the group structure to build decision trees. The first approach builds binary decision trees for classification problems: a split of a node is defined by the choice of both a splitting group and a linear combination of the inputs belonging to that group. The second method, which can be used for prediction in both regression and classification, builds a non-binary tree in which each split is itself a binary tree. Both approaches grow a maximal tree which is then pruned; to this end, we propose two pruning strategies, one of which generalizes the minimal cost-complexity pruning algorithm. Since decision trees are known to be unstable, we also introduce a random forest method that handles groups of inputs. Beyond prediction, these new methods can also be used for group variable selection thanks to the introduction of measures of group importance. This thesis is supplemented by an independent part set in the unsupervised framework, in which we introduce a new clustering algorithm. Under classical regularity and sparsity assumptions, we obtain rates of convergence of the clustering risk for the proposed algorithm.
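The group-wise split described above (choose a splitting group, then a linear combination of its variables, then a threshold) can be illustrated with a minimal sketch. This is not the thesis's actual splitting criterion: the direction below is a crude class-mean-difference vector and the impurity is Gini, both chosen for brevity.

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary label vector."""
    if len(y) == 0:
        return 0.0
    p = np.bincount(y, minlength=2) / len(y)
    return 1.0 - np.sum(p ** 2)

def best_group_split(X, y, groups):
    """For each group of columns, project onto a class-mean-difference
    direction and scan thresholds; return (gain, group, direction,
    threshold) with the largest Gini impurity decrease."""
    n = len(y)
    parent = gini(y)
    best = None
    for g, cols in groups.items():
        Xg = X[:, cols]
        # Crude linear direction: difference of the class means.
        w = Xg[y == 1].mean(axis=0) - Xg[y == 0].mean(axis=0)
        z = Xg @ w
        for t in np.unique(z)[:-1]:
            left, right = y[z <= t], y[z > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - child
            if best is None or gain > best[0]:
                best = (gain, g, w, t)
    return best
```

On a toy dataset where the first group separates the classes, the procedure picks that group and achieves the maximal impurity decrease of 0.5.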

    Decision trees and random forests for grouped variables

    No full text
    In many supervised learning problems, the inputs have a known and/or obvious group structure. In this context, building a prediction rule that takes the group structure into account can be more relevant, both for prediction accuracy and for interpretation, than an approach based only on the individual variables. The goal of this thesis is to develop tree-based methods adapted to grouped variables. We propose two new tree-based approaches that use the group structure to build decision trees. The first approach builds binary decision trees for classification problems: a split of a node is defined by the choice of both a splitting group and a linear combination of the inputs belonging to that group. The second method, which can be used for prediction in both regression and classification, builds a non-binary tree in which each split is itself a binary tree. Both approaches grow a maximal tree which is then pruned; to this end, we propose two pruning strategies, one of which generalizes the minimal cost-complexity pruning algorithm. Since decision trees are known to be unstable, we also introduce a random forest method that handles groups of inputs. Beyond prediction, these new methods can also be used for group variable selection thanks to the introduction of measures of group importance. This thesis is supplemented by an independent part set in the unsupervised framework, in which we introduce a new clustering algorithm. Under classical regularity and sparsity assumptions, we obtain rates of convergence of the clustering risk for the proposed algorithm.
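As a companion illustration of the pruning step, here is a minimal sketch of the weakest-link computation at the heart of minimal cost-complexity pruning: the internal node with the smallest g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1) is the first one collapsed as the complexity penalty grows. The nested-dict tree encoding is an assumption for the sketch, not the thesis's data structure.

```python
# A leaf is {"err": r}; an internal node is {"err": r, "left": ..., "right": ...},
# where "err" is the node's resubstitution error if it were collapsed to a leaf.

def subtree_stats(node):
    """Return (total leaf error, number of leaves) of the subtree."""
    if "left" not in node:
        return node["err"], 1
    el, nl = subtree_stats(node["left"])
    er, nr = subtree_stats(node["right"])
    return el + er, nl + nr

def weakest_link(node, path=()):
    """Find the internal node with the smallest
    g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1),
    i.e. the first node collapsed as the penalty alpha increases."""
    if "left" not in node:
        return None
    r_sub, leaves = subtree_stats(node)
    best = ((node["err"] - r_sub) / (leaves - 1), path)
    for side in ("left", "right"):
        cand = weakest_link(node[side], path + (side,))
        if cand is not None and cand[0] < best[0]:
            best = cand
    return best
```

Iterating this step (collapse the weakest link, recompute) yields the nested sequence of pruned subtrees that cost-complexity pruning selects among.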

    Statistical analysis of a hierarchical clustering algorithm with outliers

    No full text
    It is well known that the classical single linkage algorithm usually fails to identify clusters in the presence of outliers. In this paper, we propose a new version of this algorithm and study its mathematical performance. In particular, we establish an oracle-type inequality which ensures that our procedure recovers the clusters with large probability under minimal assumptions on the distribution of the outliers. From this inequality, we deduce the consistency and rates of convergence of our algorithm in various situations. The performance of our approach is also assessed through simulation studies, and a comparison with classical clustering algorithms on simulated data is presented.
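The kind of robustified single linkage discussed here can be sketched generically: discard points whose distance to their k-th nearest neighbour is too large, then take connected components at scale d among the remaining points. This denoising device is a common way to make single linkage outlier-resistant, not necessarily the exact estimator analysed in the paper; the parameters k, r, and d below are illustrative.

```python
import numpy as np

def robust_single_linkage(X, d, k=3, r=None):
    """Single linkage at scale d restricted to 'dense' points: a point
    whose k-th nearest-neighbour distance exceeds r is declared an
    outlier (label -1).  Clusters are the connected components of the
    graph linking dense points at distance <= d."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    if r is None:
        r = d
    # Column 0 of the sorted rows is the point itself (distance 0),
    # so column k is the distance to the k-th nearest neighbour.
    dense = np.sort(D, axis=1)[:, k] <= r
    # Union-find over the dense points.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if dense[i] and dense[j] and D[i, j] <= d:
                parent[find(i)] = find(j)
    labels = np.full(n, -1)
    roots = {}
    for i in range(n):
        if dense[i]:
            labels[i] = roots.setdefault(find(i), len(roots))
    return labels
```

On two tight clusters plus a far-away point, the far point is labelled -1 while the clusters are recovered intact.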

    VS-LTGARCHX: A Flexible Subset Selection Approach for Estimation of log-TGARCHX Models and Its Application to BTC Markets

    No full text
    The log-TGARCHX model is less restrictive than the GARCHX model in terms of the inclusion of exogenous variables and asymmetry lags. However, adding fewer (more) covariates than necessary may lead to underfitting (overfitting). In this context, we propose a new algorithm, called VS-LTGARCHX, which incorporates a variable selection procedure into the log-TGARCHX estimation process. The VS-LTGARCHX algorithm is then applied to the extremely volatile BTC markets using 42 conditioning variables. Interestingly, our results show that the VS-LTGARCHX models outperform the specified benchmark models in one-step-ahead forecasting.
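To fix ideas on the model class, here is a simulation sketch of a log-TGARCHX(1,1)-type recursion under one common parameterisation (the paper's exact specification, lag structure, and estimation and selection procedures may differ; the parameter names are ours): the log conditional variance is driven by the lagged log squared innovation, a threshold (leverage) indicator for negative shocks, its own lag, and an exogenous covariate.

```python
import numpy as np

def simulate_log_tgarchx(n, omega, alpha, beta, lam, delta, x, seed=0):
    """Simulate a log-TGARCHX(1,1)-type process (illustrative
    parameterisation, not necessarily the paper's):
        log s2[t] = omega + alpha * log(e[t-1]**2)
                    + lam * 1{e[t-1] < 0}
                    + beta * log s2[t-1] + delta * x[t-1]
    with e[t] = sqrt(s2[t]) * z[t], z[t] ~ N(0, 1), and x an
    exogenous covariate series of length n."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    log_s2 = np.zeros(n)
    e = np.zeros(n)
    e[0] = z[0]  # initialise with unit variance
    for t in range(1, n):
        log_s2[t] = (omega + alpha * np.log(e[t - 1] ** 2)
                     + lam * (e[t - 1] < 0)
                     + beta * log_s2[t - 1] + delta * x[t - 1])
        e[t] = np.exp(0.5 * log_s2[t]) * z[t]
    return e, np.exp(log_s2)
```

Note that, because the recursion is in logs, the conditional variance is positive by construction whatever the covariates, which is what makes the model "less restrictive" about which exogenous variables can be included.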

    Standardized evaluation of tumor-infiltrating lymphocytes in breast cancer: results of the ring studies of the international immuno-oncology biomarker working group

    No full text
    Multiple independent studies have shown that tumor-infiltrating lymphocytes (TIL) are prognostic in breast cancer, with potential relevance for response to immune-checkpoint inhibitor therapy. Although many groups are currently evaluating TIL, there is no standardized system for diagnostic applications. This study reports the results of two ring studies investigating TIL conducted by the International Working Group on Immuno-oncology Biomarkers. The study aim was to determine the intraclass correlation coefficient (ICC) for evaluation of TIL by different pathologists. A total of 120 slides were evaluated by a large group of pathologists, with a web-based system in ring study 1 and a more advanced software system in ring study 2 that included integrated feedback with standardized reference images. The predefined aim for successful ring studies 1 and 2 was an ICC above 0.7 (lower limit of the 95% confidence interval (CI)). In ring study 1 the prespecified endpoint was not reached (ICC: 0.70; 95% CI: 0.62-0.78). On the basis of an analysis of sources of variation, we developed a more advanced digital image evaluation system for ring study 2, which improved the ICC to 0.89 (95% CI: 0.85-0.92). The Fleiss' kappa value fo
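Since the endpoint is an intraclass correlation, a minimal sketch of one standard variant may help fix ideas: ICC(2,1), the two-way random-effects, absolute-agreement, single-rater coefficient of Shrout and Fleiss, computed from ANOVA mean squares. The ring studies may have used a different ICC variant; this is illustrative only.

```python
import numpy as np

def icc2_1(ratings):
    """ICC(2,1): two-way random-effects, absolute-agreement,
    single-rater intraclass correlation (Shrout & Fleiss).
    `ratings` is an (n subjects x k raters) array."""
    r = np.asarray(ratings, dtype=float)
    n, k = r.shape
    grand = r.mean()
    row_m = r.mean(axis=1)  # per-subject means
    col_m = r.mean(axis=0)  # per-rater means
    msr = k * np.sum((row_m - grand) ** 2) / (n - 1)   # between subjects
    msc = n * np.sum((col_m - grand) ** 2) / (k - 1)   # between raters
    sse = np.sum((r - row_m[:, None] - col_m[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                    # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Perfect agreement gives an ICC of 1, while a constant between-rater offset (a systematic scoring bias) pulls the absolute-agreement ICC below 1 even when the rank ordering of subjects is identical.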