Variable selection for correlated data in high dimension using decorrelation methods

Abstract

The analysis of high-throughput data has renewed the statistical methodology for feature selection. Such data are characterized by both their high dimension and their heterogeneity, as the true signal and several confounding factors are often observed at the same time. In such a framework, the usual statistical approaches are called into question and can lead to misleading decisions, as they were initially designed under an assumption of independence among variables. In this talk, I will present some improvements to variable selection methods for regression and supervised classification problems that account for the dependence between selection statistics. The proposed methods are based on a factor model of the covariates, which assumes that the variables are conditionally independent given a vector of latent variables. I will first illustrate the impact of dependence on the stability of some common selection procedures. I will then focus on the analysis of event-related potential (ERP) data, which are widely collected in psychological research to determine the time courses of mental events. Such data are characterized by a strong and complex temporal dependence pattern, which can be modeled by the factor model mentioned above.
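
The factor-model assumption in the abstract can be illustrated with a short simulation. The sketch below is not the method presented in the talk: it generates covariates that are independent given a few latent factors, estimates the common component with a plain PCA (truncated SVD) as a stand-in for a proper factor-model fit, and compares the average absolute correlation before and after removing it. The function names, the dimensions `n`, `p`, `q`, and the choice of PCA are all assumptions made for illustration.

```python
import numpy as np


def simulate_factor_data(n=200, p=50, q=2, seed=0):
    """Simulate X = Z B^T + E: columns of X are independent given Z (illustrative setup)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, q))   # latent factors, one row per observation
    B = rng.standard_normal((p, q))   # loadings of each variable on the factors
    E = rng.standard_normal((n, p))   # independent variable-specific noise
    return Z @ B.T + E


def remove_factors(X, q):
    """Subtract a rank-q common component estimated by PCA (assumed stand-in estimator)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    common = (U[:, :q] * s[:q]) @ Vt[:q]   # best rank-q approximation of Xc
    return Xc - common


def mean_abs_offdiag_corr(X):
    """Average absolute pairwise correlation between distinct columns of X."""
    C = np.corrcoef(X, rowvar=False)
    off_diagonal = C[~np.eye(C.shape[0], dtype=bool)]
    return float(np.abs(off_diagonal).mean())


X = simulate_factor_data()
X_dec = remove_factors(X, q=2)
before = mean_abs_offdiag_corr(X)
after = mean_abs_offdiag_corr(X_dec)
print(f"mean |corr| before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the number of factors `q` is known; with real data it would have to be chosen or estimated, and the decorrelated variables would then be passed to the selection procedure of interest.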
