
92 research outputs found

Factor Analysis for Multiple Testing (FAMT): An R Package for Large-Scale Significance Testing under Dependence

Author: Chloe Friguet
David Causeur
Maela Kloareg
Magalie Houee-Bigot

The R package FAMT (factor analysis for multiple testing) provides a powerful method for large-scale significance testing under dependence. It is especially designed to select differentially expressed genes in microarray data when the correlation structure among gene expressions is strong. The method reduces the negative impact of dependence on multiple testing procedures by modeling the common information shared by all the variables with a factor analysis structure. New test statistics for general linear contrasts are derived, taking advantage of the common factor structure to reduce correlation and consequently the variance of error rates. As a result, the FAMT method improves on most of the usual methods with respect to the non-discovery rate and the control of the false discovery rate (FDR). The steps of this procedure, each of them corresponding to R functions, are illustrated in this paper by two microarray data analyses. We first present how to import the gene expression data, the covariates and gene annotations. The second step includes the choice of the optimal number of factors and the factor model fitting, and provides a list of selected genes according to a preset FDR control level. Finally, diagnostic plots are provided to help the user interpret the factors using available external information on either genes or arrays.
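To make "a list of selected genes according to a preset FDR control level" concrete, here is a minimal Python sketch of the Benjamini-Hochberg step-up procedure, a standard way to turn p-values into a selection at a chosen FDR level. This is an assumption made for illustration: the abstract does not say which selection rule FAMT applies, and the sketch does not reproduce the package's factor-adjusted test statistics. The function name `benjamini_hochberg` and the parameter `fdr_level` are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr_level=0.05):
    """Return the sorted indices selected by the Benjamini-Hochberg procedure.

    Illustrative only: this is the textbook step-up rule, not FAMT's code.
    """
    n = len(pvalues)
    # Indices of the p-values, from smallest to largest p-value.
    order = sorted(range(n), key=lambda j: pvalues[j])
    cutoff_rank = 0
    for rank, j in enumerate(order, start=1):
        # Step-up rule: remember the largest rank k with p_(k) <= level * k / n.
        if pvalues[j] <= fdr_level * rank / n:
            cutoff_rank = rank
    # Every hypothesis ranked at or below the cutoff is selected.
    return sorted(order[:cutoff_rank])


if __name__ == "__main__":
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, fdr_level=0.05))
```

Because the rule is step-up, a p-value that misses its own rank-specific bound can still be selected when a larger p-value meets its bound.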

Research Papers in Economics

Signal identification in ERP data by decorrelated Higher Criticism Thresholding

Author: Causeur David
Perthame Emeline
Sheu Ching-Fan
Publication venue: HAL CCSD
Publication date: 03/05/2016

Event-related potentials (ERPs) are intensive recordings of electrical activity along the scalp, time-locked to motor, sensory, or cognitive events. A main objective in ERP studies is to select the (rare) time points at which (weak) ERP amplitudes (features) are significantly associated with the experimental variable of interest. Higher Criticism Thresholding (HCT), an optimal signal detection procedure in the "rare-and-weak" paradigm, appears to be ideally suited for identifying ERP features. However, ERPs exhibit complex temporal dependence patterns that violate the assumption under which HCT achieves efficient signal identification. This article first highlights the impact of this dependence, which shows up as instability of signal estimation by HCT. A factor model for the covariance in HCT is then introduced to decorrelate test statistics and to restore stability in estimation. The detection boundary under factor-analytic dependence is derived and the phase diagram is correspondingly extended. Using simulations and a real data analysis example, the proposed method is shown to estimate the support of signals more efficiently than standard HCT and other HCT approaches based on a shrinkage estimation of the covariance matrix.
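As a rough illustration of the standard (independence-based) Higher Criticism threshold that this abstract takes as its starting point, the Python sketch below computes the usual HC objective on sorted p-values and selects the features up to its maximizer. It does not implement the factor-based decorrelation proposed in the article. The function name, the search fraction `alpha0`, and the choice to skip p-values equal to 0 or 1 are assumptions made for the example, and several variants of the HC objective exist in the literature.

```python
import math


def higher_criticism_threshold(pvalues, alpha0=0.5):
    """Return (threshold p-value, HC statistic, sorted selected indices).

    Sketch of the standard HC objective
        HC_i = sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))
    maximized over the smallest alpha0 * n p-values (alpha0 is an assumption).
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    # Indices of the p-values, from smallest to largest p-value.
    order = sorted(range(n), key=lambda j: pvalues[j])
    best_hc, best_rank = -math.inf, 0
    max_rank = max(1, int(alpha0 * n))
    for rank in range(1, max_rank + 1):
        p = pvalues[order[rank - 1]]
        if p <= 0.0 or p >= 1.0:
            continue  # the objective is undefined at 0 and 1
        hc = math.sqrt(n) * (rank / n - p) / math.sqrt(p * (1.0 - p))
        if hc > best_hc:
            best_hc, best_rank = hc, rank
    # Features whose p-value ranks at or below the maximizer are selected.
    selected = sorted(order[:best_rank])
    threshold = pvalues[order[best_rank - 1]] if best_rank else 0.0
    return threshold, best_hc, selected


if __name__ == "__main__":
    pvals = [0.5, 0.0001, 0.8, 0.0002, 0.3, 0.9, 0.6, 0.7, 0.4, 0.95]
    print(higher_criticism_threshold(pvals))
```

The threshold is data-driven: it comes from where the empirical distribution of the p-values departs most from uniformity, which is why strong dependence among the test statistics can make it unstable.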

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

Factor Analysis for Multiple Testing (FAMT): An R Package for Large-Scale Significance Testing under Dependence

Author: Causeur David
Friguet Chloé
Houee-Bigot Magali
Kloareg Maela
Publication venue: University of California, Los Angeles
Publication date: 01/01/2011

The R package FAMT (factor analysis for multiple testing) provides a powerful method for large-scale significance testing under dependence. It is especially designed to select differentially expressed genes in microarray data when the correlation structure among gene expressions is strong. The method reduces the negative impact of dependence on multiple testing procedures by modeling the common information shared by all the variables with a factor analysis structure. New test statistics for general linear contrasts are derived, taking advantage of the common factor structure to reduce correlation and consequently the variance of error rates. As a result, the FAMT method improves on most of the usual methods with respect to the non-discovery rate and the control of the false discovery rate (FDR). The steps of this procedure, each of them corresponding to R functions, are illustrated in this paper by two microarray data analyses. We first present how to import the gene expression data, the covariates and gene annotations. The second step includes the choice of the optimal number of factors and the factor model fitting, and provides a list of selected genes according to a preset FDR control level. Finally, diagnostic plots are provided to help the user interpret the factors using available external information on either genes or arrays.

Journal of Statistical Software

HAL-Rennes 1

Adaptive decorrelation for high-dimensional prediction (original title: Décorrélation adaptative pour la prédiction en grande dimension)

Author: Causeur David
Emily Mathieu
Hébert Florian
Publication venue: HAL CCSD
Publication date: 03/06/2019

In large-scale significance analysis, whether or not to ignore dependence is a core issue, and it has led to many recent results on the impact of decorrelating the pointwise test statistics. Yet, for the estimation of a prediction model, decorrelating large profiles of predicting variables is not as clearly questioned, although many comparative studies have reported the superiority of so-called naive methods, which ignore dependence. Under the usual Gaussian mixture model assumption of Linear Discriminant Analysis, we show that, for a given dependence structure, the classification performance of methods that ignore dependence and of methods that account for it may be markedly different, and even opposite, depending on the pattern of the association signal between the predicting variables and the response. In order to minimize the largest probability of misclassification, we propose a method that handles the dependence adaptively. A simulation study shows that the performance of this method is generally at least as good as the best of the methods that either ignore dependence or rely on a complete decorrelation of the predicting variables.

HAL-Rennes 1

Variable selection for correlated data in high dimension using decorrelation methods

Author: Causeur David
Friguet Chloé
Perthame Emeline
Sheu Ching-Fan
Publication venue: HAL CCSD
Publication date: 07/04/2016

The analysis of high-throughput data has renewed the statistical methodology for feature selection. Such data are characterized by both their high dimension and their heterogeneity, as the true signal and several confounding factors are often observed at the same time. In such a framework, the usual statistical approaches are called into question and can lead to misleading decisions, as they were initially designed under an assumption of independence among variables. In this talk, I will present some improvements to variable selection methods for regression and supervised classification that account for the dependence between selection statistics. The proposed methods are based on a factor model of the covariates, which assumes that the variables are conditionally independent given a vector of latent variables. I will illustrate the impact of dependence on the stability of some usual selection procedures. I will then focus on the analysis of event-related potentials (ERP) data, which are widely collected in psychological research to determine the time courses of mental events. Such data are characterized by a strong and complex temporal dependence pattern, which can be modeled by the factor model mentioned above.

INRIA a CCSD electronic archive server

A transcriptome multi-tissue analysis identifies biological pathways and genes associated with variations in feed efficiency of growing pigs

Author: Causeur David
Gilbert Hélène
Gondret Florence
Houée-Bigot Magalie
Lagarrigue Sandrine
Louveau Isabelle
Siegel Anne
Vincent Annie
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Background - An animal's efficiency in converting feed into lean gain is a critical issue for the profitability of meat industries. This study aimed to describe shared and specific molecular responses in different tissues of pigs divergently selected over eight generations for residual feed intake (RFI). Results - Pigs from the low RFI line had an improved gain-to-feed ratio during the test period and displayed higher leanness but similar adiposity when compared with pigs from the high RFI line at 132 days of age. Transcriptomics data were generated from longissimus muscle, liver and two adipose tissues using a porcine microarray and analyzed for the line effect (n = 24 pigs per line). The most apparent effect of the line was seen in muscle, whereas subcutaneous adipose tissue was the least affected tissue. Molecular data were analyzed by bioinformatics and subjected to multidimensional statistics to identify common biological processes across tissues and key genes participating in differences in the genetics of feed efficiency. Immune response, response to oxidative stress and protein metabolism were the main biological pathways shared by the four tissues that distinguished pigs from the low and high RFI lines. Many immune genes were under-expressed in the four tissues of the most efficient pigs. The main genes contributing to the difference between pigs from the low and high RFI lines were CD40, CTSC and NTN1. Different genes associated with energy use were modulated in a tissue-specific manner between the two lines. The gene expression program related to glycogen utilization was specifically up-regulated in the muscle of pigs from the low RFI line (more efficient). Genes involved in fatty acid oxidation were down-regulated in muscle but up-regulated in adipose tissues of the same pigs when compared with pigs from the high RFI line (less efficient). This underlines opposite line-associated strategies for energy use in skeletal muscle and adipose tissue. Genes related to cholesterol synthesis and efflux in liver and perirenal fat were also differentially regulated in pigs from the low vs high RFI lines. Conclusions - Non-productive functions such as immunity, defense against pathogens and oxidative stress likely contribute to inter-individual variations in feed efficiency.

Crossref

Springer - Publisher Connector

INRIA a CCSD electronic archive server

Complex trait subtypes identification using transcriptome profiling reveals an interaction between two QTL affecting adiposity in chicken

Author: Yuna Blum
Guillaume Le Mignon
David Causeur
Olivier Filangi
Colette Désert
Olivier Demeure
Pascale Le Roy
Sandrine Lagarrigue
Publication venue: BioMed Central
Publication date: 01/01/2011

Background - Integrative genomics approaches that combine genotyping and transcriptome profiling in segregating populations have been developed to dissect complex traits. The most common approach is to identify genes whose eQTL colocalize with the QTL of interest, providing new functional hypotheses about the causative mutation. Another approach consists of defining subtypes for a complex trait using transcriptome profiles and then performing QTL mapping using some of these subtypes. This approach can refine some QTL and reveal new ones. In this paper we introduce Factor Analysis for Multiple Testing (FAMT) to define subtypes more accurately and reveal interactions between QTL affecting the same trait. The data consist of hepatic transcriptome profiles for 45 half-sib male chickens from a sire known to be heterozygous for a QTL affecting abdominal fatness (AF) in the distal region of chromosome 5, around 168 cM. Results - Using this methodology, which accounts for the hidden dependence structure among phenotypes, we identified 688 genes that are significantly correlated with the AF trait and distinguished 5 subtypes for the AF trait, which are not observed with gene lists obtained by classical approaches. After exclusion of one of the two lean bird subtypes, linkage analysis revealed a previously undetected QTL on chromosome 5 around 100 cM. Interestingly, the animals of this subtype presented the same q paternal haplotype at the 168 cM QTL. This result strongly suggests that the two QTL interact. In other words, the "q configuration" at the 168 cM QTL could hide the existence of the QTL in the proximal region at 100 cM. We further show that the proximal QTL interacts with the one previously detected in the distal region of chromosome 5. Conclusion - Our results demonstrate that stratifying a genetic population by molecular phenotypes, followed by QTL analysis on various subtypes, can lead to the identification of novel and interacting QTL.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

ProdInra

HAL-Rennes 1