92 research outputs found

    Factor Analysis for Multiple Testing (FAMT): An R Package for Large-Scale Significance Testing under Dependence

    The R package FAMT (factor analysis for multiple testing) provides a powerful method for large-scale significance testing under dependence. It is especially designed to select differentially expressed genes in microarray data when the correlation structure among gene expressions is strong. Indeed, this method reduces the negative impact of dependence on the multiple testing procedures by modeling the common information shared by all the variables using a factor analysis structure. New test statistics for general linear contrasts are deduced, taking advantage of the common factor structure to reduce correlation and consequently the variance of error rates. Thus, the FAMT method shows improvements with respect to most of the usual methods regarding the non-discovery rate and the control of the false discovery rate (FDR). The steps of this procedure, each of them corresponding to R functions, are illustrated in this paper by two microarray data analyses. We first present how to import the gene expression data, the covariates and gene annotations. The second step includes the choice of the optimal number of factors, the factor model fitting, and provides a list of selected genes according to a preset FDR control level. Finally, diagnostic plots are provided to help the user interpret the factors using available external information on either genes or arrays.

    Signal identification in ERP data by decorrelated Higher Criticism Thresholding

    Event-related potentials (ERPs) are intensive recordings of electrical activity along the scalp, time-locked to motor, sensory, or cognitive events. A main objective in ERP studies is to select (rare) time points at which (weak) ERP amplitudes (features) are significantly associated with the experimental variable of interest. Higher Criticism Thresholding (HCT), an optimal signal detection procedure in the "rare-and-weak" paradigm, appears ideally suited for identifying ERP features. However, ERPs exhibit complex temporal dependence patterns that violate the assumption under which HCT identifies signals efficiently. This article first highlights the impact of dependence in terms of instability of signal estimation by HCT. A factor model for the covariance in HCT is then introduced to decorrelate the test statistics and restore stability in estimation. The detection boundary under factor-analytic dependence is derived and the phase diagram is extended accordingly. Using simulations and a real data analysis example, the proposed method is shown to estimate the support of the signals more efficiently than standard HCT and other HCT approaches based on shrinkage estimation of the covariance matrix.
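For orientation, the standard (not decorrelated) Higher Criticism threshold can be sketched as follows. This is a minimal illustration of the usual Donoho-Jin HC objective applied to p-values, assuming independent tests; it is not the factor-adjusted procedure of the article. The function name `higher_criticism_threshold` and the parameter `alpha0` (the fraction of smallest p-values searched) are hypothetical names chosen for this sketch.

```python
import numpy as np


def higher_criticism_threshold(pvalues, alpha0=0.5):
    """Return the HC threshold and the indices of the selected features.

    Standard HC objective on sorted p-values p_(1) <= ... <= p_(n):
        HC(i) = sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))
    maximized over the alpha0 * n smallest p-values.
    """
    pvalues = np.asarray(pvalues, dtype=float)
    n = pvalues.size
    # Clip away exact 0 and 1 so the denominator never vanishes.
    p_sorted = np.clip(np.sort(pvalues), 1e-12, 1 - 1e-12)
    ranks = np.arange(1, n + 1)
    hc = np.sqrt(n) * (ranks / n - p_sorted) / np.sqrt(p_sorted * (1 - p_sorted))
    # Search only the smallest alpha0 * n p-values (at least one).
    k_max = max(1, int(np.floor(alpha0 * n)))
    k_star = int(np.argmax(hc[:k_max]))
    threshold = p_sorted[k_star]
    # Features whose p-value does not exceed the threshold are selected.
    selected = np.flatnonzero(pvalues <= threshold)
    return threshold, selected
```

Under dependence, the article's point is that this selection becomes unstable, which is what motivates decorrelating the test statistics before thresholding.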

    Décorrélation adaptative pour la prédiction en grande dimension

    In large-scale significance analysis, whether or not to ignore dependence is a core issue, leading to many recent results about the impact of decorrelating the pointwise test statistics. Yet, for the estimation of a prediction model, decorrelating large profiles of predicting variables is not as clearly questioned, although many comparative studies have reported the superiority of so-called naive methods, which ignore dependence. Under the usual Gaussian mixture model assumption of Linear Discriminant Analysis, we show that, for a given dependence structure, the classification performance of methods that ignore or account for dependence may be markedly different, according to the pattern of the association signal between the predicting variables and the response. In order to minimize the largest probability of misclassification, we propose a method that handles dependence adaptively. A simulation study shows that the performance of the present method is at least as good as the best of the methods that either ignore dependence or rely on a complete decorrelation of the predicting variables.

    Variable selection for correlated data in high dimension using decorrelation methods

    The analysis of high-throughput data has renewed the statistical methodology for feature selection. Such data are characterized by both their high dimension and their heterogeneity, as the true signal and several confounding factors are often observed at the same time. In such a framework, the usual statistical approaches are called into question and can lead to misleading decisions, as they were initially designed under an assumption of independence among variables. In this talk, I will present some improvements of variable selection methods for regression and supervised classification, obtained by accounting for the dependence between selection statistics. The methods proposed in this talk are based on a factor model of the covariates, which assumes that variables are conditionally independent given a vector of latent variables. During this talk, I will illustrate the impact of dependence on the stability of some usual selection procedures. Next, I will focus on the analysis of event-related potential (ERP) data, which are widely collected in psychological research to determine the time courses of mental events. Such data are characterized by a strong and complex temporal dependence pattern, which can be modeled by the above-mentioned factor model.

    A transcriptome multi-tissue analysis identifies biological pathways and genes associated with variations in feed efficiency of growing pigs

    Background - An animal's efficiency in converting feed into lean gain is a critical issue for the profitability of meat industries. This study aimed to describe shared and specific molecular responses in different tissues of pigs divergently selected over eight generations for residual feed intake (RFI). Results - Pigs from the low RFI line had an improved gain-to-feed ratio during the test period and displayed higher leanness but similar adiposity when compared with pigs from the high RFI line at 132 days of age. Transcriptomics data were generated from longissimus muscle, liver and two adipose tissues using a porcine microarray and analyzed for the line effect (n = 24 pigs per line). The most apparent effect of the line was seen in muscle, whereas subcutaneous adipose tissue was the least affected tissue. Molecular data were analyzed by bioinformatics and subjected to multidimensional statistics to identify common biological processes across tissues and key genes contributing to differences in the genetics of feed efficiency. Immune response, response to oxidative stress and protein metabolism were the main biological pathways shared by the four tissues that distinguished pigs from the low or high RFI lines. Many immune genes were under-expressed in the four tissues of the most efficient pigs. The main genes contributing to the difference between pigs from the low vs high RFI lines were CD40, CTSC and NTN1. Different genes associated with energy use were modulated in a tissue-specific manner between the two lines. The gene expression program related to glycogen utilization was specifically up-regulated in muscle of pigs from the low RFI line (more efficient). Genes involved in fatty acid oxidation were down-regulated in muscle but were promoted in adipose tissues of the same pigs when compared with pigs from the high RFI line (less efficient). This underlined opposite line-associated strategies for energy use in skeletal muscle and adipose tissue. Genes related to cholesterol synthesis and efflux in liver and perirenal fat were also differentially regulated in pigs from the low vs high RFI lines. Conclusions - Non-productive functions such as immunity, defense against pathogens and oxidative stress likely contribute to inter-individual variations in feed efficiency.

    Complex trait subtypes identification using transcriptome profiling reveals an interaction between two QTL affecting adiposity in chicken

    Background - Integrative genomics approaches that combine genotyping and transcriptome profiling in segregating populations have been developed to dissect complex traits. The most common approach is to identify genes whose eQTL colocalize with the QTL of interest, providing new functional hypotheses about the causative mutation. Another approach consists of defining subtypes for a complex trait using transcriptome profiles and then performing QTL mapping using some of these subtypes. This approach can refine some QTL and reveal new ones. In this paper we introduce Factor Analysis for Multiple Testing (FAMT) to define subtypes more accurately and reveal interactions between QTL affecting the same trait. The data used are hepatic transcriptome profiles for 45 half-sib male chickens of a sire known to be heterozygous for a QTL affecting abdominal fatness (AF) in the distal region of chromosome 5, around 168 cM. Results - Using this methodology, which accounts for the hidden dependence structure among phenotypes, we identified 688 genes that are significantly correlated with the AF trait and we distinguished 5 subtypes for the AF trait, which are not observed with gene lists obtained by classical approaches. After exclusion of one of the two lean bird subtypes, linkage analysis revealed a previously undetected QTL on chromosome 5 around 100 cM. Interestingly, the animals of this subtype presented the same q paternal haplotype at the 168 cM QTL. This result strongly suggests that the two QTL interact. In other words, the "q configuration" at the 168 cM QTL could mask the existence of the QTL in the proximal region at 100 cM. We further show that the proximal QTL interacts with the one previously detected in the distal region of chromosome 5. Conclusion - Our results demonstrate that stratifying a genetic population by molecular phenotypes, followed by QTL analysis on various subtypes, can lead to the identification of novel and interacting QTL.

    Selection stability for supervised classification of heterogeneous data.

