32 research outputs found

    Exact calculations for false discovery proportion with application to least favorable configurations

    In the context of multiple hypothesis testing, we provide several new exact calculations related to the false discovery proportion (FDP) of step-up and step-down procedures. For step-up procedures, we show that the number of erroneous rejections conditionally on the rejection number is simply a binomial variable, which leads to explicit computations of the c.d.f., the s-th moment and the mean of the FDP, the latter corresponding to the false discovery rate (FDR). For step-down procedures, we derive what is to our knowledge the first explicit formula for the FDR valid for any alternative c.d.f. of the p-values. We also derive explicit computations of the power for both step-up and step-down procedures. These formulas are "explicit" in the sense that they only involve the parameters of the model and the c.d.f. of the order statistics of i.i.d. uniform variables. The p-values are assumed either independent or coming from an equicorrelated multivariate normal model, and an additional mixture model for the true/false hypotheses is used. This new approach is used to investigate new results which are of interest in their own right, related to least/most favorable configurations for the FDR and the variance of the FDP.

    A semiparametric extension of the stochastic block model for longitudinal networks

    To model recurrent interaction events in continuous time, an extension of the stochastic block model is proposed where every individual belongs to a latent group and interactions between two individuals follow a conditional inhomogeneous Poisson process with intensity driven by the individuals' latent groups. The model is shown to be identifiable and its estimation is based on a semiparametric variational expectation-maximization algorithm. Two versions of the method are developed, using either a nonparametric histogram approach (with an adaptive choice of the partition size) or kernel intensity estimators. The number of latent groups can be selected by an integrated classification likelihood criterion. Finally, we demonstrate the performance of our procedure on synthetic experiments, analyse two datasets to illustrate the utility of our approach and comment on competing methods.

    A global homogeneity test for high-dimensional linear regression

    This paper is motivated by the comparison of genetic networks inferred from high-dimensional datasets originating from high-throughput Omics technologies. The aim is to test whether the differences observed between two inferred Gaussian graphical models come from real differences or arise from estimation uncertainties. Adopting a neighborhood approach, we consider a two-sample linear regression model with random design and propose a procedure to test whether these two regressions are the same. Relying on multiple testing and variable selection strategies, we develop a testing procedure that applies to high-dimensional settings where the number of covariates p is larger than the numbers of observations n_1 and n_2 of the two samples. Both type I and type II errors are explicitly controlled from a non-asymptotic perspective and the test is proved to be minimax adaptive to the sparsity. The performance of the test is evaluated on simulated data. Moreover, we illustrate how this procedure can be used to compare genetic networks on the Hess et al. breast cancer microarray dataset.

    On least favorable configurations for step-up-down tests

    This paper investigates an open issue related to false discovery rate (FDR) control of step-up-down (SUD) multiple testing procedures. It has been established in earlier literature that for this type of procedure, under some broad conditions, and in an asymptotic sense, the FDR is maximum when the signal strength under the alternative is maximum. In other words, so-called "Dirac uniform configurations" are asymptotically least favorable in this setting. It is known that this property also holds in a non-asymptotic sense (for any finite number of hypotheses) for the two extreme versions of SUD procedures, namely step-up and step-down (with extra conditions for the step-down case). It is therefore very natural to conjecture that this non-asymptotic least favorable configuration property could more generally be true for all "intermediate" forms of SUD procedures. We prove that this is, somewhat surprisingly, not the case. The argument is based on the exact calculations proposed earlier by Roquain and Villers (2011), which we extend here by generalizing Steck's recursion to the case of two populations. We also quantify the magnitude of this phenomenon by providing a non-asymptotic upper bound and explicit vanishing rates as a function of the total number of hypotheses.

    Conceptual Aspects of Large Meta-Analyses with Publicly Available Microarray Data: A Case Study in Oncology

    Large public repositories of microarray experiments offer an abundance of biological data. It is of interest to use and to combine the available material to create new biological information and to develop a broader view on biological phenomena.

    Tests et sélection de modèles pour l'analyse de données protéomiques et transcriptomiques [Tests and model selection for the analysis of proteomic and transcriptomic data]

    148 p. + appendix, 25 p. Degree: Dr. d'Université

    Tests for Gaussian graphical models

    Gaussian graphical models are promising tools for analysing genetic networks. In many applications, biologists have some knowledge of the genetic network and may want to assess the quality of their model using gene expression data. We therefore introduce a novel procedure for testing the neighborhoods of a Gaussian graphical model. It is based on the connection between the local Markov property and conditional regression of a Gaussian random variable. Adapting recent results on tests for high-dimensional Gaussian linear models, we prove that the testing procedure inherits appealing theoretical properties. Moreover, it is applicable and computationally feasible in a high-dimensional setting, where the number of nodes may be much larger than the number of observations. A large part of the study is devoted to illustrating and discussing applications to simulated data and to biological data.
