10 research outputs found

    Adapting Data Adaptive Methods for Small, but High Dimensional Omic Data: Applications to GWAS/EWAS and More

    Exploratory analysis of high-dimensional omics data has received much attention since the explosion of high-throughput technology made it possible to screen tens of thousands of characteristics simultaneously (genomics, metabolomics, proteomics, adducts, etc.). Part of this trend has been an increase in the dimension of exposure data in studies of environmental exposure and associated biomarkers. Though some of the general approaches, such as GWAS, are transferable, what has received less focus is 1) how to estimate independent associations in the context of many competing causes without resorting to a misspecified model, and 2) how to derive accurate small-sample inference when data-adaptive techniques are used in this context. This paper focuses on semi-parametric variable importance analysis of high-dimensional data sets of modest sample size (e.g., gene expression, mRNA, etc.). Though the methodology we propose is generally applicable to similar situations, we present the method in the context of a study of miRNA expression for an environmental exposure. Specifically, the analysis faces not just a large number of comparisons, but also the need to tease out the association of miRNA expression with an exposure apart from confounders such as age, race, smoking conditions, BMI, etc. Our goal is to propose a method that is reasonably robust in small samples but does not rely on misspecified (arbitrary) parametric assumptions, and thus is based on data-adaptive methods. The proposed methodology is, we believe, a powerful combination of existing semi-parametric statistical methods and theory with a simple framework for using common empirical Bayes approaches to aid small-sample inference.
Specifically, we propose using targeted maximum likelihood estimation (TMLE) for estimating variable importance measures, along with a general adaptation of the commonly used Limma approach, which relies on specification of the so-called influence curve of the proposed estimator. The result is a machine-based approach that can estimate independent associations in high-dimensional data, but protects against the unreliability of small-sample inference that can result when using data-adaptive estimation in relatively small samples.

    Statistical Inference for Data Adaptive Target Parameters

    Suppose one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target, we partition the sample into V equal-size sub-samples and use this partitioning to define V splits into an estimation sample (one of the V sub-samples) and a corresponding complementary parameter-generating sample that is used to generate a target parameter. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a target parameter mapping, which represents the statistical target parameter generated by that parameter-generating sample. We define our sample-split data-adaptive statistical target parameter as the average of these V sample-specific target parameters. We present an analogue estimator of this type of data-adaptive target parameter and the corresponding statistical inference. This general methodology for generating data-adaptive target parameters while still providing valid statistical inference is demonstrated with a number of examples. These examples demonstrate that the methodology presents new opportunities for statistical learning from data that go beyond the usual requirement that the estimand be defined a priori in order to allow for proper statistical inference. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed in this paper provides a new impetus for greater involvement of statistical inference in problems that are increasingly addressed by clever, yet ad hoc, pattern-finding methods; that is, the role of statisticians is being supplanted by computer scientists, who derive clever, yet typically ad hoc, methods that "discover" the interesting patterns in data.
The methodology presented in this paper can harness these methods and provide rigorous inference for the patterns, or target parameters, suggested by such procedures. In this way, it returns exercises involving learning from data to the proper domain of rigorous statistical inference. To suggest such potential, and to verify the predictions of the theory, simulation studies based upon algorithms that map the parameter-generating sample into the desired estimand are shown. However, the methodology generalizes to situations where even these algorithms are not prespecified.
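The sample-splitting scheme described in this abstract can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function name `sample_split_estimate`, the selection rule (pick the column with the largest mean in the parameter-generating sample), the fold-specific target (the population mean of the chosen column), and the plug-in standard error (average of the fold-wise sample variances divided by n).

```python
import numpy as np

def sample_split_estimate(X, V=5, seed=0):
    """Toy sample-split data-adaptive target parameter (illustrative only).

    For each of V folds, the complementary (parameter-generating) sample
    picks a column, and the held-out (estimation) sample estimates the
    mean of that column. The final estimate averages the V fold estimates.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Near-equal-size folds; exactly equal when V divides n.
    folds = np.array_split(rng.permutation(n), V)
    estimates, variances, chosen = [], [], []
    for v in range(V):
        est_idx = folds[v]
        gen_idx = np.concatenate([folds[u] for u in range(V) if u != v])
        # Parameter-generating step (assumed rule): column with largest mean.
        j = int(np.argmax(X[gen_idx].mean(axis=0)))
        # Estimation step: sample mean of the chosen column on the held-out fold.
        x = X[est_idx, j]
        estimates.append(x.mean())
        variances.append(x.var(ddof=1))
        chosen.append(j)
    psi = float(np.mean(estimates))
    # Assumed plug-in standard error: averaged fold variances over n.
    se = float(np.sqrt(np.mean(variances) / n))
    return psi, se, chosen

# Usage: column 2 carries a shifted mean, so it should be selected in every fold.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 5))
X_demo[:, 2] += 3.0
psi, se, chosen = sample_split_estimate(X_demo)
print(chosen, round(psi, 2), round(se, 3))
```

The point of the design is that the column index is chosen without looking at the fold used for estimation, so the fold estimate is not biased by the selection step.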

    Permutation tests for experimental designs, with extension to simultaneous EEG signal analysis

    Permutation tests (or randomization tests) form a class of nonparametric tests and are therefore appropriate for testing hypotheses when parametric assumptions are not satisfied. In this thesis, we propose new permutation tests that can be used to analyze complex experimental designs as well as experimental protocols with EEG (electroencephalogram) data. More precisely, in the first part of the thesis (chapter 2), we developed an exact permutation test for fixed-effect or factorial ANOVA, that is, for ANOVA with a single error term. To achieve this, we first compute the "reduced-model residuals", which remove all fixed effects except the one being tested. In the second part of the thesis (chapter 3), we focused on more complex designs and adapted this permutation test to mixed models and repeated measures ANOVA. We introduce an approximate permutation test based on reduced-model residuals for repeated measures and mixed models. In the third part of the thesis (chapter 4), we extended the proposed permutation tests to data with more than one dimension, such as EEG or ERP (event-related potential) signals, which we refer to as signals. In the great majority of experiments in psychology or neuroscience that use these techniques, the experimental design is complex, with several between- and/or within-subject factors. We adapted the permutation of reduced-model residuals to signals in these types of design, that is, for factorial ANOVA, repeated measures ANOVA and mixed models.
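As a rough illustration of permuting reduced-model residuals, the sketch below implements the simpler Freedman-Lane variant, which is an approximate test and not the exact test developed in the thesis. The function names `f_stat` and `reduced_residual_perm_test`, the design matrices `Z` (nuisance effects, including the intercept) and `X` (the effect being tested), and the simulated data are all assumptions made for this example.

```python
import numpy as np

def f_stat(y, Z, X):
    """F statistic for adding the columns of X to a model that already contains Z."""
    full = np.column_stack([Z, X])
    rss_reduced = np.sum((y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]) ** 2)
    rss_full = np.sum((y - full @ np.linalg.lstsq(full, y, rcond=None)[0]) ** 2)
    df1 = np.linalg.matrix_rank(full) - np.linalg.matrix_rank(Z)
    df2 = len(y) - np.linalg.matrix_rank(full)
    return ((rss_reduced - rss_full) / df1) / (rss_full / df2)

def reduced_residual_perm_test(y, Z, X, n_perm=999, seed=0):
    """Freedman-Lane style permutation test of the effect coded by X."""
    rng = np.random.default_rng(seed)
    # Fit the reduced model (nuisance effects only) and keep its residuals.
    fitted = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    resid = y - fitted
    f_obs = f_stat(y, Z, X)
    count = 0
    for _ in range(n_perm):
        # Permute the reduced-model residuals and rebuild a pseudo-response.
        y_star = fitted + rng.permutation(resid)
        if f_stat(y_star, Z, X) >= f_obs:
            count += 1
    # Include the observed statistic so the p-value is never zero.
    return f_obs, (count + 1) / (n_perm + 1)

# Usage: two crossed binary factors; test factor b while adjusting for factor a.
rng = np.random.default_rng(2)
n = 40
a = np.repeat([0, 1], n // 2)
b = np.tile([0, 1], n // 2)
Z_demo = np.column_stack([np.ones(n), a])
X_demo = b.reshape(-1, 1).astype(float)
y_demo = 1.0 + 0.5 * a + 3.0 * b + rng.normal(size=n)
print(reduced_residual_perm_test(y_demo, Z_demo, X_demo, n_perm=199))
```

Reduced-model residuals are not exactly exchangeable, because they lie in a lower-dimensional subspace. That is why this variant is only approximate, and why an exact test needs an additional step beyond plain residual permutation.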

    An exact permutation method for testing any effect in balanced and unbalanced fixed effect ANOVA

    The ANOVA method and permutation tests, two legacies of Fisher, have been extensively studied. Several permutation strategies have been proposed by others to obtain a distribution-free test for factors in a fixed effect ANOVA (i.e., single error term ANOVA). The resulting tests are either approximate or exact. However, there exists no universal exact permutation test that can be applied to an arbitrary design to test a desired factor. An exact permutation strategy applicable to fixed effect analysis of variance is presented. The proposed method can be used to test any factor, even in the presence of higher-order interactions. In addition, the method has the advantage of being applicable in unbalanced designs (all-cell-filled), which is a very common situation in practice, and it is the first method with this capability. Simulation studies show that the proposed method has an actual level that stays remarkably close to the nominal level, and its power is always competitive. This is the case even with very small datasets, strongly unbalanced designs and non-Gaussian errors. No other competitor shows such enviable behavior.
    Keywords: ANOVA; experimental design; non-parametric methods; permutation test

    A general permutation approach for analyzing repeated measures ANOVA and mixed-model designs

    The ANOVA method and permutation tests, two legacies of Fisher, have been extensively studied. Several permutation strategies have been proposed by others to obtain a distribution-free test for factors in a fixed effect ANOVA (i.e., single error term ANOVA). The resulting tests are either approximate or exact. However, there exists no universal exact permutation test that can be applied to an arbitrary design to test a desired factor. An exact permutation strategy applicable to fixed effect analysis of variance is presented. The proposed method can be used to test any factor, even in the presence of higher-order interactions. In addition, the method has the advantage of being applicable in unbalanced designs (all-cell-filled), which is a very common situation in practice, and it is the first method with this capability. Simulation studies show that the proposed method has an actual level that stays remarkably close to the nominal level, and its power is always competitive. This is the case even with very small datasets, strongly unbalanced designs and non-Gaussian errors. No other competitor shows such enviable behavior.