
    Unsupervised empirical Bayesian multiple testing with external covariates

    In an empirical Bayesian setting, we provide a new multiple testing method, useful when an additional covariate is available that influences the probability of each null hypothesis being true. We measure the posterior significance of each test conditionally on the covariate and the data, leading to greater power. Using covariate-based prior information in an unsupervised fashion, we produce a list of significant hypotheses which differs in length and order from the list obtained by methods not taking covariate information into account. Covariate-modulated posterior probabilities of each null hypothesis are estimated using a fast approximate algorithm. The new method is applied to expression quantitative trait loci (eQTL) data. Comment: Published at http://dx.doi.org/10.1214/08-AOAS158 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
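As a rough illustration of a covariate-modulated posterior null probability, the sketch below uses a two-group mixture in which the prior null probability depends on the covariate. It is not the paper's algorithm. The logistic prior, its coefficients, and the Gaussian null and alternative densities are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def normal_pdf(z, sd=1.0):
    """Density of a zero-mean Gaussian with standard deviation sd."""
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def null_prior(x, intercept=2.0, slope=-1.5):
    """Assumed prior P(H0 | x): logistic in the covariate x (illustrative values)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def posterior_null(z, x, alt_sd=3.0):
    """P(H0 | z, x) under the assumed two-group mixture.

    Null statistics are taken as N(0, 1) and non-null statistics as
    N(0, alt_sd^2); both choices are assumptions of this sketch.
    """
    pi0 = null_prior(x)
    null_part = pi0 * normal_pdf(z, 1.0)
    alt_part = (1.0 - pi0) * normal_pdf(z, alt_sd)
    return null_part / (null_part + alt_part)


if __name__ == "__main__":
    # Same test statistic, different covariate values.
    for x in (0.0, 1.0, 2.0):
        print(x, round(posterior_null(3.0, x), 4))
```

With these assumed settings, the same test statistic receives a smaller posterior null probability when the covariate points to a lower prior chance of the null, which is how a covariate can change both the length and the order of the list of significant hypotheses.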

    Local False Discovery Rate Based Methods for Multiple Testing of One-Way Classified Hypotheses

    This paper continues the line of research initiated in \cite{Liu:Sarkar:Zhao:2016} on developing a novel framework for multiple testing of hypotheses grouped in a one-way classified form using hypothesis-specific local false discovery rates (Lfdr's). It is built on an extension of the standard two-class mixture model from single to multiple groups, defining the hypothesis-specific Lfdr as a function of the conditional Lfdr for the hypothesis given that it is within a significant group and the Lfdr for the group itself, and involving a new parameter that measures the grouping effect. This definition captures the underlying group structure for the hypotheses belonging to a group more effectively than the standard two-class mixture model. Two new Lfdr-based methods, possessing meaningful optimalities, are produced in their oracle forms. One, designed to control false discoveries across the entire collection of hypotheses, is proposed as a powerful alternative to simply pooling all the hypotheses into a single group and using the commonly used Lfdr-based method under the standard single-group two-class mixture model. The other is proposed as an Lfdr analog of the method of \cite{Benjamini:Bogomolov:2014} for selective inference. It controls an Lfdr-based measure of false discoveries associated with selecting groups concurrently with controlling the average of within-group false discovery proportions across the selected groups. Simulation studies and a real-data application show that our proposed methods are often more powerful than their relevant competitors. Comment: 26 pages, 17 figures
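For context, the pooled single-group baseline mentioned in this abstract is often implemented as a step-up rule on sorted Lfdr values: reject the hypotheses with the smallest Lfdr for as long as their running average stays at or below the target level. The sketch below shows that generic rule only, not the grouped methods the paper proposes. The function name and the input values are hypothetical.

```python
def lfdr_reject(lfdr, alpha=0.05):
    """Return sorted indices rejected by a pooled Lfdr step-up rule.

    Hypotheses are ordered by increasing Lfdr; the rule rejects the first k,
    where k is the largest rank whose running mean Lfdr is <= alpha.
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    running_sum = 0.0
    k = 0
    for rank, i in enumerate(order, start=1):
        running_sum += lfdr[i]
        if running_sum / rank <= alpha:
            k = rank
    return sorted(order[:k])


if __name__ == "__main__":
    # Illustrative Lfdr values, not taken from any data set.
    print(lfdr_reject([0.01, 0.9, 0.03, 0.2, 0.04], alpha=0.05))
```

Because the values are sorted in increasing order, the running mean never decreases, so the largest admissible rank is also the last one before the mean first exceeds the target level.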

    A correction for sample overlap in genome-wide association studies in a polygenic pleiotropy-informed framework.

    BACKGROUND: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical challenge arises when these GWAS have overlapping subjects. This is particularly an issue when estimating pleiotropy using methods that condition the significance of one trait on the significance of a second, such as the covariate-modulated false discovery rate (cmfdr). RESULTS: We propose a method for correcting for sample overlap at the summary statistic level. We quantify the expected amount of spurious correlation between the summary statistics from two GWAS due to sample overlap, and use this estimated correlation in a simple linear correction that adjusts the joint distribution of test statistics from the two GWAS. The correction is appropriate for GWAS with case-control or quantitative outcomes. Our simulations and data example show that without correcting for sample overlap, the cmfdr is not properly controlled, leading to an excessive number of false discoveries and an excessive false discovery proportion. Our correction for sample overlap is effective in that it restores proper control of the false discovery rate, at very little loss in power. CONCLUSIONS: With our proposed correction, it is possible to integrate GWAS summary statistics with overlapping samples in a statistical framework that is dependent on the joint distribution of the two GWAS.
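One generic way to apply a linear correction to a pair of correlated z-scores is symmetric whitening with the inverse square root of their 2 x 2 correlation matrix. The abstract does not give the exact form of the correction, so the sketch below is an assumption and not a restatement of the authors' method. The correlation `rho` is taken as already estimated, and the function names are hypothetical.

```python
import math


def whitening_coefficients(rho):
    """Entries (a, b) of C^(-1/2) = [[a, b], [b, a]] for C = [[1, rho], [rho, 1]]."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    plus = 1.0 / math.sqrt(1.0 + rho)    # from eigenvalue 1 + rho
    minus = 1.0 / math.sqrt(1.0 - rho)   # from eigenvalue 1 - rho
    return (plus + minus) / 2.0, (plus - minus) / 2.0


def decorrelate(z1, z2, rho):
    """Linearly transform a z-score pair so the assumed correlation rho is removed."""
    a, b = whitening_coefficients(rho)
    return a * z1 + b * z2, b * z1 + a * z2


if __name__ == "__main__":
    # rho = 0.3 is an illustrative value, not an estimate from any study.
    print(decorrelate(2.5, 1.0, 0.3))
```

If the two statistics have unit variance and correlation `rho` before the transform, the transformed pair has unit variance and zero correlation, and the transform leaves the pair unchanged when `rho` is zero.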

    Leveraging genomic annotations and pleiotropic enrichment for improved replication rates in schizophrenia GWAS

    Most of the genetic architecture of schizophrenia (SCZ) has not yet been identified. Here, we apply a novel statistical algorithm called Covariate-Modulated Mixture Modeling (CM3), which incorporates auxiliary information (heterozygosity, total linkage disequilibrium, genomic annotations, pleiotropy) for each single nucleotide polymorphism (SNP) to enable more accurate estimation of replication probabilities, conditional on the observed test statistic (“z-score”) of the SNP. We use a multiple logistic regression on z-scores to combine the auxiliary information into a “relative enrichment score” for each SNP. For each stratum of these relative enrichment scores, we obtain nonparametric estimates of posterior expected test statistics and replication probabilities as a function of discovery z-scores, using a resampling-based approach that repeatedly and randomly partitions meta-analysis sub-studies into training and replication samples. We fit a scale mixture of two Gaussians model to each stratum, obtaining parameter estimates that minimize the sum of squared differences of the scale-mixture model with the stratified nonparametric estimates. We apply this approach to the recent genome-wide association study (GWAS) of SCZ (n = 82,315), obtaining a good fit between the model-based and observed effect sizes and replication probabilities. We observed that SNPs with low enrichment scores replicate with a lower probability than SNPs with high enrichment scores even when both are genome-wide significant (p < 5 × 10^-8). There were 693 and 219 independent loci with model-based replication rates ≥80% and ≥90%, respectively. Compared to analyses not incorporating relative enrichment scores, CM3 increased out-of-sample yield for SNPs that replicate at a given rate. This demonstrates that replication probabilities can be more accurately estimated using prior enrichment information with CM3.

    Weighted False Discovery Rate Control in Large-Scale Multiple Testing

    The use of weights provides an effective strategy to incorporate prior domain knowledge in large-scale inference. This paper studies weighted multiple testing in a decision-theoretic framework. We develop oracle and data-driven procedures that aim to maximize the expected number of true positives subject to a constraint on the weighted false discovery rate. The asymptotic validity and optimality of the proposed methods are established. The results demonstrate that incorporating informative domain knowledge enhances the interpretability of results and precision of inference. Simulation studies show that the proposed method controls the error rate at the nominal level, and the gain in power over existing methods is substantial in many settings. An application to a genome-wide association study is discussed. Comment: Revise

    Plasma protein biomarkers for depression and schizophrenia by multi analyte profiling of case-control collections.

    Despite significant research efforts aimed at understanding the neurobiological underpinnings of psychiatric disorders, the diagnosis and the evaluation of treatment of these disorders are still based solely on relatively subjective assessment of symptoms. Therefore, biological markers that could improve the current classification of psychiatric disorders, and in perspective stratify patients on a biological basis into more homogeneous, clinically distinct subgroups, are greatly needed. In order to identify novel candidate biological markers for major depression and schizophrenia, we have applied a focused proteomic approach using plasma samples from a large case-control collection. Patients were diagnosed according to DSM criteria using structured interviews, and a number of additional clinical variables and demographic information were assessed. Plasma samples from 245 depressed patients, 229 schizophrenic patients and 254 controls were submitted to multi-analyte profiling allowing the evaluation of up to 79 proteins, including a series of cytokines, chemokines and neurotrophins previously suggested to be involved in the pathophysiology of depression and schizophrenia. Univariate data analysis showed more significant p-values than would be expected by chance and highlighted several proteins belonging to pathways or mechanisms previously suspected to be involved in the pathophysiology of major depression or schizophrenia, such as insulin and MMP-9 for depression, and BDNF, EGF and a number of chemokines for schizophrenia. Multivariate analysis was carried out to improve the differentiation of cases from controls and identify the most informative panel of markers. The results illustrate the potential of plasma biomarker profiling for psychiatric disorders, when conducted in large collections. The study highlighted a set of analytes as candidate biomarker signatures for depression and schizophrenia, warranting further investigation in independent collections.

    Digging for gold nuggets: uncovering novel candidate genes for variation in gastrointestinal nematode burden in a wild bird species

    Acknowledgements: This study was funded by a BBSRC studentship (MA Wenzel) and NERC grants NE/H00775X/1 and NE/D000602/1 (SB Piertney). The authors are grateful to Marianne James, Mario Roder and Keliya Bai for field-work assistance, Lucy M.I. Webster and Steve Paterson for help during prior development of genetic markers, Heather Ritchie for helpful comments on manuscript drafts and all estate owners, factors and keepers for access to field sites, most particularly MJ Taylor and Mike Nisbet (Airlie), Neil Brown (Allargue), RR Gledson and David Scrimgeour (Delnadamph), Andrew Salvesen and John Hay (Dinnet), Stuart Young and Derek Calder (Edinglassie), Kirsty Donald and David Busfield (Glen Dye), Neil Hogbin and Ab Taylor (Glen Muick), Alistair Mitchell (Glenlivet), Simon Blackett, Jim Davidson and Liam Donald (Invercauld), Richard Cooke and Fred Taylor (Invermark), Shaila Rao and Christopher Murphy (Mar Lodge), and Ralph Peters and Philip Astor (Tillypronie). Peer reviewed. Postprint