Unsupervised empirical Bayesian multiple testing with external covariates
In an empirical Bayesian setting, we provide a new multiple testing method
that is useful when an additional covariate, which influences the probability
of each null hypothesis being true, is available. We measure the posterior
significance of each test conditionally on the covariate and the data, leading
to greater power. Using covariate-based prior information in an unsupervised
fashion, we produce a list of significant hypotheses which differs in length
and order from the list obtained by methods that do not take covariate information
into account. Covariate-modulated posterior probabilities of each null
hypothesis are estimated using a fast approximate algorithm. The new method is
applied to expression quantitative trait loci (eQTL) data.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS158 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
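As a rough illustration of how a covariate can shift posterior null probabilities, the sketch below computes P(H0 | z, x) under a two-group model whose prior null probability depends on the covariate x. It is not the fast approximate algorithm of the paper. The logistic link `prior_null`, its coefficients `a` and `b`, and the N(0, `alt_sd`²) alternative density are hypothetical choices. In a real analysis these components would be estimated from the data.

```python
import math


def normal_pdf(z, sd=1.0):
    """Density of N(0, sd^2) evaluated at z."""
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def prior_null(x, a=2.0, b=-1.5):
    """Hypothetical logistic link: prior P(H0 true) as a function of covariate x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def posterior_null(z, x, alt_sd=3.0):
    """P(H0 true | z, x) in a two-group model with a covariate-dependent prior.

    Null density N(0, 1); alternative density N(0, alt_sd^2) (assumed for illustration).
    """
    p0 = prior_null(x)
    f0 = normal_pdf(z)
    f1 = normal_pdf(z, alt_sd)
    return p0 * f0 / (p0 * f0 + (1.0 - p0) * f1)


# Same z-score, two covariate values: the posterior null probability changes with x.
for x in (0.0, 2.0):
    print(x, round(posterior_null(2.5, x), 3))
```

With these assumed settings, the same z-score of 2.5 receives a smaller posterior null probability at x = 2 than at x = 0, because the assumed prior probability of the null falls as x grows. This is the mechanism by which a covariate can change both the length and the order of the list of significant hypotheses.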
Local False Discovery Rate Based Methods for Multiple Testing of One-Way Classified Hypotheses
This paper continues the line of research initiated in
\cite{Liu:Sarkar:Zhao:2016} on developing a novel framework for multiple
testing of hypotheses grouped in a one-way classified form using
hypothesis-specific local false discovery rates (Lfdrs). It is built on an
extension of the standard two-class mixture model from a single group to multiple
groups, and it defines the hypothesis-specific Lfdr as a function of the conditional
Lfdr for the hypothesis, given that it lies within a significant group, and the Lfdr
for the group itself, together with a new parameter that measures the grouping effect.
This definition captures the underlying group structure for the hypotheses
belonging to a group more effectively than the standard two-class mixture
model. Two new Lfdr-based methods with meaningful optimality properties are
developed in their oracle forms. One, designed to control false discoveries
across the entire collection of hypotheses, is proposed as a powerful
alternative to simply pooling all the hypotheses into a single group and applying
the commonly used Lfdr-based method under the standard single-group two-class
mixture model. The other is proposed as an Lfdr analog of the method of
\cite{Benjamini:Bogomolov:2014} for selective inference. It controls an Lfdr-based
measure of false discoveries associated with selecting groups, while concurrently
controlling the average of within-group false discovery proportions across the
selected groups. Simulation studies and a real-data application show that our
proposed methods are often more powerful than their relevant competitors.
Comment: 26 pages, 17 figures
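For comparison, the pooled baseline mentioned above (a single-group Lfdr method) is commonly implemented as a step-up rule on sorted Lfdr values. The sketch below assumes the Lfdr values have already been estimated and shows one common form of that rule. It does not implement the grouped Lfdr or the grouping-effect parameter of this paper, and the function name is hypothetical.

```python
def lfdr_step_up(lfdr, alpha=0.05):
    """Return the (sorted) indices rejected by a running-mean Lfdr rule.

    Sort the Lfdr values in ascending order and reject the k smallest, where k
    is the largest count whose running mean of Lfdr values is <= alpha. The
    running mean serves as an estimate of the FDR of the rejection set.
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    total, k = 0.0, 0
    for rank, i in enumerate(order, start=1):
        total += lfdr[i]
        if total / rank <= alpha:
            k = rank
    return sorted(order[:k])


# Five hypotheses with pre-computed (illustrative) Lfdr values.
print(lfdr_step_up([0.01, 0.20, 0.02, 0.5, 0.04], alpha=0.05))
```

Here the three smallest Lfdr values have a running mean of about 0.023, while adding the fourth raises it above 0.05, so hypotheses 0, 2 and 4 are rejected.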
A correction for sample overlap in genome-wide association studies in a polygenic pleiotropy-informed framework.
BACKGROUND: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical challenge arises when these GWAS have overlapping subjects. This is particularly an issue when estimating pleiotropy using methods that condition the significance of one trait on the significance of a second, such as the covariate-modulated false discovery rate (cmfdr). RESULTS: We propose a method for correcting for sample overlap at the summary statistic level. We quantify the expected amount of spurious correlation between the summary statistics from two GWAS due to sample overlap, and use this estimated correlation in a simple linear correction that adjusts the joint distribution of test statistics from the two GWAS. The correction is appropriate for GWAS with case-control or quantitative outcomes. Our simulations and data example show that without correcting for sample overlap, the cmfdr is not properly controlled, leading to an excessive number of false discoveries and an excessive false discovery proportion. Our correction for sample overlap is effective in that it restores proper control of the false discovery rate, at very little loss in power. CONCLUSIONS: With our proposed correction, it is possible to integrate GWAS summary statistics with overlapping samples in a statistical framework that is dependent on the joint distribution of the two GWAS.
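The two ingredients described in this abstract can be sketched minimally as follows, under assumptions that go beyond the abstract. The expected null correlation uses the standard approximation for quantitative traits, n_shared·ρ / sqrt(n1·n2), with ρ the phenotypic correlation among shared subjects. Case-control outcomes need a different formula, which is not shown. The adjustment is a generic linear decorrelation of the second z-score and may differ from the paper's exact correction. Both function names are hypothetical.

```python
import math


def overlap_correlation(n1, n2, n_shared, pheno_corr=1.0):
    """Approximate null correlation of z-scores from two GWAS with shared subjects.

    Quantitative-trait approximation: n_shared * rho / sqrt(n1 * n2), where rho
    is the phenotypic correlation among the shared subjects (assumption).
    """
    return n_shared * pheno_corr / math.sqrt(n1 * n2)


def decorrelate(z1, z2, r):
    """Remove the part of z2 that is linearly explained by z1 (generic correction).

    If (z1, z2) is standard bivariate normal with correlation r, the result is
    standard normal and uncorrelated with z1.
    """
    return (z2 - r * z1) / math.sqrt(1.0 - r * r)


r = overlap_correlation(10000, 10000, 5000, pheno_corr=0.5)
print(r, decorrelate(1.2, 0.9, r))
```

For example, two GWAS of 10,000 subjects each that share 5,000 subjects, with a phenotypic correlation of 0.5, give r = 0.25 under this approximation. The decorrelation step uses the fact that Var(z2 - r·z1) = 1 - r², so dividing by sqrt(1 - r²) restores unit variance.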
Covariate-assisted ranking and screening for large-scale two-sample inference
Two-sample multiple testing has a wide range of applications. The conventional practice first reduces the original observations to a vector of p-values and then chooses a cutoff to adjust for multiplicity. However, this data reduction step could cause significant loss of information and thus lead to suboptimal testing procedures. We introduce a new framework for two-sample multiple testing by incorporating a carefully constructed auxiliary variable in inference to improve the power. A data-driven multiple-testing procedure is developed by employing a covariate-assisted ranking and screening (CARS) approach that optimally combines the information from both the primary and the auxiliary variables. The proposed CARS procedure is shown to be asymptotically valid and optimal for false discovery rate control. The procedure is implemented in the R package CARS. Numerical results confirm the effectiveness of CARS in false discovery rate control and show that it achieves substantial power gain over existing methods. CARS is also illustrated through an application to the analysis of a satellite imaging data set for supernova detection.
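The abstract does not spell out how the auxiliary variable is constructed. The following construction is consistent with the description and is assumed here for illustration, with known variances. It pairs the usual contrast of the two sample means with a weighted sum of them, choosing the weight κ so that the two statistics are uncorrelated, and therefore independent when the sample means are jointly normal. The function names are hypothetical. The R package CARS is the reference implementation.

```python
import math


def auxiliary_kappa(var_x, n_x, var_y, n_y):
    """Weight kappa making (mean_x - mean_y) and (mean_x + kappa * mean_y) uncorrelated.

    For independent samples, Cov = var_x / n_x - kappa * var_y / n_y, which is
    zero for this choice of kappa.
    """
    return (var_x / n_x) / (var_y / n_y)


def primary_and_auxiliary(mean_x, mean_y, var_x, var_y, n_x, n_y):
    """Return (t1, t2) for one feature from two-sample summary statistics.

    t1 is the usual standardized difference of means (primary statistic).
    t2 is a standardized weighted sum of means (auxiliary statistic).
    """
    se_x2, se_y2 = var_x / n_x, var_y / n_y
    kappa = auxiliary_kappa(var_x, n_x, var_y, n_y)
    t1 = (mean_x - mean_y) / math.sqrt(se_x2 + se_y2)
    t2 = (mean_x + kappa * mean_y) / math.sqrt(se_x2 + kappa ** 2 * se_y2)
    return t1, t2


print(primary_and_auxiliary(1.0, 1.0, 1.0, 1.0, 10, 10))
```

In this example the two means are equal, so t1 is zero while t2 is large. The auxiliary statistic therefore reflects whether the means are jointly away from zero. A ranking and screening step can exploit this information without distorting the null distribution of t1.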
Leveraging genomic annotations and pleiotropic enrichment for improved replication rates in schizophrenia GWAS
Most of the genetic architecture of schizophrenia (SCZ) has not yet been identified. Here, we apply a novel statistical algorithm called Covariate-Modulated Mixture Modeling (CM3), which incorporates auxiliary information (heterozygosity, total linkage disequilibrium, genomic annotations, pleiotropy) for each single nucleotide polymorphism (SNP) to enable more accurate estimation of replication probabilities, conditional on the observed test statistic (“z-score”) of the SNP. We use a multiple logistic regression on z-scores to combine the auxiliary information into a “relative enrichment score” for each SNP. For each stratum of these relative enrichment scores, we obtain nonparametric estimates of posterior expected test statistics and replication probabilities as a function of discovery z-scores, using a resampling-based approach that repeatedly and randomly partitions meta-analysis sub-studies into training and replication samples. We fit a scale mixture of two Gaussians to each stratum, obtaining parameter estimates that minimize the sum of squared differences between the scale-mixture model and the stratified nonparametric estimates. We apply this approach to the recent genome-wide association study (GWAS) of SCZ (n = 82,315), obtaining a good fit between the model-based and observed effect sizes and replication probabilities. We observed that SNPs with low enrichment scores replicate with a lower probability than SNPs with high enrichment scores, even when both are genome-wide significant (p < 5 × 10⁻⁸). There were 693 and 219 independent loci with model-based replication rates ≥80% and ≥90%, respectively. Compared to analyses not incorporating relative enrichment scores, CM3 increased out-of-sample yield for SNPs that replicate at a given rate. This demonstrates that replication probabilities can be more accurately estimated using prior enrichment information with CM3.
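The parametric side of this approach can be sketched under assumptions that the abstract does not state. Within one stratum, the true effect (in z units) is 0 with probability `pi0` and N(0, `sigma2`) otherwise, so discovery z-scores follow a scale mixture of two Gaussians. The replication z-score carries the same effect plus independent N(0, 1) noise. A "replication" means a same-sign replication z-score beyond a threshold `c`. The nonparametric resampling estimates and the least-squares fit are not shown. All names and parameter values are hypothetical.

```python
import math


def norm_pdf(z, var):
    """Density of N(0, var) evaluated at z."""
    return math.exp(-0.5 * z * z / var) / math.sqrt(2.0 * math.pi * var)


def norm_sf(x):
    """Upper tail probability P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def posterior_summary(z, pi0, sigma2, c=1.96):
    """(posterior expected effect, replication probability) for a discovery z-score.

    Assumed model: effect = 0 w.p. pi0, else N(0, sigma2); z = effect + N(0, 1);
    replication z = effect + independent N(0, 1); replication = same sign, |z_rep| > c.
    """
    f0 = pi0 * norm_pdf(z, 1.0)
    f1 = (1.0 - pi0) * norm_pdf(z, 1.0 + sigma2)
    w = f1 / (f0 + f1)                   # P(non-null | z)
    shrink = sigma2 / (1.0 + sigma2)     # posterior mean and variance factor if non-null
    expected_effect = w * shrink * z
    a = abs(z)
    rep_null = norm_sf(c)                                   # null: z_rep ~ N(0, 1)
    rep_alt = norm_sf((c - shrink * a) / math.sqrt(1.0 + shrink))  # N(shrink*|z|, 1+shrink)
    return expected_effect, (1.0 - w) * rep_null + w * rep_alt


# Same discovery z-score in a high-enrichment and a low-enrichment stratum (illustrative).
print(posterior_summary(5.5, pi0=0.90, sigma2=4.0))
print(posterior_summary(5.5, pi0=0.99, sigma2=4.0))
```

Fitting `pi0` and `sigma2` separately per enrichment stratum is consistent with the pattern reported above. With a larger `pi0`, as in a low-enrichment stratum, the same z-score yields a lower replication probability, because the posterior weight on the non-null component drops.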
Weighted False Discovery Rate Control in Large-Scale Multiple Testing
The use of weights provides an effective strategy to incorporate prior domain
knowledge in large-scale inference. This paper studies weighted multiple
testing in a decision-theoretic framework. We develop oracle and data-driven
procedures that aim to maximize the expected number of true positives subject
to a constraint on the weighted false discovery rate. The asymptotic validity
and optimality of the proposed methods are established. The results demonstrate
that incorporating informative domain knowledge enhances the interpretability
of results and precision of inference. Simulation studies show that the
proposed method controls the error rate at the nominal level, and the gain in
power over existing methods is substantial in many settings. An application to
a genome-wide association study is discussed.
Comment: Revised
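Weighted error rates replace counts of rejections with sums of weights. As a minimal sketch with hypothetical weights, the function below computes the realized weighted false discovery proportion of a single rejection set. The weighted FDR in the paper is defined through expectations of such weighted sums. Its oracle and data-driven procedures are not reproduced here, and the function name is hypothetical.

```python
def weighted_fdp(rejected, is_null, weights):
    """Weighted false discovery proportion of one rejection set.

    Each rejection contributes its weight; the result is the weighted share of
    rejections that are true nulls (0.0 when nothing is rejected).
    """
    num = sum(w for r, n, w in zip(rejected, is_null, weights) if r and n)
    den = sum(w for r, w in zip(rejected, weights) if r)
    return num / den if den > 0 else 0.0


rejected = [True, True, True, True]
is_null = [True, False, False, False]
print(weighted_fdp(rejected, is_null, [1, 1, 1, 1]))
print(weighted_fdp(rejected, is_null, [3, 1, 1, 1]))
```

Raising the weight of the single false rejection from 1 to 3 doubles the proportion from 0.25 to 0.5. This is how domain-knowledge weights change which errors count as costly.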
Plasma protein biomarkers for depression and schizophrenia by multi analyte profiling of case-control collections.
Despite significant research efforts aimed at understanding the neurobiological underpinnings of psychiatric disorders, the diagnosis and the evaluation of treatment of these disorders are still based solely on relatively subjective assessment of symptoms. Therefore, biological markers that could improve the current classification of psychiatric disorders, and eventually stratify patients on a biological basis into more homogeneous, clinically distinct subgroups, are highly needed. In order to identify novel candidate biological markers for major depression and schizophrenia, we have applied a focused proteomic approach using plasma samples from a large case-control collection. Patients were diagnosed according to DSM criteria using structured interviews, and a number of additional clinical and demographic variables were assessed. Plasma samples from 245 depressed patients, 229 schizophrenic patients and 254 controls were submitted to multi-analyte profiling allowing the evaluation of up to 79 proteins, including a series of cytokines, chemokines and neurotrophins previously suggested to be involved in the pathophysiology of depression and schizophrenia. Univariate data analysis showed more significant p-values than would be expected by chance and highlighted several proteins belonging to pathways or mechanisms previously suspected to be involved in the pathophysiology of major depression or schizophrenia, such as insulin and MMP-9 for depression, and BDNF, EGF and a number of chemokines for schizophrenia. Multivariate analysis was carried out to improve the differentiation of cases from controls and to identify the most informative panel of markers. The results illustrate the potential of plasma biomarker profiling for psychiatric disorders when conducted in large collections. The study highlighted a set of analytes as candidate biomarker signatures for depression and schizophrenia, warranting further investigation in independent collections.
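The statement that univariate analysis showed more significant p-values than expected by chance can be checked roughly with a binomial tail probability. The sketch below treats up to 79 proteins as independent tests at a nominal level of 0.05. This is an assumption: protein levels are typically correlated, which makes this check anticonservative. The study's actual counts of significant proteins are not given in the abstract, and the function name is hypothetical.

```python
import math


def binomial_upper_tail(k, m, p):
    """P(X >= k) for X ~ Binomial(m, p)."""
    return sum(math.comb(m, j) * p ** j * (1 - p) ** (m - j) for j in range(k, m + 1))


m, alpha = 79, 0.05   # number of proteins tested and nominal level (assumed)
expected = m * alpha  # expected count of p < alpha if no protein is associated

# An excess-significance p-value for an observed count would then be
# binomial_upper_tail(observed_count, m, alpha).
print(expected, binomial_upper_tail(8, m, alpha))
```

Under these assumptions, about 3.95 proteins would reach p < 0.05 by chance alone. A small upper-tail probability for the observed count would support an excess of significant results.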
Digging for gold nuggets : uncovering novel candidate genes for variation in gastrointestinal nematode burden in a wild bird species
Acknowledgements: This study was funded by a BBSRC studentship (MA Wenzel) and NERC grants NE/H00775X/1 and NE/D000602/1 (SB Piertney). The authors are grateful to Marianne James, Mario Roder and Keliya Bai for field-work assistance, Lucy M.I. Webster and Steve Paterson for help during prior development of genetic markers, Heather Ritchie for helpful comments on manuscript drafts, and all estate owners, factors and keepers for access to field sites, most particularly MJ Taylor and Mike Nisbet (Airlie), Neil Brown (Allargue), RR Gledson and David Scrimgeour (Delnadamph), Andrew Salvesen and John Hay (Dinnet), Stuart Young and Derek Calder (Edinglassie), Kirsty Donald and David Busfield (Glen Dye), Neil Hogbin and Ab Taylor (Glen Muick), Alistair Mitchell (Glenlivet), Simon Blackett, Jim Davidson and Liam Donald (Invercauld), Richard Cooke and Fred Taylor (Invermark), Shaila Rao and Christopher Murphy (Mar Lodge), and Ralph Peters and Philip Astor (Tillypronie).
Peer reviewed
Postprint