5,156 research outputs found

    Bayesian methods to overcome the winner's curse in genetic studies

    Full text link
    Parameter estimates for associated genetic variants, report ed in the initial discovery samples, are often grossly inflated compared to the values observed in the follow-up replication samples. This type of bias is a consequence of the sequential procedure in which the estimated effect of an associated genetic marker must first pass a stringent significance threshold. We propose a hierarchical Bayes method in which a spike-and-slab prior is used to account for the possibility that the significant test result may be due to chance. We examine the robustness of the method using different priors corresponding to different degrees of confidence in the testing results and propose a Bayesian model averaging procedure to combine estimates produced by different models. The Bayesian estimators yield smaller variance compared to the conditional likelihood estimator and outperform the latter in studies with low power. We investigate the performance of the method with simulations and applications to four real data examples.Comment: Published in at http://dx.doi.org/10.1214/10-AOAS373 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    A statistical framework for joint eQTL analysis in multiple tissues

    Get PDF
    Mapping expression Quantitative Trait Loci (eQTLs) represents a powerful and widely-adopted approach to identifying putative regulatory variants and linking them to specific genes. Up to now eQTL studies have been conducted in a relatively narrow range of tissues or cell types. However, understanding the biology of organismal phenotypes will involve understanding regulation in multiple tissues, and ongoing studies are collecting eQTL data in dozens of cell types. Here we present a statistical framework for powerfully detecting eQTLs in multiple tissues or cell types (or, more generally, multiple subgroups). The framework explicitly models the potential for each eQTL to be active in some tissues and inactive in others. By modeling the sharing of active eQTLs among tissues this framework increases power to detect eQTLs that are present in more than one tissue compared with "tissue-by-tissue" analyses that examine each tissue separately. Conversely, by modeling the inactivity of eQTLs in some tissues, the framework allows the proportion of eQTLs shared across different tissues to be formally estimated as parameters of a model, addressing the difficulties of accounting for incomplete power when comparing overlaps of eQTLs identified by tissue-by-tissue analyses. Applying our framework to re-analyze data from transformed B cells, T cells and fibroblasts we find that it substantially increases power compared with tissue-by-tissue analysis, identifying 63% more genes with eQTLs (at FDR=0.05). Further the results suggest that, in contrast to previous analyses of the same data, the majority of eQTLs detectable in these data are shared among all three tissues.Comment: Summitted to PLoS Genetic

    The Population Genetic Signature of Polygenic Local Adaptation

    Full text link
    Adaptation in response to selection on polygenic phenotypes may occur via subtle allele frequencies shifts at many loci. Current population genomic techniques are not well posed to identify such signals. In the past decade, detailed knowledge about the specific loci underlying polygenic traits has begun to emerge from genome-wide association studies (GWAS). Here we combine this knowledge from GWAS with robust population genetic modeling to identify traits that may have been influenced by local adaptation. We exploit the fact that GWAS provide an estimate of the additive effect size of many loci to estimate the mean additive genetic value for a given phenotype across many populations as simple weighted sums of allele frequencies. We first describe a general model of neutral genetic value drift for an arbitrary number of populations with an arbitrary relatedness structure. Based on this model we develop methods for detecting unusually strong correlations between genetic values and specific environmental variables, as well as a generalization of QST/FSTQ_{ST}/F_{ST} comparisons to test for over-dispersion of genetic values among populations. Finally we lay out a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion. These tests have considerably greater power than their single locus equivalents due to the fact that they look for positive covariance between like effect alleles, and also significantly outperform methods that do not account for population structure. We apply our tests to the Human Genome Diversity Panel (HGDP) dataset using GWAS data for height, skin pigmentation, type 2 diabetes, body mass index, and two inflammatory bowel disease datasets. This analysis uncovers a number of putative signals of local adaptation, and we discuss the biological interpretation and caveats of these results.Comment: 42 pages including 8 figures and 3 tables; supplementary figures and tables not included on this upload, but are mostly unchanged from v

    Bayesian Model Selection in Complex Linear Systems, as Illustrated in Genetic Association Studies

    Full text link
    Motivated by examples from genetic association studies, this paper considers the model selection problem in a general complex linear model system and in a Bayesian framework. We discuss formulating model selection problems and incorporating context-dependent {\it a priori} information through different levels of prior specifications. We also derive analytic Bayes factors and their approximations to facilitate model selection and discuss their theoretical and computational properties. We demonstrate our Bayesian approach based on an implemented Markov Chain Monte Carlo (MCMC) algorithm in simulations and a real data application of mapping tissue-specific eQTLs. Our novel results on Bayes factors provide a general framework to perform efficient model comparisons in complex linear model systems
    • …
    corecore