Bayesian Model Comparison in Genetic Association Analysis: Linear Mixed Modeling and SNP Set Testing

Author: Wen Xiaoquan
Publication date: 23/02/2015

We consider the problems of hypothesis testing and model comparison under a flexible Bayesian linear regression model whose formulation is closely connected with the linear mixed effect model and the parametric models for SNP set analysis in genetic association studies. We derive a class of analytic approximate Bayes factors and illustrate their connections with a variety of frequentist test statistics, including the Wald statistic and the variance component score statistic. Taking advantage of Bayesian model averaging and hierarchical modeling, we demonstrate some distinct advantages and flexibilities in the approaches utilizing the derived Bayes factors in the context of genetic association studies. We demonstrate our proposed methods using real or simulated numerical examples in applications of single SNP association testing, multi-locus fine-mapping and SNP set association testing.

arXiv.org e-Print Archive

CiteSeerX

Effective Genetic Risk Prediction Using Mixed Models

Author: Golan David
Rosset Saharon
Publication date: 01/01/2014

To date, efforts to produce high-quality polygenic risk scores from genome-wide studies of common disease have focused on estimating and aggregating the effects of multiple SNPs. Here we propose a novel statistical approach for genetic risk prediction, based on random and mixed effects models. Our approach (termed GeRSI) circumvents the need to estimate the effect sizes of numerous SNPs by treating these effects as random, producing predictions which are consistently superior to current state of the art, as we demonstrate in extensive simulation. When applying GeRSI to seven phenotypes from the WTCCC study, we confirm that the use of random effects is most beneficial for diseases that are known to be highly polygenic: hypertension (HT) and bipolar disorder (BD). For HT, there are no significant associations in the WTCCC data. The best existing model yields an AUC of 54%, while GeRSI improves it to 59%. For BD, using GeRSI improves the AUC from 55% to 62%. For individuals ranked at the top 10% of BD risk predictions, using GeRSI substantially increases the BD relative risk from 1.4 to 2.5.

Comment: main text: 14 pages, 3 figures. Supplementary text: 16 pages, 21 figure

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Replication in Genome-Wide Association Studies

Author: Ioannidis John P. A.
Kraft Peter
Zeggini Eleftheria
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 25/10/2010

Replication helps ensure that a genotype-phenotype association observed in a genome-wide association (GWA) study represents a credible association and is not a chance finding or an artifact due to uncontrolled biases. We discuss prerequisites for exact replication, issues of heterogeneity, advantages and disadvantages of different methods of data synthesis across multiple studies, frequentist vs. Bayesian inferences for replication, and challenges that arise from multi-team collaborations. While consistent replication can greatly improve the credibility of a genotype-phenotype association, it may not eliminate spurious associations due to biases shared by many studies. Conversely, lack of replication in well-powered follow-up studies usually invalidates the initially proposed association, although occasionally it may point to differences in linkage disequilibrium or effect modifiers across studies.

Comment: Published at http://dx.doi.org/10.1214/09-STS290 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

Methodological Issues in Multistage Genome-Wide Association Studies

Author: Casey Graham
Conti David V.
Haile Robert W.
Lewinger Juan Pablo
Stram Daniel O.
Thomas Duncan C.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2009

Because of the high cost of commercial genotyping chip technologies, many investigations have used a two-stage design for genome-wide association studies, using part of the sample for an initial discovery of "promising" SNPs at a less stringent significance level and the remainder in a joint analysis of just these SNPs using custom genotyping. Typical cost savings of about 50% are possible with this design to obtain comparable levels of overall type I error and power by using about half the sample for stage I and carrying about 0.1% of SNPs forward to the second stage, the optimal design depending primarily upon the ratio of costs per genotype for stages I and II. However, with the rapidly declining costs of the commercial panels, the generally low observed ORs of current studies, and many studies aiming to test multiple hypotheses and multiple endpoints, many investigators are abandoning the two-stage design in favor of simply genotyping all available subjects using a standard high-density panel. Concern is sometimes raised about the absence of a "replication" panel in this approach, as required by some high-profile journals, but it must be appreciated that the two-stage design is not a discovery/replication design but simply a more efficient design for discovery using a joint analysis of the data from both stages. Once a subset of highly significant associations has been discovered, a truly independent "exact replication" study is needed in a similar population of the same promising SNPs using similar methods.

Comment: Published at http://dx.doi.org/10.1214/09-STS288 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org
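
The cost arithmetic behind the two-stage design can be illustrated with a minimal sketch. It assumes a simple cost model that is not taken from the paper: total cost is the number of genotypes times a per-genotype price, stage I types the full panel on a fraction of the sample, and stage II types only the carried-forward SNPs on the remaining subjects at a higher custom per-genotype price. The function name and the stage II/stage I price ratio of 10 are hypothetical.

```python
def two_stage_cost_ratio(stage1_fraction, carry_fraction, cost_ratio):
    """Cost of a two-stage design relative to typing all subjects on the full panel.

    stage1_fraction: share of the sample genotyped on the full panel in stage I
    carry_fraction:  share of SNPs carried forward to stage II
    cost_ratio:      per-genotype cost in stage II divided by that in stage I
                     (an assumed input, not a value from the paper)
    """
    if not 0.0 <= stage1_fraction <= 1.0:
        raise ValueError("stage1_fraction must lie in [0, 1]")
    if not 0.0 <= carry_fraction <= 1.0:
        raise ValueError("carry_fraction must lie in [0, 1]")
    if cost_ratio <= 0.0:
        raise ValueError("cost_ratio must be positive")

    # Stage I: full panel on part of the sample, in units of the one-stage cost.
    stage1_cost = stage1_fraction
    # Stage II: only the carried-forward SNPs on the rest of the sample,
    # scaled by the higher custom per-genotype price.
    stage2_cost = (1.0 - stage1_fraction) * carry_fraction * cost_ratio
    return stage1_cost + stage2_cost


# Half the sample in stage I, 0.1% of SNPs carried forward (figures from the
# abstract), and a hypothetical tenfold per-genotype price in stage II.
ratio = two_stage_cost_ratio(0.5, 0.001, 10.0)
print(f"relative cost: {ratio:.3f}, savings: {1.0 - ratio:.1%}")
```

Under these assumptions the relative cost is about 0.505, in line with the savings of about 50% quoted in the abstract. The sketch ignores power and type I error, which the abstract says must be held comparable when choosing the design.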

arXiv.org e-Print Archive

CiteSeerX

Crossref

Accurate modeling of confounding variation in eQTL studies leads to a great increase in power to detect trans-regulatory effects

Author: Neil Lawrence
Nicolo Fusi
Oliver Stegle
Publication date: 02/06/2011

Expression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies is hidden confounding factors, such as unobserved covariates or unknown environmental influences. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals.

Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, PANAMA can more accurately distinguish between true genetic association signals and confounding variation.

We applied our model and compared it to existing methods on a variety of datasets and biological systems. PANAMA consistently performs better than alternative methods and, in particular, finds substantially more trans regulators. Importantly, PANAMA not only identifies a greater number of associations but also yields hits that are biologically more plausible and can be better reproduced between independent studies.

Nature Precedings

Refining the accuracy of validated target identification through coding variant fine-mapping in type 2 diabetes.

Author: Afaq Saima
Afzal Shoaib
Ahlqvist Emma
Almgren Peter
Amin Najaf
An Ping
Bang Lia B
Bertoni Alain G
Bielak Lawrence F
Bombieri Cristina
Bork-Jensen Jette
Brandslund Ivan
Brody Jennifer A
Burtt Noël P
Canouil Mickaël
Chen Yii-Der Ida
Cho Yoon Shin
Christensen Cramer
Chu Audrey Y
Cook James P
de Haan Hugoline G
Demirkan Ayse
Eastwood Sophie V
Eckardt Kai-Uwe
ExomeBP Consortium
Fischer Krista
Flannick Jason
Gambaro Giovanni
Gan Wei
GIANT Consortium
Giedraitis Vilmantas
Graff Marielisa
Grarup Niels
Grove Megan L
Guo Xiuqing
Gustafsson Stefan
Hackinger Sophie
Hai Yang
Han Sohee
Highland Heather M
Hivert Marie-France
Hu Yao
Huo Shaofeng
Isomaa Bo
Jensen Richard A
Justice Anne E
Jäger Susanne
Jørgensen Marit E
Jørgensen Torben
Kim Bong-Jo
Kim Sung Soo
Kim Young Jin
Kitajima Hidetoshi
Koistinen Heikki A
Kovacs Peter
Kravic Jasmina
Kriebel Jennifer
Kronenberg Florian
Käräjämäki Annemari
Lange Leslie A
Lecoeur Cécile
Lee Jung-Jin
Lehne Benjamin
Li Huaixing
Li Jin
Li Man
Li-Gao Ruifang
Ligthart Symen
Lin Keng-Hung
Liu Dajiang J
Lohman Kurt K
Lu Yingchang
Läll Kristi
MAGIC Consortium
Mahajan Anubha
Malerba Giovanni
Marouli Eirini
Marten Jonathan
Meidtner Karina
Müller-Nurasyid Martina
Peloso Gina Marie
Preuss Michael
Prins Bram Peter
Rayner N William
Robertson Neil R
Rybin Denis V
Smith Albert Vernon
Steinthorsdottir Valgerdur
Tajes Juan Fernandez
Taliun Daniel
Trubetskoy Vassily Vladimirovich
Tybjærg-Hansen Anne
Varga Tibor V
Warren Helen R
Wessel Jennifer
Willems Sara M
Wuttke Matthias
Yaghootkar Hanieh
Zhang Weihua
Zhao Wei
Publication venue: eScholarship, University of California
Publication date: 01/04/2018

We aggregated coding variant data for 81,412 type 2 diabetes cases and 370,832 controls of diverse ancestry, identifying 40 coding variant association signals (P < 2.2 × 10⁻⁷); of these, 16 map outside known risk-associated loci. We make two important observations. First, only five of these signals are driven by low-frequency variants: even for these, effect sizes are modest (odds ratio ≤1.29). Second, when we used large-scale genome-wide association data to fine-map the associated variants in their regional context, accounting for the global enrichment of complex trait associations in coding sequence, compelling evidence for coding variant causality was obtained for only 16 signals. At 13 others, the associated coding variants clearly represent 'false leads' with potential to generate erroneous mechanistic inference. Coding variant associations offer a direct route to biological insight for complex diseases and identification of validated therapeutic targets; however, appropriate mechanistic inference requires careful specification of their causal contribution to disease predisposition.

eScholarship - University of California