MDC Repository

Analysis of case-control association studies with known risk variants

Author: Altshuler David
Dermitzakis Emmanouil T.
Groop Leif
Haiman Christopher A.
Henderson Brian E.
Kolonel Laurence N.
Kraft Peter
Marchand Loic Le
Patterson Nick
Paşaniuc Bogdan
Pollack Samuela
Price Alkes L.
Stranger Barbara E.
Voight Benjamin
Waters Kevin
Zaitlen Noah
Publication date: 02/08/2017

Motivation: The question of how to best use information from known associated variants when conducting disease association studies has yet to be answered. Some studies compute a marginal P-value for each single nucleotide polymorphism (SNP) independently, ignoring previously discovered variants. Other studies include known variants as covariates in logistic regression, but a weakness of this standard conditioning strategy is that it does not account for disease prevalence and non-random ascertainment, which can induce a correlation structure between candidate variants and known associated variants even if the variants lie on different chromosomes. Here, we propose a new conditioning approach, which is based in part on the classical technique of liability threshold modeling. Roughly, this method estimates model parameters for each known variant while accounting for the published disease prevalence from the epidemiological literature. Results: We show via simulation and application to empirical datasets that our approach outperforms both the no conditioning strategy and the standard conditioning strategy, with a properly controlled false-positive rate. Furthermore, in multiple data sets involving diseases of low prevalence, standard conditioning produces a severe drop in test statistics whereas our approach generally performs as well as or better than no conditioning. Our approach may substantially improve disease gene discovery for diseases with many known risk variants. Availability: LTSOFT software is available online at http://www.hsph.harvard.edu/faculty/alkes-price/software/ Contact: [email protected]; [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
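The role that published disease prevalence plays in a liability threshold model can be illustrated with a minimal sketch. This is not the LTSOFT implementation; the function names, the standard-normal liability scale, and the `residual_sd` parameter are assumptions made for illustration only.

```python
from statistics import NormalDist


def liability_threshold(prevalence: float) -> float:
    """Threshold t on a standard-normal liability scale such that
    P(liability > t) equals the disease prevalence (assumed model)."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - prevalence)


def disease_risk(genetic_shift: float, prevalence: float,
                 residual_sd: float = 1.0) -> float:
    """P(liability > t) for a person whose known risk variants shift the
    mean liability by `genetic_shift` (hypothetical parameterization)."""
    threshold = liability_threshold(prevalence)
    shifted = NormalDist(mu=genetic_shift, sigma=residual_sd)
    return 1.0 - shifted.cdf(threshold)


if __name__ == "__main__":
    # A rare disease puts the threshold far in the upper tail.
    print(liability_threshold(0.01))
    # Carrying known risk alleles raises the probability of crossing it.
    print(disease_risk(0.0, 0.01), disease_risk(0.5, 0.01))
```

Under these assumptions, the lower the prevalence, the further the threshold sits in the tail, so a fixed shift in liability changes risk by a different amount than it would for a common disease.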

RERO DOC Digital Library

ROP: dumpster diving in RNA-sequencing to find the source of 1 trillion reads across diverse adult human tissues.

Author: Burchard E.
Chen L.
Christenson S.
Daley T.
Eng C.
Eskin E.
Gruhl F.
Hernandez R.D.
Hsieh K.
Levanon E.Y.
Mangul S.
Ophoff R.A.
Porath H.T.
Rios C.
Santana J.R.
Seibold M.A.
Shifman S.
Smith A.D.
Spreafico R.
Strauli N.
Wesolowska-Andersen A.
Woodruff P.G.
Yang H.T.
Zaitlen N.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

High-throughput RNA-sequencing (RNA-seq) technologies provide an unprecedented opportunity to explore the individual transcriptome. Unmapped reads are a large and often overlooked output of standard RNA-seq analyses. Here, we present Read Origin Protocol (ROP), a tool for discovering the source of all reads originating from complex RNA molecules. We apply ROP to samples across 2630 individuals from 54 diverse human tissues. Our approach can account for 99.9% of 1 trillion reads of various read lengths. Additionally, we use ROP to investigate the functional mechanisms underlying connections between the immune system, microbiome, and disease. ROP is freely available at https://github.com/smangul1/rop/wiki

Serveur académique lausannois

eScholarship - University of California

Regional variation in health is predominantly driven by lifestyle rather than genetics

Author: A Jemal
AD Lopez
BA Swinburn
BH Smith
C Amador
C Chang
C Willyard
C Xia
G Davey Smith
J Yang
J Yang
M Ezzati
M Marmot
MR Robinson
N Zaitlen
NY Krakauer
PM VanRaden
S Vattikuti
The 1000 Genomes Project Consortium
YC Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2017

Health-related traits are known to vary geographically. Here, Amador and colleagues show that regional variation of obesity-related traits in a Scottish population is influenced more by lifestyle differences than by genetic differences.

Edinburgh Research Explorer

University of Dundee Online Publications

Quantifying Missing Heritability at Known GWAS Loci

Author: A Franke
AL Williams
Alexander Gusev
Alkes L. Price
AR Gilmour
B Howie
B Maher
Bjarni J. Vilhjalmsson
Bogdan Pasaniuc
C Cotsapas
C Richard-Miceli
C Spencer
D Diogo
D Ellinghaus
D Luca
D Speed
DG Clayton
Dorothée Diogo
DS Falconer
EA Stahl
EA Stahl
EE Eichler
Eli A. Stahl
F Zou
G Galarneau
G Trynka
Gaurav Bhatia
GB Ehret
H Lango Allen
HC So
HD Patterson
I Dunham
I Ionita-Laza
J Listgarten
J Shea
J Yang
J Yang
J Yang
J Yang
J Yang
Jane Worthington
JB Maller
JH Park
JW Smoller
KA Hunt
L Jostins
Lars Klareskog
MA Rivas
N Chatterjee
N Patterson
N Solovieff
N Zaitlen
NA Zaitlen
Noah Zaitlen
Peter K. Gregersen
Peter M. Visscher
PM Visscher
PM Visscher
PS Ramos
RF Robinson
RM Plenge
Robert M. Plenge
S Eyre
S Lee
S Sanna
S Vattikuti
SH Lee
Soumya Raychaudhuri
SP Dickson
T Fischer
TA Manolio
TGP Consortium
TM Teslovich
X Ke
X Zhou
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Recent work has shown that much of the missing heritability of complex traits can be resolved by estimates of heritability explained by all genotyped SNPs. However, it is currently unknown how much heritability is missing due to poor tagging or additional causal variants at known GWAS loci. Here, we use variance components to quantify the heritability explained by all SNPs at known GWAS loci in nine diseases from WTCCC1 and WTCCC2. After accounting for expectation, we observed all SNPs at known GWAS loci to explain 1.29× more heritability than GWAS-associated SNPs on average (P = 3.3×10−5). For some diseases, this increase was individually significant: 2.07× for Multiple Sclerosis (MS) (P = 6.5×10−9) and for Crohn's Disease (CD) (P = 1.3×10−3); all analyses of autoimmune diseases excluded the well-studied MHC region. Additionally, we found that GWAS loci from other related traits also explained significant heritability. The union of all autoimmune disease loci explained 7.15× more MS heritability than known MS SNPs (P 20,000 Rheumatoid Arthritis (RA) samples typed on ImmunoChip, with 2.37× more heritability from all SNPs at GWAS loci (P = 2.3×10−6) and more heritability from all autoimmune disease loci (P < 1×10−16) compared to known RA SNPs (including those identified in this cohort). Our methods adjust for LD between SNPs, which can bias standard estimates of heritability from SNPs even if all causal variants are typed. By comparing adjusted estimates, we hypothesize that the genome-wide distribution of causal variants is enriched for low-frequency alleles, but that causal variants at known GWAS loci are skewed towards common alleles. These findings have important ramifications for fine-mapping study design and our understanding of complex disease architecture. National Institutes of Health (U.S.) (Grant R03HG006731); National Institutes of Health (U.S.) (Fellowship F32GM106584)

DSpace@MIT

Harvard University - DASH

The University of Manchester - Institutional Repository

FigShare

Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers

Author: A Genz
A Genz
B Devlin
B Devlin
B Han
BL Browning
Buhm Han
D Altshuler
DA Williams
DJ Schaid
DL Nicolae
DR Nyholt
DY Lin
E Eskin
E Jorgenson
Eleazar Eskin
F Dudbridge
F Dudbridge
F Yates
FS Collins
G Kimmel
GU Yule
Hyun Min Kang
I Pe'er
J Li
J Marchini
JD Storey
JK Pritchard
JM Cheverud
John D. Storey
KN Conneely
LA Wasserman
N Risch
N Zaitlen
NA Zaitlen
P de Bakker
PD Sasieni
PH Westfall
RJ Klein
S Purcell
SR Browning
SR Seaman
TA Louis
TR Bhangale
V Hajivassiliou
V Moskvina
Y Benjamini
Publication venue: Public Library of Science
Publication date: 01/04/2009

With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies—SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu
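The permutation test that the abstract calls the gold standard can be sketched in a few lines. The example below is not SLIDE, which approximates this procedure with a multivariate normal model; it is a plain max-statistic permutation under assumed inputs (a list of per-marker genotype vectors and 0/1 case labels), with a deliberately simple test statistic chosen for illustration.

```python
import random


def marker_stats(genotypes, labels):
    """Absolute case/control difference in mean allele count per marker."""
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    stats = []
    for marker in genotypes:
        case_sum = sum(g for g, y in zip(marker, labels) if y == 1)
        ctrl_sum = sum(marker) - case_sum
        stats.append(abs(case_sum / n_case - ctrl_sum / n_ctrl))
    return stats


def permutation_corrected_pvalues(genotypes, labels, n_perm=1000, seed=0):
    """Multiple-testing-corrected p-values via the max-statistic permutation.

    Labels are shuffled, the largest statistic over all markers is recorded,
    and each observed statistic is compared against that null maximum, so
    correlation between markers is preserved in the null distribution."""
    rng = random.Random(seed)
    observed = marker_stats(genotypes, labels)
    shuffled = list(labels)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_null = max(marker_stats(genotypes, shuffled))
        for i, stat in enumerate(observed):
            if max_null >= stat:
                exceed[i] += 1
    # Add-one correction keeps the estimate strictly positive.
    return [(count + 1) / (n_perm + 1) for count in exceed]


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 10
    associated = [2] * 10 + [0] * 10   # differs sharply between groups
    null_marker = [0, 1] * 10          # identical mean in both groups
    print(permutation_corrected_pvalues([associated, null_marker], labels))
```

The cost grows with the number of permutations times the number of markers, which is why this direct approach becomes impractical for millions of markers.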

Public Library of Science (PLOS)

SNU Open Repository and Archive

eScholarship - University of California

Genetics of callous-unemotional behavior in children

Author: A Meyer-Lindenberg
AA Marsh
AAE Vinkhuyzen
AI Malik
AP Jones
B Devlin
B Maher
BL van der Waerden
BN Howie
BR Oliver
CF Chabris
CL Sebastian
Claire M. A. Haworth
Consortium Wellcome Trust Case Control
DR Lynam
E Viding
E Viding
E Viding
E Viding
ED Barker
Emma L. Meaburn
Essi Viding
F Dudbridge
G Gibson
GH Lubke
H Larsson
Huiping Zhang
J Fellay
J Flint
J Marchini
J Yang
J Yang
JA Yang
JC Barrett
JH Beitchman
JH Park
JN Hirschhorn
KCM Siontis
M Forsman
M Forsman
Maciej Trzaskowski
MI McCarthy
MR Munafo
N Sadeh
N Zaitlen
NMG Fontaine
NMG Fontaine
Oliver S. P. Davis
PJ Frick
PJ Frick
PM Visscher
R Gomez
R Goodman
R Kumsta
R Plomin
R Rowe
RJ Klein
RJ Pruim
Robert Plomin
S Bezdjian
S Boker
S Purcell
Sara R. Jaffee
SH Lee
SH Lee
SM Purcell
SR Browning
T Fowler
Thomas S. Price
TS Nadder
WG Hill
Y Kovas
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Callous-unemotional behavior (CU) is currently under consideration as a subtyping index for conduct disorder diagnosis. Twin studies routinely estimate the heritability of CU as greater than 50%. It is now possible to estimate genetic influence using DNA alone from samples of unrelated individuals, not relying on the assumptions of the twin method. Here we use this new DNA method (implemented in a software package called Genome-wide Complex Trait Analysis, GCTA) for the first time to estimate genetic influence on CU. We also report the first genome-wide association (GWA) study of CU as a quantitative trait. We compare these DNA results to those from twin analyses using the same measure and the same community sample of 2,930 children rated by their teachers at ages 7, 9 and 12. GCTA estimates of heritability were near zero, even though twin analysis of CU in this sample confirmed the high heritability of CU reported in the literature, and even though GCTA estimates of heritability were substantial for cognitive and anthropological traits in this sample. No significant associations were found in GWA analysis, which, like GCTA, only detects additive effects of common DNA variants. The phrase ‘missing heritability’ was coined to refer to the gap between variance associated with DNA variants identified in GWA studies versus twin study heritability. However, GCTA heritability, not twin study heritability, is the ceiling for GWA studies because both GCTA and GWA are limited to the overall additive effects of common DNA variants, whereas twin studies are not. This GCTA ceiling is very low for CU in our study, despite its high twin study heritability estimate. The gap between GCTA and twin study heritabilities will make it challenging to identify genes responsible for the heritability of CU

Public Library of Science (PLOS)

UCL Discovery

Birkbeck Institutional Research Online

Warwick Research Archives Portal Repository

King's Research Portal

Explore Bristol Research

University of Queensland eSpace

Multiethnic Genetic Association Studies Improve Power for Locus Discovery

Author: A Keinan
AD Skol
AG Clark
AL Price
Benjamin F. Voight
DV Zaykin
E Zeggini
H Unoki
I Pe'er
JN Hirschhorn
K Yasuda
LJ Scott
MI McCarthy
Michael Nicholas Weedon
N Zaitlen
NA Rosenberg
Paul I. W. de Bakker
PI Lin
R Saxena
R Sladek
Sara L. Pulit
TA Manolio
V Steinthorsdottir
YY Teo
Publication venue: Public Library of Science
Publication date: 01/09/2010

To date, genome-wide association studies have focused almost exclusively on populations of European ancestry. These studies continue with the advent of next-generation sequencing, designed to systematically catalog and test low-frequency variation for a role in disease. A complementary approach would be to focus further efforts on cohorts of multiple ethnicities. This leverages the idea that population genetic drift may have elevated some variants to higher allele frequency in different populations, boosting statistical power to detect an association. Based on empirical allele frequency distributions from eleven populations represented in HapMap Phase 3 and the 1000 Genomes Project, we simulate a range of genetic models to quantify the power of association studies in multiple ethnicities relative to studies that exclusively focus on samples of European ancestry. In each of these simulations, a first phase of GWAS in exclusively European samples is followed by a second GWAS phase in any of the other populations (including a multiethnic design). We find that nontrivial power gains can be achieved by conducting future whole-genome studies in worldwide populations, where, in particular, African populations contribute the largest relative power gains for low-frequency alleles (<5%) of moderate effect that suffer from low power in samples of European descent. Our results emphasize the importance of broadening genetic studies to worldwide populations to ensure efficient discovery of genetic loci contributing to phenotypic trait variability, especially for those traits for which large numbers of samples of European ancestry have already been collected and tested
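The claim that a higher allele frequency boosts power can be illustrated with a standard normal approximation. This is a simplified sketch, not the simulation framework used in the study: it assumes an additive effect on a unit-variance quantitative trait, Hardy-Weinberg genotype variance 2p(1-p), and a two-sided 1-df test; the function and parameter names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def association_power(n_samples: int, allele_freq: float,
                      effect_size: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided single-variant association test.

    Assumes the test statistic is N(mean_z, 1) with
    mean_z = sqrt(N * 2p(1-p)) * |beta| (illustrative approximation)."""
    if not 0.0 < allele_freq < 1.0:
        raise ValueError("allele_freq must lie strictly between 0 and 1")
    std_normal = NormalDist()
    genotype_variance = 2.0 * allele_freq * (1.0 - allele_freq)
    mean_z = sqrt(n_samples * genotype_variance) * abs(effect_size)
    critical_z = std_normal.inv_cdf(1.0 - alpha / 2.0)
    upper_tail = 1.0 - std_normal.cdf(critical_z - mean_z)
    lower_tail = std_normal.cdf(-critical_z - mean_z)
    return upper_tail + lower_tail


if __name__ == "__main__":
    # Same effect and sample size, different allele frequencies.
    for freq in (0.01, 0.05, 0.20):
        print(freq, association_power(10_000, freq, 0.1))
```

Under these assumptions, a variant that drift has raised from 1% to 20% frequency in another population carries far more genotype variance, and so the same effect is much easier to detect at a fixed sample size.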

Public Library of Science (PLOS)

Harvard University - DASH

Genomic selection in commercial perennial crops: applicability and improvement in oil palm (Elaeis guineensis Jacq.)

Author: BJ Hayes
BJ Hayes
CC Li
CK Teh
CK Wong
D Cros
D Habier
FM Bassi
G Los Campos de
G Moser
H Muranty
J Crossa
J Pew
J Spindel
J Yang
JB Endelman
JE Spindel
MF Resende Jr.
MF Resende Jr.
MP Calus
N Zaitlen
P Perez
R Singh
R Singh
S Purcell
SA Clark
T Park
TH Meuwissen
V Rao
WG Hill
ZA Desta
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Genomic selection (GS) uses genome-wide markers to select individuals with the desired overall combination of breeding traits. A total of 1,218 individuals from a commercial population of Ulu Remis x AVROS (UR x AVROS) were genotyped using the OP200K array. The traits of interest included: shell-to-fruit ratio (S/F, %), mesocarp-to-fruit ratio (M/F, %), kernel-to-fruit ratio (K/F, %), fruit per bunch (F/B, %), oil per bunch (O/B, %) and oil per palm (O/P, kg/palm/year). Genomic heritabilities of these traits were estimated to be in the range of 0.40 to 0.80. GS methods assessed were RR-BLUP, Bayes A (BA), Cπ (BC), Lasso (BL) and Ridge Regression (BRR). All methods resulted in almost equal prediction accuracy. The accuracy achieved ranged from 0.40 to 0.70, correlating with the heritability of traits. By selecting the most important markers, RR-BLUP B has the potential to outperform other methods. The marker density for certain traits can be further reduced based on the linkage disequilibrium (LD). Together with in silico breeding, GS is now being used in oil palm breeding programs to hasten parental palm selection.

Nottingham ePrints

Nottingham eTheses

Repository@Nottingham

UM Digital Repository

ScholarBank@NUS

Informed Conditioning on Clinical Covariates Increases Power in Case-Control Association Studies

Author: Aage Haugen
AL Price
AL Price
Albert Rosenberger
Alkes L. Price
Angela Risch
Ann W. Morgan
Anne Barton
Anthony G. Wilson
Barry I. Freedman
Benjamin Voight
BF Voight
Bogdan Pasaniuc
Brian E. Henderson
C Wallace
Carl D. Langefeld
Christopher Haiman
CI Amos
CL Kuo
D Campa
D Clayton
D Cox
D Thomas
DA Schaumberg
Daniel I. Chasman
David Altshuler
David C. Christiani
David J. Friedman
David J. Hunter
David Scherf
Debra A. Schaumberg
DJ Hunter
Donald W. Bowden
DS Falconer
Eric Tchetgen Tchetgen
ESBD Lander
G Genovese
G Jin
G Maskarinec
Giulio Genovese
GM Monsees
GV Kryukov
H Holm
HC So
Heike Bickeböller
J Dong
J Marchini
Jane Worthington
JK Field
JM Neuhaus
Joachim Heinrich
John K. Field
JR Perry
JRB Perry
Kevin M. Waters
KL Ellis
KM Waters
Laurence N. Kolonel
LD Robinson
Leif Groop
Loic Le Marchand
LT Guey
Lynne J. Hocking
M Imielinski
M Pirinen
Maria Teresa Landi
Marilyn Cornelis
Martin Walshaw
Michael Meister
ML Freedman
N Chatterjee
N Risch
N Zaitlen
N Zaitlen
Nick Patterson
NJ Risch
NJ Wald
Noah Zaitlen
NR Wray
Olaide Y. Raji
P Armitage
P Kraft
P Sulem
Pamela J. Hicks
Paul Wordsworth
Peter Kraft
Peter M. Visscher
PM Ridker
Robert M. Plenge
S Kathiresan
S Lindstrom
S Raychaudhuri
S Rose
S Van Gestel
S Zienolddiny
Samuela Pollack
Sara Lindström
SH Lee
Shanbeh Zienolddiny
SJ Chanock
Sophia Steer
Steve Eyre
T Lumley
TH Hamza
TJ Vanderweele
TM Frayling
W Thomson
WG Hill
WW Piegorsch
Z Kote-Jarai
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

Genetic case-control association studies often include data on clinical covariates, such as body mass index (BMI), smoking status, or age, that may modify the underlying genetic risk of case or control samples. For example, in type 2 diabetes, odds ratios for established variants estimated from low–BMI cases are larger than those estimated from high–BMI cases. An unanswered question is how to use this information to maximize statistical power in case-control studies that ascertain individuals on the basis of phenotype (case-control ascertainment) or phenotype and clinical covariates (case-control-covariate ascertainment). While current approaches improve power in studies with random ascertainment, they often lose power under case-control ascertainment and fail to capture available power increases under case-control-covariate ascertainment. We show that an informed conditioning approach, based on the liability threshold model with parameters informed by external epidemiological information, fully accounts for disease prevalence and non-random ascertainment of phenotype as well as covariates and provides a substantial increase in power while maintaining a properly controlled false-positive rate. Our method outperforms standard case-control association tests with or without covariates, tests of gene x covariate interaction, and previously proposed tests for dealing with covariates in ascertained data, with especially large improvements in the case of case-control-covariate ascertainment. We investigate empirical case-control studies of type 2 diabetes, prostate cancer, lung cancer, breast cancer, rheumatoid arthritis, age-related macular degeneration, and end-stage kidney disease over a total of 89,726 samples. In these datasets, informed conditioning outperforms logistic regression for 115 of the 157 known associated variants investigated (P-value = 1×10−9). The improvement varied across diseases with a 16% median increase in χ2 test statistics and a commensurate increase in power. This suggests that applying our method to existing and future association studies of these diseases may identify novel disease loci.
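The intuition behind the low-BMI example can be sketched with a truncated-normal calculation. This is an illustrative simplification and not the authors' software: it assumes liability is the sum of a known covariate effect and a normal residual, places the threshold on a standard-normal scale from the prevalence, and uses hypothetical names throughout.

```python
from statistics import NormalDist


def posterior_mean_residual(is_case: bool, covariate_effect: float,
                            prevalence: float,
                            residual_sd: float = 1.0) -> float:
    """Expected residual (non-covariate) liability given disease status.

    Cases must satisfy covariate_effect + residual > threshold, controls
    the opposite, so the residual follows a truncated normal distribution
    (assumed model)."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    cut = (threshold - covariate_effect) / residual_sd
    if is_case:
        # Mean of a normal truncated below at `cut`.
        return residual_sd * std_normal.pdf(cut) / (1.0 - std_normal.cdf(cut))
    # Mean of a normal truncated above at `cut`.
    return -residual_sd * std_normal.pdf(cut) / std_normal.cdf(cut)


if __name__ == "__main__":
    # A case with a protective covariate value needs a larger residual
    # liability to cross the threshold than a case with a risky one.
    print(posterior_mean_residual(True, -0.5, 0.08))
    print(posterior_mean_residual(True, 0.5, 0.08))
    print(posterior_mean_residual(False, 0.0, 0.08))
```

Under these assumptions, cases whose covariates predict low risk carry more residual liability, which is consistent with larger odds ratios for established variants among low-BMI cases.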

The University of Manchester - Institutional Repository

PuSH

White Rose Research Online

FigShare

University of Queensland eSpace

Lund University Publications

Harvard University - DASH