86 research outputs found

Generation of simulated genotype data at human gene loci in large sample sizes with HAPGEN2.

Author: Christian Fuchsberger (76512)
David Altshuler (27754)
Gil McVean (21571)
Jason Flannick (10551)
Kyle J. Gaulton (277252)
Loukas Moutsianas (282004)
Manuel A. Rivas (228371)
Mark I. McCarthy (145748)
Michael Boehnke (145750)
Patrick K. Albers (728931)
Vineeta Agarwala (130598)

Haplotypes were simulated at ‘average’ human protein-coding genes drawn from the center of the distribution of RefSeq gene total exon length (A). Vertical dotted lines in red and green indicate the median and mean values of exon length, respectively. Black bar represents the 24 genes selected for simulation. (B,C) Site frequency spectrum of simulated data, as compared to observed human data. Data were simulated via staged expansion of 1000 Genomes Project haplotypes using the HAPGEN2 software; the mutation parameter was fit to match the site frequency spectrum of protein-coding variation observed in exome sequencing studies, e.g. as reported by Nelson et al. 2012. Raw simulated data from HAPGEN2 in large sample sizes produced an excess of rare sites; these were down-sampled to match observed data. The grey area in (B) represents the [5%, 95%] interval across all simulated genes, obtained using bootstrapping. The site frequency spectrum of simulated data in a smaller sample size (N = 2.7K) also matched an independent set of observed exome sequencing data from the GoT2D consortium (C). Haplotype structure, as measured by linkage disequilibrium between variants, was also preserved in the simulated data after sample expansion (D). The inset shows a representative example of simulations at the GATA3 gene locus.

FigShare

Power of different gene-based rare variant association methods at simulated disease loci.

Author: Christian Fuchsberger (76512)
David Altshuler (27754)
Gil McVean (21571)
Jason Flannick (10551)
Kyle J. Gaulton (277252)
Loukas Moutsianas (282004)
Manuel A. Rivas (228371)
Mark I. McCarthy (145748)
Michael Boehnke (145750)
Patrick K. Albers (728931)
Vineeta Agarwala (130598)

At each gene locus, one hundred independent simulations of phenotypic effects were generated in a sample size of 3K individuals (1.5K cases / 1.5K controls). Variant effects were drawn from varied models of genetic architecture (A-F), hypothesizing different degrees of purifying selection against disease alleles (see Methods). Under models with strong selection, there is a strong inverse correlation between variant frequency and effect size; under weak selection rare variant effects are less skewed. At all loci, genetic variants together contribute 1% of the phenotypic variance underlying a trait with common prevalence (8%; modeled as type 2 diabetes). Power is measured as the fraction out of 100 simulations of each gene in which a gene-based test reported a p-value lower than the significance threshold. In (A-C), causal variants span the full frequency spectrum (including common alleles), and thus rare alleles account for only a fraction of the locus heritability; in (D-E), all causal variants are rare (MAF<1%). In (F), causal variants have bi-directional effects (some increase risk of disease, while others reduce risk).
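The power definition in this caption (the fraction of simulation replicates whose p-value falls below the significance threshold) can be sketched in a few lines. This is only an illustration: the function name `empirical_power` and the example p-values are hypothetical, and the threshold of 2.5e-06 is borrowed from the gene-based threshold quoted elsewhere in this listing.

```python
from typing import Sequence


def empirical_power(p_values: Sequence[float], alpha: float) -> float:
    """Return the fraction of simulation replicates with p-value below alpha."""
    if len(p_values) == 0:
        raise ValueError("need at least one simulated p-value")
    # Count replicates in which the test would have been declared significant.
    hits = sum(1 for p in p_values if p < alpha)
    return hits / len(p_values)


# Hypothetical p-values from five replicates at one gene (not real data).
example_p_values = [1e-7, 0.2, 3e-6, 0.04, 1e-9]
example_power = empirical_power(example_p_values, alpha=2.5e-6)
```

In the study described, each gene would contribute 100 such replicates per architecture; the five values here only keep the example short.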

FigShare

Properties of loci at which gene-based methods report discordant results.

Author: Christian Fuchsberger (76512)
David Altshuler (27754)
Gil McVean (21571)
Jason Flannick (10551)
Kyle J. Gaulton (277252)
Loukas Moutsianas (282004)
Manuel A. Rivas (228371)
Mark I. McCarthy (145748)
Michael Boehnke (145750)
Patrick K. Albers (728931)
Vineeta Agarwala (130598)

Characteristics of causal loci at which KBAC (the method with highest mean power at nominal levels of significance) produces discordant results as compared to another gene-based method. Results are shown above for the simulated architecture AR2 in 3K samples. KBAC is compared to the (A) C-ALPHA, (B) BURDEN, and (C) UNIQ gene-based methods. In each comparison, loci are identified at which KBAC (but not the other method) reports a p-value < 0.01, or at which the other method (but not KBAC) reports a p-value < 0.01. For each group of loci, the leftmost violin plot shows the distribution of aggregate case:control counts (number of minor alleles observed in cases divided by number of minor alleles observed in controls, for variants with MAF<1%). The middle violin plot shows the distribution of case-unique counts (number of observations of alleles that are only present in cases and absent from controls). The rightmost violin plot shows the distribution of the top single variant p-value observed for an exonic variant at the locus (log10 scale). Line plots at right show the distribution of variants (MAF < 1%) at representative simulated loci where the methods are discordant. Each line represents a variant; height above line measures the variant’s case counts, while height below measures control counts. Red lines highlight variants which drive the difference in test performance.
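The two count-based summaries in this caption can be made concrete with a minimal sketch. It assumes each rare variant (MAF<1%) at a locus is represented as a pair of minor-allele counts `(case_count, control_count)`; that representation, the function names, and the example counts are all hypothetical and are not taken from the study.

```python
from typing import List, Tuple

# Each variant is (minor alleles seen in cases, minor alleles seen in controls).
VariantCounts = Tuple[int, int]


def aggregate_case_control_ratio(variants: List[VariantCounts]) -> float:
    """Total minor alleles in cases divided by total minor alleles in controls."""
    case_total = sum(case for case, _ in variants)
    control_total = sum(control for _, control in variants)
    if control_total == 0:
        # No control observations: the ratio is undefined, reported here as infinity.
        return float("inf")
    return case_total / control_total


def case_unique_count(variants: List[VariantCounts]) -> int:
    """Observations of alleles present in cases and absent from controls."""
    return sum(case for case, control in variants if case > 0 and control == 0)


# Hypothetical locus with four rare variants (not real data).
example_locus = [(3, 1), (2, 0), (0, 4), (5, 0)]
example_ratio = aggregate_case_control_ratio(example_locus)
example_unique = case_unique_count(example_locus)
```

For the hypothetical locus above, cases carry 10 minor alleles and controls carry 5, and the second and fourth variants are seen only in cases.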

FigShare

Published gene-based rare variant association methods evaluated.

Author: Christian Fuchsberger (76512)
David Altshuler (27754)
Gil McVean (21571)
Jason Flannick (10551)
Kyle J. Gaulton (277252)
Loukas Moutsianas (282004)
Manuel A. Rivas (228371)
Mark I. McCarthy (145748)
Michael Boehnke (145750)
Patrick K. Albers (728931)
Vineeta Agarwala (130598)

FigShare

Power of gene-based methods as a function of sample size, locus effect size, and neutral variation.

Author: Christian Fuchsberger (76512)
David Altshuler (27754)
Gil McVean (21571)
Jason Flannick (10551)
Kyle J. Gaulton (277252)
Loukas Moutsianas (282004)
Manuel A. Rivas (228371)
Mark I. McCarthy (145748)
Michael Boehnke (145750)
Patrick K. Albers (728931)
Vineeta Agarwala (130598)

Power was measured across one hundred simulations at each of 24 gene loci (as in Figs 2 and 3). Across all panels above, variant effects were drawn from the architecture model AR2 (assuming moderate selection against causal variants, and thus modest inverse correlation between variant frequency and effect size). In (A), variant effects were sampled at each locus such that the total fraction of phenotypic variance explained by the locus was ~0.5%, 1% (as in Figs 2 and 3) or 2%. In (B), loci were simulated to explain 1% of phenotypic variance in sample sizes of 1.5K cases/1.5K controls (as in Figs 2 and 3) and 5K cases/5K controls. In both (A) and (B), all exonic variants with MAF < 1% were included in the burden test (both causal and non-causal variants, resulting in fewer than 50% of all tested variants being causal). In (C), non-causal (neutral) variants were selectively removed such that the ratio of causal variants to total variants tested ranged from 0.25 to 1 (only causal variants tested). The gene-based methods each have varied performance under different locus effect sizes, sample sizes, and causal variant filtering scenarios.

FigShare

Power of best-performing gene-based rare variant method as compared to single variant association.

Author: Christian Fuchsberger (76512)
David Altshuler (27754)
Gil McVean (21571)
Jason Flannick (10551)
Kyle J. Gaulton (277252)
Loukas Moutsianas (282004)
Manuel A. Rivas (228371)
Mark I. McCarthy (145748)
Michael Boehnke (145750)
Patrick K. Albers (728931)
Vineeta Agarwala (130598)

Power is measured across one hundred simulations of phenotypic effects at each of 24 human gene loci in N = 3K samples (as in Fig 2). Under each architecture (AR1, AR2, AR3), the power of the best-performing gene-based test at alpha = 2.5e-06 (SKAT-O) is compared to single variant association (Fisher’s exact) at alpha = 5e-08 (panels A, C, E). No MAF threshold was applied to the single variant association tests; gene-based tests included only variants with MAF<1%. Blue boxplot shows range of power for single variant association across genes simulated; pink shows power of the gene-based test alone; green shows the fraction of loci detected only by gene-based test (and not single variant association); yellow shows the combined power of both gene-based and single variant association. Next to each boxplot (panels B, D, F) are scatterplots on which each simulated locus (under AR1, AR2, and AR3, respectively) is represented as a point based on the minor allele frequency (x-axis) and association p-value (y-axis) of the single most-associated variant (the top individual signal) across the locus. Single variant association detects loci plotted above the upper dotted line (at 5e-08), while gene-based association identifies a distinct subset of loci (those highlighted in pink, where the SKAT-O p-value is <2.5e-06). This latter group of loci are those where the top single variant is preferentially rare (and no common variant association signal exists); right-most scatterplots zoom into this portion of the x-axis (MAF<1%). Similar plots for AR4, AR5, and AR6 are shown in S10 Fig.
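The four quantities shown in the boxplots (single variant power, gene-based power, loci detected only by the gene-based test, and combined power) can be sketched as simple tallies over simulated loci. The sketch assumes each simulation is summarized by two numbers, the top single variant p-value and the gene-based p-value; the function name, the dictionary keys, and the example values are hypothetical, while the two thresholds are the ones quoted in the caption.

```python
from typing import Dict, List, Tuple

SINGLE_VARIANT_ALPHA = 5e-8   # threshold quoted for single variant association
GENE_BASED_ALPHA = 2.5e-6     # threshold quoted for the gene-based test


def detection_fractions(simulations: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize detection across simulations.

    Each entry is (top single variant p-value, gene-based p-value).
    """
    n = len(simulations)
    if n == 0:
        raise ValueError("need at least one simulation")
    single = sum(1 for s, _ in simulations if s < SINGLE_VARIANT_ALPHA)
    gene = sum(1 for _, g in simulations if g < GENE_BASED_ALPHA)
    # Detected by the gene-based test but missed by single variant association.
    gene_only = sum(
        1 for s, g in simulations
        if g < GENE_BASED_ALPHA and not s < SINGLE_VARIANT_ALPHA
    )
    # Detected by at least one of the two approaches.
    combined = sum(
        1 for s, g in simulations
        if s < SINGLE_VARIANT_ALPHA or g < GENE_BASED_ALPHA
    )
    return {
        "single_variant": single / n,
        "gene_based": gene / n,
        "gene_based_only": gene_only / n,
        "combined": combined / n,
    }


# Hypothetical (single variant p, gene-based p) pairs for four simulations.
example_simulations = [(1e-9, 1e-3), (1e-4, 1e-7), (1e-10, 1e-8), (0.5, 0.3)]
example_fractions = detection_fractions(example_simulations)
```

By construction, the combined fraction equals the single variant fraction plus the gene-based-only fraction, which matches how the yellow boxplot relates to the blue and green ones in the caption.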

FigShare

Additional file 1: Supplemental Information of A null mutation in ANGPTL8 does not associate with either plasma glucose or type 2 diabetes in humans

Author: Audrey Chu (3503069)
Daniel Rader (3550616)
David Altshuler (27754)
Gina Peloso (257020)
Jason Flannick (10551)
Jennifer Wessel (358112)
Katharine Clapham (3550604)
Manuel Rivas (268944)
Mark McCarthy (157523)
Michael Boehnke (145750)
Pradeep Natarajan (819434)
Robert Scott (698597)
Roxana Mehran (3550607)
Samantha Sartori (3550610)
Sekar Kathiresan (60585)
Usman Baber (3550613)
Valentin Fuster (731738)

Figure S1. In-silico predictions of the impact of premature termination codon p.Q121X suggest that the variant would trigger nonsense-mediated decay. Table S1. ANGPTL8 p.Q121X and type 2 diabetes status by study. (PDF 145 kb)

FigShare

Common, low-frequency, and rare genetic variants associated with lipoprotein subclasses and triglyceride measures in Finnish men from the METSIM study

Author: Adam E. Locke (418984)
Anne U. Jackson (145702)
Antti J. Kangas (144259)
Christian Fuchsberger (76512)
Francis S. Collins (82181)
Heather M. Stringham (198678)
James P. Davis (538246)
Jeroen R. Huyghe (686616)
Johanna Kuusisto (197767)
Karen L. Mohlke (143380)
Markku Laakso (197777)
Michael Boehnke (145750)
Mika Ala-Korpela (74279)
Narisu Narisu (262622)
Pasi Soininen (74273)
Peter S. Chines (145692)
Ryan P. Welch (4543345)
Tanya M. Teslovich (145718)
Xueling Sim (103662)
Publication date: 01/01/2017

Lipid and lipoprotein subclasses are associated with metabolic and cardiovascular diseases, yet the genetic contributions to variability in subclass traits are not fully understood. We conducted single-variant and gene-based association tests between 15.1M variants from genome-wide and exome array and imputed genotypes and 72 lipid and lipoprotein traits in 8,372 Finns. After accounting for 885 variants at 157 previously identified lipid loci, we identified five novel signals near established loci at HIF3A, ADAMTS3, PLTP, LCAT, and LIPG. Four of the signals were identified with a low-frequency (0.005LCAT. Gene-based associations (P < 10^-10) support a role for coding variants in LIPC and LIPG with lipoprotein subclass traits. Thirty established lipid-associated loci had a stronger association for a subclass trait than any conventional trait. These novel association signals provide further insight into the molecular basis of dyslipidemia and the etiology of metabolic disorders.

Crossref

Directory of Open Access Journals

Carolina Digital Repository

Explore Bristol Research

FigShare

Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes

Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values, i.e. the summary association statistics are measured for all variants in contributing studies. In practice, genotype imputation is not always effective. This may be the case when targeted genotyping/sequencing assays are used or when the un-typed genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Existing methods for imputing missing summary association statistics and using imputed values in meta-analysis, approximate conditional analysis, or simple strategies such as complete case analysis all have theoretical limitations. Applying these approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain large amounts of missing values. Based on this estimator, we proposed a score statistic called PCBS (partial correlation based score statistic) for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to one of the largest meta-analyses to date for the cigarettes-per-day phenotype. Using the new method, we identified multiple novel independently associated variants at known loci for tobacco use, which were otherwise missed by alternative methods. Together, the phenotypic variance explained by these variants was 1.1%, improving that of previously reported associations by 71%. These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants.

CU Scholar Institutional Repository

Directory of Open Access Journals

FigShare

Newly identified signals associated with lipoprotein subclasses and triglyceride measures.

Author: Adam E. Locke (418984)
Anne U. Jackson (145702)
Antti J. Kangas (144259)
Christian Fuchsberger (76512)
Francis S. Collins (82181)
Heather M. Stringham (198678)
James P. Davis (538246)
Jeroen R. Huyghe (686616)
Johanna Kuusisto (197767)
Karen L. Mohlke (143380)
Markku Laakso (197777)
Michael Boehnke (145750)
Mika Ala-Korpela (74279)
Narisu Narisu (262622)
Pasi Soininen (74273)
Peter S. Chines (145692)
Ryan P. Welch (4543345)
Tanya M. Teslovich (145718)
Xueling Sim (103662)

FigShare