
    Generation of simulated genotype data at human gene loci in large sample sizes with HAPGEN2.

    <p>Haplotypes were simulated at ‘average’ human protein-coding genes drawn from the center of the distribution of RefSeq gene total exon length <b>(A)</b>. Vertical dotted lines in red and green indicate the median and mean values of exon length, respectively. Black bar represents the 24 genes selected for simulation. <b>(B,C)</b> Site frequency spectrum of simulated data, as compared to observed human data. Data were simulated via staged expansion of 1000 Genomes Project haplotypes using the HAPGEN2 software; the mutation parameter was fit to match the site frequency spectrum of protein-coding variation observed in exome sequencing studies, e.g. as reported by Nelson et al. 2012. Raw simulated data from HAPGEN2 in large sample sizes produced an excess of rare sites; these were down-sampled to match observed data. The grey area in <b>(B)</b> represents the [5%, 95%] interval across all simulated genes, obtained using bootstrapping. The site frequency spectrum of simulated data in a smaller sample size (N = 2.7K) also matched an independent set of observed exome sequencing data from the GoT2D consortium <b>(C)</b>. Haplotype structure, as measured by linkage disequilibrium between variants, was also preserved in the simulated data after sample expansion <b>(D)</b>. The inset shows a representative example of simulations at the GATA3 gene locus.</p>

    Power of different gene-based rare variant association methods at simulated disease loci.

    <p>At each gene locus, one hundred independent simulations of phenotypic effects were generated in a sample size of 3K individuals (1.5K cases / 1.5K controls). Variant effects were drawn from varied models of genetic architecture (<b>A-F</b>), hypothesizing different degrees of purifying selection against disease alleles (see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1005165#sec007" target="_blank">Methods</a>). Under models with strong selection, there is a strong inverse correlation between variant frequency and effect size; under weak selection rare variant effects are less skewed. At all loci, genetic variants together contribute 1% of the phenotypic variance underlying a trait with common prevalence (8%; modeled as type 2 diabetes). Power is measured as the fraction out of 100 simulations of each gene in which a gene-based test reported a p-value lower than the significance threshold. In (<b>A-C</b>), causal variants span the full frequency spectrum (including common alleles), and thus rare alleles account for only a fraction of the locus heritability; in (<b>D-E</b>), all causal variants are rare (MAF<1%). In (<b>F</b>), causal variants have bi-directional effects (some increase risk of disease, while others reduce risk).</p>
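
The power definition in this caption (the fraction of simulations at a gene in which the test's p-value falls below the significance threshold) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `estimate_power`, the example p-values, and the choice of `alpha` are all assumptions made for the example.

```python
def estimate_power(p_values, alpha):
    """Return the fraction of simulations whose p-value is below alpha."""
    if not p_values:
        raise ValueError("need at least one simulated p-value")
    hits = sum(1 for p in p_values if p < alpha)
    return hits / len(p_values)


# Hypothetical p-values from five simulations of one gene (illustrative only;
# the caption describes 100 simulations per gene).
example_p = [1e-7, 0.03, 4e-6, 2e-6, 0.5]

# alpha is an assumed threshold chosen for this example.
power = estimate_power(example_p, alpha=2.5e-6)
print(power)
```

With real output, `p_values` would hold the 100 per-simulation p-values reported by one gene-based test at one locus.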

    Properties of loci at which gene-based methods report discordant results.

    <p>Characteristics of causal loci at which KBAC (the method with highest mean power at nominal levels of significance) produces discordant results as compared to another gene-based method. Results are shown above for the simulated architecture AR2 in 3K samples. KBAC is compared to the <b>(A)</b> C-ALPHA, <b>(B)</b> BURDEN, and <b>(C)</b> UNIQ gene-based methods. In each comparison, loci are identified at which KBAC (but not the other method) reports a p-value < 0.01, or at which the other method (but not KBAC) reports a p-value < 0.01. For each group of loci, the leftmost violin plot shows the distribution of aggregate case:control counts (number of minor alleles observed in cases divided by number of minor alleles observed in controls, for variants with MAF<1%). The middle violin plot shows the distribution of case-unique counts (number of observations of alleles that are only present in cases and absent from controls). The rightmost violin plot shows the distribution of the top single variant p-value observed for an exonic variant at the locus (log10 scale). Line plots at right show the distribution of variants (MAF < 1%) at representative simulated loci where the methods are discordant. Each line represents a variant; height above line measures the variant’s case counts, while height below measures control counts. Red lines highlight variants which drive the difference in test performance.</p>
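
The two per-locus summaries defined in this caption, the aggregate case:control ratio and the case-unique count, can be sketched as below. This is an illustrative sketch under stated assumptions: the tuple layout, the function name `locus_summaries`, and the example counts are hypothetical; restricting the case-unique count to variants with MAF < 1% and returning `None` when no rare minor alleles appear in controls are choices made here, not details given in the caption.

```python
def locus_summaries(variants, maf_threshold=0.01):
    """Summarise rare-variant allele counts at one locus.

    variants: iterable of (maf, case_count, control_count) tuples, where the
    counts are minor-allele observations in cases and controls.
    Returns (case_control_ratio, case_unique_count).
    """
    # Keep only rare variants (MAF below the threshold).
    rare = [(ca, co) for maf, ca, co in variants if maf < maf_threshold]

    case_total = sum(ca for ca, _ in rare)
    control_total = sum(co for _, co in rare)

    # Assumption: the ratio is undefined (None) when controls carry no rare alleles.
    ratio = case_total / control_total if control_total else None

    # Observations of alleles seen in cases but never in controls.
    case_unique = sum(ca for ca, co in rare if co == 0)
    return ratio, case_unique


# Hypothetical locus: three rare variants and one common variant.
example_locus = [
    (0.002, 3, 1),
    (0.0005, 2, 0),
    (0.008, 4, 5),
    (0.12, 300, 310),  # common variant, excluded by the MAF filter
]
ratio, case_unique = locus_summaries(example_locus)
print(ratio, case_unique)
```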

    Power of gene-based methods as a function of sample size, locus effect size, and neutral variation.

    <p>Power was measured across one hundred simulations at each of 24 gene loci (as in Figs <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1005165#pgen.1005165.g002" target="_blank">2</a> and <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1005165#pgen.1005165.g003" target="_blank">3</a>). Across all panels above, variant effects were drawn from the architecture model AR2 (assuming moderate selection against causal variants, and thus modest inverse correlation between variant frequency and effect size). In <b>(A)</b>, variant effects were sampled at each locus such that the total fraction of phenotypic variance explained by the locus was ~0.5%, 1% (as in Figs <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1005165#pgen.1005165.g002" target="_blank">2</a> and <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1005165#pgen.1005165.g003" target="_blank">3</a>) or 2%. In <b>(B)</b>, loci were simulated to explain 1% of phenotypic variance in sample sizes of 1.5K cases/1.5K controls (as in Figs <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1005165#pgen.1005165.g002" target="_blank">2</a> and <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1005165#pgen.1005165.g003" target="_blank">3</a>) and 5K cases/5K controls. In both <b>(A)</b> and <b>(B)</b>, all exonic variants with MAF < 1% were included in the burden test (both causal and non-causal variants, resulting in fewer than 50% of all tested variants being causal). In <b>(C)</b>, non-causal (neutral) variants were selectively removed such that the ratio of causal variants to total variants tested ranged from 0.25 to 1 (only causal variants tested). The performance of each gene-based method varies with locus effect size, sample size, and causal variant filtering scenario.</p>

    Power of best-performing gene-based rare variant method as compared to single variant association.

    <p>Power is measured across one hundred simulations of phenotypic effects at each of 24 human gene loci in N = 3K samples (as in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1005165#pgen.1005165.g002" target="_blank">Fig 2</a>). Under each architecture (AR1, AR2, AR3), the power of the best-performing gene-based test at alpha = 2.5e-06 (SKAT-O) is compared to single variant association (Fisher’s exact) at alpha = 5e-08 (panels A, C, E). No MAF threshold was applied to the single variant association tests; gene-based tests included only variants with MAF<1%. Blue boxplot shows range of power for single variant association across genes simulated; pink shows power of the gene-based test alone; green shows the fraction of loci detected only by gene-based test (and not single variant association); yellow shows the combined power of both gene-based and single variant association. Next to each boxplot (panels B, D, F) are scatterplots on which each simulated locus (under AR1, AR2, and AR3, respectively) is represented as a point based on the minor allele frequency (x-axis) and association p-value (y-axis) of the single most-associated variant (the top individual signal) across the locus. Single variant association detects loci plotted above the upper dotted line (at 5e-08), while gene-based association identifies a distinct subset of loci (those highlighted in pink, where the SKAT-O p-value is <2.5e-06). This latter group comprises loci where the top single variant is preferentially rare (and no common variant association signal exists); right-most scatterplots zoom into this portion of the x-axis (MAF<1%). Similar plots for AR4, AR5, and AR6 are shown in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1005165#pgen.1005165.s011" target="_blank">S10 Fig</a>.</p>
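
The four quantities shown in the boxplots (single-variant power, gene-based power, loci detected only by the gene-based test, and combined power) can be computed from paired p-values roughly as follows. This is a minimal sketch, not the authors' pipeline: the function name `detection_summary`, the input layout, and the example p-values are hypothetical; only the two thresholds (2.5e-06 for the gene-based test, 5e-08 for single variants) come from the caption.

```python
def detection_summary(simulations, alpha_gene=2.5e-6, alpha_single=5e-8):
    """Summarise detection rates across simulations of one locus.

    simulations: list of (gene_based_p, top_single_variant_p) pairs,
    one pair per simulation.
    """
    n = len(simulations)
    if n == 0:
        raise ValueError("need at least one simulation")

    gene = sum(g < alpha_gene for g, _ in simulations)
    single = sum(s < alpha_single for _, s in simulations)
    # Detected by the gene-based test but missed by single-variant association.
    gene_only = sum(g < alpha_gene and s >= alpha_single for g, s in simulations)
    # Detected by at least one of the two approaches.
    combined = sum(g < alpha_gene or s < alpha_single for g, s in simulations)

    return {
        "single_variant": single / n,
        "gene_based": gene / n,
        "gene_based_only": gene_only / n,
        "combined": combined / n,
    }


# Hypothetical p-value pairs from four simulations (illustrative only).
example_sims = [(1e-7, 1e-9), (1e-7, 1e-4), (0.2, 1e-10), (0.5, 0.3)]
summary = detection_summary(example_sims)
print(summary)
```

Combined power is counted as a union, so it can exceed either individual power but never their sum.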

    Common, low-frequency, and rare genetic variants associated with lipoprotein subclasses and triglyceride measures in Finnish men from the METSIM study

    <div><p>Lipid and lipoprotein subclasses are associated with metabolic and cardiovascular diseases, yet the genetic contributions to variability in subclass traits are not fully understood. We conducted single-variant and gene-based association tests between 15.1M variants from genome-wide and exome array and imputed genotypes and 72 lipid and lipoprotein traits in 8,372 Finns. After accounting for 885 variants at 157 previously identified lipid loci, we identified five novel signals near established loci at <i>HIF3A</i>, <i>ADAMTS3</i>, <i>PLTP</i>, <i>LCAT</i>, and <i>LIPG</i>. Four of the signals were identified with a low-frequency (0.005 […] LCAT. Gene-based associations (<i>P</i><10<sup>−10</sup>) support a role for coding variants in <i>LIPC</i> and <i>LIPG</i> with lipoprotein subclass traits. Thirty established lipid-associated loci had a stronger association for a subclass trait than any conventional trait. These novel association signals provide further insight into the molecular basis of dyslipidemia and the etiology of metabolic disorders.</p></div>