10 research outputs found
Power estimates for multiple gene case-control studies with causal variants from disease etiologies randomly sampled from nine multinomial distributions (Figure S3).
<p>Power estimates for BOMP, VT, SKAT, KBAC (KBAC1P = minor allele frequency defined as , KBAC5P = minor allele frequency defined as ). Each vertical line represents power estimates for each method, based on 250 simulated case-control studies. The genomic individuals each had nine genes, of which three contained causal variants and six did not. The disease etiologies for the three genes with causal variants were randomly sampled from nine multinomial distributions (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003224#pgen.1003224.s003" target="_blank">Figure S3</a>). AA = African-American simple bottleneck demographic model. EA = European-American exponential growth demographic model.</p>
Eight disease etiologies used in simulation experiments.
<p><i>Rare variant</i> = disease caused by multiple rare deleterious variants. <i>Low frequency variant</i> = disease caused by multiple low frequency deleterious variants. <i>Key Region variant</i> = rare deleterious variants are localized to key regions. <i>Common variant</i> = disease caused by a single deleterious common variant. The etiologies <i>Rare+Protect</i>, <i>LowFreq+Protect</i>, <i>KeyRegion+Protect</i> and <i>Common+Protect</i> were identical to the first four except that they include protective variants.</p>1<p>Minor allele frequency of deleterious causal variants,</p>2<p>Selection coefficients of deleterious causal variants,</p>3<p>Effect size of deleterious causal variants,</p>4<p>Selection coefficient of protective causal variants,</p>5<p>Effect size of protective modifier variants,</p>6<p>Required functional role of causal and protective variants, NS = coding non-synonymous, AA = African-American simple bottleneck demographic model <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003224#pgen.1003224-Boyko1" target="_blank">[44]</a>, EA = European-American exponential growth demographic model <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003224#pgen.1003224-Kryukov1" target="_blank">[19]</a>.</p>*<p> for protective modifier variants with AF5%, for protective modifier variants with AF5%.</p>
Power estimates for multiple gene case-control studies with causal variants equally likely to be from any disease etiology dominated by rare variants.
<p>A,B. X-axis shows the number of candidate genes in 250 simulated case-control studies (approximately one-third each from disease etiologies Rare, LowFreq and KeyRegion). All genes contain causal variants. For each method, average power is shown. Power increases for all methods as the number of candidate genes with causal variants increases. C,D. X-axis shows the number of candidate genes and the ratio of genes containing causal variants to those that do not contain causal variants. As the ratio decreases, the power of the tested methods also decreases. Tested methods are BOMP, VT, SKAT and KBAC (KBAC1P = minor allele frequency defined as , KBAC5P = minor allele frequency defined as ). AA = the case-control studies were drawn from gene populations generated with an African-American simple bottleneck demographic model. EA = the case-control studies were drawn from gene populations generated with a European-American exponential growth demographic model.</p>
Analytical comparison of SKAT, BOMP, and VT on a toy example.
<p>Genotypes of 8 cases and 8 controls at 10 positions. Matrix column colors: controls = light blue, cases = light red. Position distribution bar colors: controls = blue, cases = red. A detailed description is in the section "Toy example with analytical calculations" (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003224#pgen.1003224.s013" target="_blank">Text S1</a>).</p>
Single gene methods power comparison.
<p>Power estimates for BOMP, VT, SKAT, KBAC (KBAC1P = minor allele frequency defined as , KBAC5P = minor allele frequency defined as ). Each vertical line represents power estimates for each method, based on 250 simulated case-control studies. AA = the case-control studies were drawn from gene populations generated with an African-American simple bottleneck demographic model. EA = the case-control studies were drawn from gene populations generated with a European-American exponential growth demographic model. The eight variant causality (disease etiology) models are defined in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003224#pgen-1003224-t001" target="_blank">Table 1</a>. Since the European-American demographic model does not account for common or protective variants, etiologies involving common or protective variants were only considered for the African-American demographic model.</p>
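<p>The power estimates described above are proportions over simulated studies. A minimal sketch of that calculation follows, assuming power is taken as the fraction of simulated case-control studies whose test P-value falls below a significance threshold. The function name <code>empirical_power</code>, the threshold of 0.05, and the randomly generated P-values are illustrative assumptions, not values from the source.</p>

```python
import random


def empirical_power(p_values, alpha=0.05):
    """Fraction of simulated studies in which the null hypothesis is rejected.

    alpha is an assumed significance threshold; the source caption does not
    state the threshold used.
    """
    if not p_values:
        raise ValueError("need at least one simulated study")
    rejections = sum(1 for p in p_values if p < alpha)
    return rejections / len(p_values)


# Hypothetical P-values standing in for 250 simulated case-control studies.
rng = random.Random(0)
simulated_p = [rng.random() ** 3 for _ in range(250)]
power = empirical_power(simulated_p)
```

<p>With 250 studies per estimate, each power value is resolved in steps of 1/250.</p>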
Dallas Heart Study.
<p>P-values of association between dichotomized triglyceride levels and variation in three ANGPTL family genes sequenced in the Dallas Heart Study. ANGPTL = multiple gene set including <i>ANGPTL3</i>, <i>ANGPTL4</i>, and <i>ANGPTL5</i>. The most significant P-value for each is highlighted in bold. BOMP = combined Burden and Position statistics, VT = variable threshold burden test <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003224#pgen.1003224-Price1" target="_blank">[10]</a>, SKAT = sequence kernel association test (linear weighting version) <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003224#pgen.1003224-Wu1" target="_blank">[12]</a>, KBAC = Kernel-based adaptive cluster <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003224#pgen.1003224-Liu2" target="_blank">[20]</a> (1D = single direction, 2D = two direction, 1P = rare variants defined as MAF, 5P = rare variants defined as MAF). VEST = BOMP and VT with VEST score variant weighting.</p>
BOMP P-values for gene sets in Bipolar case-control study.
<p>The gene sets were selected for testing because they contained genes and were the most significantly enriched by synaptic genes <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003224#pgen.1003224-Pirooznia1" target="_blank">[26]</a>. Seven of the gene sets were nominally associated with bipolar disorder (P &lt; 0.05) and have FDR &lt; 0.1.</p>*<p>FDR computed with the Benjamini-Hochberg algorithm <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003224#pgen.1003224-Benjamini1" target="_blank">[45]</a>.</p>**<p>Wall-clock time in minutes.</p>
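<p>The footnote above names the Benjamini-Hochberg algorithm for the FDR values. A minimal sketch of the standard step-up adjustment follows; the function name <code>benjamini_hochberg</code> and the example P-values are made up for illustration and are not the gene-set results from the study.</p>

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical gene-set P-values (not taken from the source table).
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
fdr = benjamini_hochberg(example_p)
# Flag sets that are nominally associated and pass the FDR cutoff.
flagged = [p < 0.05 and q < 0.1 for p, q in zip(example_p, fdr)]
```

<p>In this sketch a gene set is flagged only when both its raw P-value and its adjusted value pass their thresholds, which mirrors the two criteria quoted in the caption.</p>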
BOMP burden and position statistics complement each other.
<p>Breakdown of the contribution of BOMP mutation burden (BOMP_B) and BOMP position distribution (BOMP_P) statistics averaged over single candidate gene power estimates (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003224#pgen-1003224-g001" target="_blank">Figure 1</a>) and multiple candidate gene power estimates (nine genes, 3 with causal variants and 6 with no causal variants) (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003224#pgen-1003224-g003" target="_blank">Figure 3</a>) for case-control study sizes of 200, 1000, 2000, and 5000. Combining the two statistics consistently yielded improved power with respect to each statistic on its own. The BOMP burden statistic had more power than BOMP position for the simulations based on a single candidate gene, and vice versa in the simulations with nine candidate genes and a 3:6 causal to non-causal ratio.</p>
Components of BOMP Hybrid Likelihood Model compared.
<p>A. Mutation burden statistic. The mutation burden statistic uses the aggregated burden for cases, , and controls . B. Mutation position distribution statistic. Aggregated window mutation counts are calculated for cases, , controls, , and cases and controls combined, , across windows.</p>
Example variation pattern in which position distribution outperforms burden tests.
<p>A toy example of a genomic region containing variants (blue squares) in cases and controls. We assume that the region is important for phenotype. Variant counts in cases (red). Variant counts in controls (purple). Cases and controls each have a total of 9 variants in this region, so burden statistics (<i>e.g.</i>, VT or BOMP burden) will not be able to detect that the region is important for phenotype. BOMP's position distribution statistic collapses variants into short, localized windows (red dashed lines) and detects that the number of variants seen in cases and controls is different within the windows. We note that a method that does not collapse variants, such as SKAT, does not have much power to detect the difference between cases and controls, because at each position the number of variants in cases and controls is similar.</p>
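<p>The pattern described in this caption can be sketched with a few lines of counting. The variant positions, the region length of 30, and the window size of 5 below are invented for illustration, and the window tally is a simplified stand-in, not BOMP's actual position distribution statistic.</p>

```python
from collections import Counter


def window_counts(positions, window_size, region_length):
    """Count variants falling into consecutive fixed-size windows."""
    n_windows = -(-region_length // window_size)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window_size] += 1
    return counts


# Hypothetical 0-based variant positions: 9 variants in each group.
case_positions = [1, 2, 3, 4, 10, 11, 12, 13, 14]       # clustered
control_positions = [0, 6, 8, 16, 17, 19, 22, 25, 28]   # spread out

# Total burden is identical, so a pure burden comparison sees no difference.
case_burden = len(case_positions)
control_burden = len(control_positions)

# Per-position counts differ by at most one variant.
case_by_pos = Counter(case_positions)
control_by_pos = Counter(control_positions)
max_position_diff = max(
    abs(case_by_pos[p] - control_by_pos[p])
    for p in set(case_by_pos) | set(control_by_pos)
)

# Collapsing into windows exposes the clustering in cases.
case_windows = window_counts(case_positions, 5, 30)
control_windows = window_counts(control_positions, 5, 30)
```

<p>Here both groups carry 9 variants and no single position differs by more than one variant, yet the windowed counts for cases concentrate in two windows while the control counts are spread across the region.</p>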