18 research outputs found

    A comparative analysis of exome capture

    ABSTRACT: BACKGROUND: Human exome resequencing with commercial target capture kits is widely used to sequence large numbers of individuals in the search for variants associated with human diseases. We rigorously evaluated the capabilities of two solution exome capture kits. These analyses help clarify the strengths and limitations of the resulting data and systematically identify variables that should be considered when using those data. RESULTS: Each exome kit performed well at capturing the targets it was designed to capture, which mainly correspond to the consensus coding sequence (CCDS) annotations of the human genome. In addition, based on their respective targets, each capture kit coupled with high-coverage Illumina sequencing produced highly accurate nucleotide calls. However, other databases, such as the Reference Sequence collection (RefSeq), define the exome more broadly, and so, not surprisingly, the exome kits did not capture these additional regions. CONCLUSIONS: Commercial exome capture kits provide a very efficient way to sequence selected areas of the genome at very high accuracy. Here we provide the data to help guide critical analyses of sequencing data derived from these products.

    Establishing the baseline level of repetitive element expression in the human cortex

    Background: Although nearly half of the human genome is composed of repetitive sequences, the expression profile of these elements remains largely uncharacterized. Recently developed high-throughput sequencing technologies provide a powerful new set of tools to study repeat elements. Hence, we performed whole transcriptome sequencing to investigate the expression of repetitive elements in human frontal cortex using postmortem tissue obtained from the Stanley Medical Research Institute. Results: We found that a significant number of reads from the human frontal cortex originate from repeat elements. We also noticed that Alu elements were expressed at levels higher than expected by random or background transcription. In contrast, L1 elements were expressed at lower than expected levels. Conclusions: Repetitive elements are expressed abundantly in the human brain. This expression pattern appears to be element specific and cannot be explained by random or background transcription. These results demonstrate that our knowledge about repetitive elements is far from complete. Further characterization is required to determine the mechanism, the control, and the effects of repeat element expression.

    Eight disease etiologies used in simulation experiments.

    Rare variant = disease caused by multiple rare deleterious variants. Low frequency variant = disease caused by multiple low frequency deleterious variants. Key Region variant = rare deleterious variants are localized to key regions. Common variant = disease caused by a single deleterious common variant. The etiologies Rare+Protect, LowFreq+Protect, KeyRegion+Protect and Common+Protect were identical to the first four except that they include protective variants.
    Footnotes: (1) Minor allele frequency of deleterious causal variants. (2) Selection coefficients of deleterious causal variants. (3) Effect size of deleterious causal variants. (4) Selection coefficient of protective causal variants. (5) Effect size of protective modifier variants. (6) Required functional role of causal and protective variants (NS = coding non-synonymous, AA = African-American simple bottleneck demographic model [44], EA = European-American exponential growth demographic model [19]).
    * for protective modifier variants with AF5%, for protective modifier variants with AF5%.

    BOMP P-values for gene sets in Bipolar case-control study.

    The gene sets were selected for testing because they contained genes and were the most significantly enriched by synaptic genes [26]. Seven of the gene sets were nominally associated with bipolar disorder (P < 0.05) and have FDR < 0.1.
    * FDR computed with the Benjamini-Hochberg algorithm [45].
    ** Wall-clock time in minutes.
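The Benjamini-Hochberg adjustment mentioned in the footnote can be illustrated with a minimal sketch. This is the generic textbook procedure, not the code used in the study; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Generic illustration only; not taken from the study's own code.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for four gene sets.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A gene set would then be reported at a given FDR level when its adjusted value falls below that level.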

    BOMP burden and position statistics complement each other.

    Breakdown of contribution of BOMP mutation burden (BOMP_B) and BOMP position distribution (BOMP_P) statistics averaged over single candidate gene power estimates (Figure 1) and multiple candidate gene power estimates (nine genes, 3 with causal variants and 6 with no causal variants) (Figure 3) for case-control study sizes of 200, 1000, 2000, and 5000. Combining the two statistics consistently yielded improved power with respect to each statistic on its own. The BOMP burden statistic had more power than BOMP position for the simulations based on a single candidate gene, and vice versa in the simulations with nine candidate genes and a 3:6 causal to non-causal ratio.

    Power estimates for multiple gene case-control studies with causal variants equally likely to be from any disease etiology dominated by rare variants.

    A,B. X-axis shows the number of candidate genes in 250 simulated case-control studies (approximately one-third each from disease etiologies Rare, LowFreq and KeyRegion). All genes contain causal variants. For each method, average power is shown. Power increases for all methods as the number of candidate genes with causal variants increases. C,D. X-axis shows the number of candidate genes and the ratio of genes containing causal variants to those that do not contain causal variants. As the ratio decreases, the power of the tested methods also decreases. Tested methods are BOMP, VT, SKAT and KBAC (KBAC1P = minor allele frequency defined as , KBAC5P = minor allele frequency defined as ). AA = the case-control studies were drawn from gene populations generated with an African-American simple bottleneck demographic model. EA = the case-control studies were drawn from gene populations generated with a European-American exponential growth demographic model.
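As a rough illustration of how a power estimate can be read off a batch of simulated studies, the sketch below computes empirical power as the fraction of studies whose p-value falls below a significance level. The function name `estimate_power`, the default threshold of 0.05, and the example p-values are assumptions for illustration; the caption does not state the threshold or procedure used.

```python
def estimate_power(p_values, alpha=0.05):
    """Empirical power: share of simulated studies with p-value below alpha.

    alpha=0.05 is an assumed threshold, not one stated in the caption.
    """
    if not p_values:
        raise ValueError("need at least one simulated study")
    rejections = sum(1 for p in p_values if p < alpha)
    return rejections / len(p_values)


if __name__ == "__main__":
    # Hypothetical p-values from four simulated case-control studies.
    print(estimate_power([0.01, 0.2, 0.04, 0.5]))
```

With 250 simulated studies per setting, the same function would be called once per method and per setting on that method's 250 p-values.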

    Analytical comparison of SKAT, BOMP, and VT on a toy example.

    Genotypes of 8 cases and 8 controls at 10 positions. Matrix column colors: controls = light blue, cases = light red. Position distribution bar colors: controls = blue, cases = red. A detailed description is in the section “Toy example with analytical calculations” (Text S1).

    Power estimates for multiple genes case-control studies with causal variants from disease etiologies randomly sampled from nine multinomial distributions (Figure S3).

    Power estimates for BOMP, VT, SKAT, KBAC (KBAC1P = minor allele frequency defined as , KBAC5P = minor allele frequency defined as ). Each vertical line represents power estimates for each method, based on 250 simulated case-control studies. The genomic individuals each had nine genes, of which three contained causal variants and six did not. The disease etiologies for the three genes with causal variants were randomly sampled from nine multinomial distributions (Figure S3). AA = African-American simple bottleneck demographic model. EA = European-American exponential growth demographic model.