17 research outputs found

    DNA copy number, including telomeres and mitochondria, assayed using next-generation sequencing

    Background: DNA copy number variations occur within populations, and aberrations can cause disease. We sought to develop an improved lab-automatable, cost-efficient, accurate platform to profile DNA copy number.
    Results: We developed a sequencing-based assay of nuclear, mitochondrial, and telomeric DNA copy number that draws on the unbiased nature of next-generation sequencing and incorporates techniques developed for RNA expression profiling. To demonstrate this platform, we assayed UMC-11 cells using 5 million 33 nt reads and found tremendous copy number variation, including regions of single and homogeneous deletions and amplifications to 29 copies; 5 times more mitochondria and 4 times less telomeric sequence than a pool of non-diseased, blood-derived DNA; and that UMC-11 was derived from a male individual.
    Conclusion: The described assay outputs absolute copy number, outputs an error estimate (p-value), and is more accurate than array-based platforms at high copy number. The platform enables profiling of mitochondrial levels and telomeric length. The assay is lab-automatable and has a genomic resolution and cost that are tunable based on the number of sequence reads.

    Digital Genome-Wide ncRNA Expression, Including SnoRNAs, across 11 Human Tissues Using PolyA-Neutral Amplification

    Non-coding RNAs (ncRNAs) are an essential class of molecular species that have been difficult to monitor on high-throughput platforms due to frequent lack of polyadenylation. Using a polyadenylation-neutral amplification protocol and next-generation sequencing, we explore ncRNA expression in eleven human tissues. ncRNAs 7SL, U2, 7SK, and HBII-52 are expressed at levels far exceeding mRNAs. C/D and H/ACA box snoRNAs are associated with rRNA methylation and pseudouridylation, respectively: spleen expresses both, hypothalamus expresses mainly C/D box snoRNAs, and testes show enriched expression of both H/ACA box snoRNAs and RNA telomerase TERC. Within the snoRNA 14q cluster, 14q(I-6) is expressed at much higher levels than other cluster members. More reads align to mitochondrial than nuclear tRNAs. Many lincRNAs are actively transcribed, particularly those overlapping known ncRNAs. Within the Prader-Willi syndrome loci, the snoRNA HBII-85 (group I) cluster is highly expressed in hypothalamus, greater than in other tissues and greater than group II or III. Additionally, within the disease locus we find novel transcription across a 400,000 nt span in ovaries. This genome-wide polyA-neutral expression compendium demonstrates the richness of ncRNA expression, the high expression levels of ncRNAs, and their function-specific expression patterns, and it is publicly available.

    Quality Score Based Identification and Correction of Pyrosequencing Errors

    Massively-parallel DNA sequencing using the 454/pyrosequencing platform allows in-depth probing of diverse sequence populations, such as within an HIV-1 infected individual. Analysis of this sequence data, however, remains challenging due to the shorter read lengths relative to those obtained by Sanger sequencing as well as errors introduced during DNA template amplification and during pyrosequencing. The ability to distinguish real variation from pyrosequencing errors with high sensitivity and specificity is crucial to interpreting sequence data. We introduce a new algorithm, CorQ (Correction through Quality), which utilizes the inherent base quality in a sequence-specific context to correct for homopolymer and non-homopolymer insertion and deletion (indel) errors. CorQ also takes uneven read mapping into account for correcting pyrosequencing miscall errors, and it identifies and corrects carry forward errors. We tested the ability of CorQ to correctly call SNPs on a set of pyrosequences derived from ten viral genomes from an HIV-1 infected individual, as well as on six simulated pyrosequencing datasets generated using non-zero error rates to emulate errors introduced by PCR. When combined with the AmpliconNoise error correction method developed to remove ambiguities in signal intensities, we attained a 97% reduction in indel errors, a 98% reduction in carry forward errors, and >97% specificity of SNP detection. When compared to four other error correction methods, AmpliconNoise+CorQ performed at equal or higher SNP identification specificity, but the sensitivity of SNP detection was consistently higher (>98%) than other methods tested. This combined procedure will therefore permit examination of complex genetic populations with improved accuracy.

    Sensitivity and specificity of error correction algorithms in SNP variant calling in simulated pyrosequences (simulated datasets 2a–c).

    Comparison of the CorQ algorithm against other pyrosequence error correction and SNP calling algorithms. Simulated pyrosequences generated from 28 HIV-1 sequences as the starting template were used to compare the sensitivity and specificity of error correction algorithms. Sensitivity measures the proportion of true SNPs present within the HIV-1 templates used for simulation that are correctly identified as such by the various SNP calling programs. Specificity measures the proportion of true negatives (positions in the gene regions that are invariant) that are correctly identified as such by the compared programs.
    * Values in parentheses are from QuRe when the poor coverage regions excluded from sensitivity analysis are included as false negatives.
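The two definitions in this caption can be illustrated with a minimal Python sketch. It is not the evaluation code used in the study: the function name `evaluate_calls`, the representation of SNPs as sets of positions, and the example positions are all assumptions made for illustration.

```python
from typing import Set, Tuple


def evaluate_calls(
    true_snps: Set[int], called_snps: Set[int], all_positions: Set[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a set of SNP calls.

    Hypothetical helper: positions are plain integers, and every position
    that is not a true SNP is treated as invariant (a true negative
    candidate).
    """
    true_pos = len(true_snps & called_snps)    # true SNPs that were called
    false_neg = len(true_snps - called_snps)   # true SNPs that were missed
    false_pos = len(called_snps - true_snps)   # invariant positions called as SNPs
    true_neg = len(all_positions - true_snps - called_snps)

    # Sensitivity: proportion of true SNPs correctly identified.
    sens = true_pos / (true_pos + false_neg) if true_snps else float("nan")
    # Specificity: proportion of invariant positions correctly left uncalled.
    negatives = true_neg + false_pos
    spec = true_neg / negatives if negatives else float("nan")
    return sens, spec


# Made-up example: 100 positions, 4 true SNPs, 3 found plus 1 false call.
positions = set(range(1, 101))
sens, spec = evaluate_calls({10, 20, 30, 40}, {10, 20, 30, 55}, positions)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Under the footnote's convention, positions in poorly covered regions would be added to the false negatives, which lowers sensitivity without affecting specificity.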

    Attrition in indel counts after application of error correction methods.

    The percent reduction in the number of indels within the HIV-1 ten-plasmid dataset, compared to uncorrected sequences, is presented. Pyrobayes is not an error correction algorithm but rather recalibrates quality values; its recalibrated base qualities are meant to reflect overcalled and undercalled bases accurately. The percent reduction in indels compared to uncorrected sequences is shown for gag (A), env (B), nef (C), and all three genes combined (D).
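As a small worked example of the quantity this caption reports, the sketch below computes a percent reduction from two counts. The function name `percent_reduction` and the counts are hypothetical and are not taken from the study's data.

```python
def percent_reduction(uncorrected: int, corrected: int) -> float:
    """Percent reduction in an error count relative to the uncorrected count."""
    if uncorrected <= 0:
        raise ValueError("uncorrected count must be positive")
    return 100.0 * (uncorrected - corrected) / uncorrected


# Made-up counts: 1000 indels before correction, 30 remaining afterwards.
print(percent_reduction(1000, 30))
```

The percentage of errors retained after correction is the complement, 100 minus the percent reduction.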

    Carry forward errors retained after error correction.

    Raw uncorrected values and the percentage of carry forward errors retained after error correction are plotted for each of the three gene regions (gag, env, and nef) and for all three genes combined.

    454 read coverage across the HIV-1 genome.

    Locations of the gag, env and nef genes evaluated in this study are shown. A total of 26,620 reads mapped to gag, 48,927 to env and 21,963 to the nef gene. Reads were aligned to a sample-specific consensus using MOSAIK (http://bioinformatics.bc.edu/marthlab/Mosaik).

    Sensitivity and specificity of error correction algorithms in SNP variant calling.

    Comparison of CorQ against other pyrosequence error correction and SNP calling algorithms. The gag, env and nef gene regions were used to compare the sensitivity and specificity of various algorithms. Sensitivity measures the proportion of true SNPs present in the ten HIV-1 genomes that are correctly identified by the various SNP calling programs. Specificity measures the proportion of true negatives (positions in the gene regions that are invariant) that are correctly identified by the compared programs.
    * Values in parentheses are from QuRe when the poor coverage regions excluded from sensitivity analysis are included as false negatives.

    NSR-seq transcriptional profiling enables identification of a gene signature of Plasmodium falciparum parasites infecting children

    Malaria caused by Plasmodium falciparum results in approximately 1 million annual deaths worldwide, with young children and pregnant mothers at highest risk. Disease severity might be related to parasite virulence factors, but expression profiling studies of parasites to test this hypothesis have been hindered by extensive sequence variation in putative virulence genes and a preponderance of host RNA in clinical samples. We report here the application of RNA sequencing to clinical isolates of P. falciparum, using not-so-random (NSR) primers to successfully exclude human ribosomal RNA and globin transcripts and enrich for parasite transcripts. Using NSR-seq, we confirmed earlier microarray studies showing upregulation of a distinct subset of genes in parasites infecting pregnant women, including that encoding the well-established pregnancy malaria vaccine candidate var2csa. We also describe a subset of parasite transcripts that distinguished parasites infecting children from those infecting pregnant women and confirmed this observation using quantitative real-time PCR and mass spectrometry proteomic analyses. Based on their putative functional properties, we propose that these proteins could have a role in childhood malaria pathogenesis. Our study provides proof of principle that NSR-seq represents an approach that can be used to study clinical isolates of parasites causing severe malaria syndromes as well as other blood-borne pathogens and blood-related diseases.