21 research outputs found

    Consequences of Normalizing Transcriptomic and Genomic Libraries of Plant Genomes Using a Duplex-Specific Nuclease and Tetramethylammonium Chloride

    <div><p>Several applications of high-throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce.</p> </div>

    Plots depicting the effect of DSN treatment on RNA-Seq libraries of lettuce cultivar Valmaine.

    <p>Abundance of each of 25,857 transcripts from a lettuce transcriptome assembly is given in reads per kilobase of exon per million mapped sequence reads (RPKM; <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0055913#pone.0055913-Mortazavi1" target="_blank">[30]</a>). The most highly abundant transcripts in the control library were the most depleted and <i>vice versa</i> for the less abundant transcripts. A) Boxplots of 24 bins based on transcript abundance of the RPKM ratio of the DSN-normalized library over the control library. The whiskers encompass 1.5 times the interquartile range (IQR). The confidence diamonds indicate the average fold change when the Student's t-test p-value is less than 0.01 (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0055913#pone.0055913.s004" target="_blank">Table S1</a> for the actual p-values that range from 4×e<sup>−195</sup> to 1×e<sup>−4</sup> for the significant bins). A sequence with a ratio outside of the range covered by the whiskers is considered an outlier and shown as a black dot. B) Scatter plot depicting the effect of DSN treatment on the abundance of reads representing each of 25,857 transcripts. Sequences that are neither reduced nor enhanced as a result of DSN treatment would align along the dotted blue lines. Color-coding indicates % GC content (see below). Sequences with high GC content (red dots) were more depleted than sequences with low GC content (blue and purple dots). C) Close-up of the section of the scatter plot in B (outlined in the blue rectangle), which highlights the effect of DSN treatment on transcripts represented in the libraries at lower abundance. A substantial number of the reads representing less abundant transcripts were enriched in the normalized library, particularly those with GC content of 45% or less.</p>
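The RPKM measure and the normalized-over-control ratio used in this caption can be illustrated with a minimal sketch. It follows the standard definition of RPKM (reads per kilobase of exon per million mapped reads); the function names and the example numbers are hypothetical and are not taken from the paper.

```python
def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon per million mapped reads.

    Standard definition: reads / (length in kb * mapped reads in millions).
    """
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def rpkm_ratio(rpkm_normalized: float, rpkm_control: float) -> float:
    """Ratio of the DSN-normalized library over the control library.

    Values below 1 indicate depletion and values above 1 indicate enrichment.
    """
    if rpkm_control <= 0:
        raise ValueError("control RPKM must be positive to form a ratio")
    return rpkm_normalized / rpkm_control


# Hypothetical transcript: 2,000 bp long, 20 million mapped reads per library.
control = rpkm(500, 2000, 20_000_000)     # abundant in the control library
normalized = rpkm(100, 2000, 20_000_000)  # fewer reads after DSN treatment
print(control, normalized, rpkm_ratio(normalized, control))
```

A transcript with zero reads in the control library has no defined ratio, so the sketch raises an error there instead of returning a value.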

    The consequences of DSN treatment on rare and abundant transcripts of lettuce.

    <p>Reads in the control <i>L. sativa</i> cv. Valmaine RNA-Seq library were separated into bins based on their abundance expressed as RPKM (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0055913#pone.0055913.s004" target="_blank">Table S1</a>). The average RPKM values for 100 randomly selected genes in the control and the DSN-normalized libraries are shown as bars for each bin; the green line indicates the total number of genes in each bin. A) Less abundant transcripts (reads), representing the majority of genes expressed, are enriched in the DSN-treated library. B) The relatively few genes with highly abundant transcripts in the control library have substantially reduced abundance in the DSN-treated library.</p>

    The effect of renaturation in 3 M TMAC versus 0.5 M NaCl on the GC composition of normalized transcriptomic and genomic libraries of lettuce.

    <p>A) Twenty million reads in each RNA-Seq library (green: renatured in 0.5 M NaCl; blue: renatured in 3 M TMAC) and 10 million reads in each genomic library (red: renatured in 0.5 M NaCl; orange: renatured in 3 M TMAC) were categorized by % GC content and then the percentage of the total number of reads in each GC category was plotted against the % GC content of the category. The average GC content of reads in both types of library renatured in 3 M TMAC was approximately 1% greater than in the libraries renatured in 0.5 M NaCl (+1.1% for mRNA and +1.4% for gDNA). That shift is statistically significant based on a matched pairs difference analysis where the probability of being inferior to the Wilcoxon Signed Rank Test Statistic S is 0.9830. B) RNA-Seq reads from two libraries, one normalized using 3 M TMAC and the other 0.5 M NaCl, were separately mapped to a QC set of 25,857 uninterrupted ORFs identified in the lettuce transcriptome assembly. The differential abundance of reads representing each ORF in the libraries was then calculated by subtracting the RPKM for each gene in the TMAC-renatured library from the RPKM for that sequence in the NaCl-renatured library and dividing the difference by the total RPKM for the gene in both libraries; these values were plotted against the average GC content of that gene (Low coverage, <20 RPKM: gray dots. Medium coverage, 20 to 40 RPKM: blue dots. High coverage, 40 to 300 RPKM: red dots). Statistical significance was assessed by two-tailed Student's t-tests for each RPKM bin. Differences between the hybridizations in NaCl and TMAC were small for transcripts present at moderate levels (Medium RPKM bin). However, NaCl was significantly more effective than TMAC both in reducing abundantly expressed transcripts (RPKM >40; average RPKM −16% with NaCl treatment; P-value = 2.6 e<sup>−41</sup>) and increasing the number of relatively rare transcripts (RPKM <20; average RPKM +5% with NaCl treatment; P-value = 9.8 e<sup>−33</sup>). The negative slopes of the regression lines (gray for low RPKM, green for medium RPKM, and red for high RPKM genes) indicate that genes with higher GC content tended to be represented at higher levels in the library renatured using 3 M TMAC as compared to 0.5 M NaCl and that, conversely, genes with lower GC content tended to be represented at higher levels in the library renatured using 0.5 M NaCl. This trend was more pronounced for genes with higher RPKM. C) Sixty million genomic reads from two libraries, one renatured using 3 M TMAC and the other 0.5 M NaCl, were separately mapped to a QC set of 25,857 uninterrupted ORFs identified in the lettuce transcriptome assembly. The differential abundance of reads representing each gene in the two libraries was then calculated (as described in the text) and plotted against the average GC content of that gene (Low coverage, <30 RPKM: gray circles. Medium coverage, 30 to 80 RPKM: blue circles. High coverage, >80 RPKM: red circles). The negative slopes of the regression lines (gray for low RPKM, green for medium RPKM, and red for high RPKM) indicate that genes with higher GC content tended to be represented at higher levels in the library normalized using 3 M TMAC as compared with 0.5 M NaCl and that genes with lower GC content tended to be represented at higher levels in the library renatured using 0.5 M NaCl. The NaCl and TMAC treatments differed highly significantly for the low and medium RPKM bins but not for the high RPKM bin.</p>
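The differential abundance described in panel B, and the sign of the regression slope against GC content, can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the slope is an ordinary least-squares fit (the caption does not say which regression method was used), and the three example genes are invented.

```python
def differential_abundance(rpkm_nacl: float, rpkm_tmac: float) -> float:
    """(NaCl - TMAC) / (NaCl + TMAC), as described for panel B.

    Positive values mean the gene is better represented in the
    NaCl-renatured library; negative values favor the TMAC library.
    """
    total = rpkm_nacl + rpkm_tmac
    if total <= 0:
        raise ValueError("gene has no coverage in either library")
    return (rpkm_nacl - rpkm_tmac) / total


def ols_slope(x: list[float], y: list[float]) -> float:
    """Ordinary least-squares slope of y on x (assumed regression method)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical")
    return sxy / sxx


# Invented example: three genes with rising GC content (%).
gc_content = [40.0, 50.0, 60.0]
diffs = [
    differential_abundance(30.0, 20.0),  # low GC, favored by NaCl
    differential_abundance(25.0, 25.0),  # no difference
    differential_abundance(20.0, 30.0),  # high GC, favored by TMAC
]
print(diffs, ols_slope(gc_content, diffs))
```

With this sign convention, a negative slope corresponds to the pattern reported in the caption: genes with higher GC content are relatively better represented in the TMAC-renatured library.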

    Reduction of repeated sequences and increases of single-copy gene sequences encoded in a BAC clone in normalized genomic libraries.

    <p>A) The abundance of reads from four lettuce libraries: RNA-Seq (RNA-Seq), non-normalized genomic (untreated gDNA control), genomic normalized after hybridization in 0.5 M NaCl (gDNA DSN (NaCl)) and genomic normalized after hybridization in 3 M TMAC (gDNA DSN (TMAC)). The track above the RNA-Seq data shows the GC content. B) The region of A) enlarged to reveal the numbers of sequencing reads that mapped to two single-copy genes in the BAC. The homolog of At4g28200 is contained between the two vertical blue lines on the left side of the figure and the <i>FT</i> homolog is contained between the two vertical blue lines on the right side of the figure.</p>

    The majority of coding regions in Arabidopsis are not reduced as a result of DSN treatment.

    <p>Abundance of more than 26,000 coding sequences compared between the control genomic library (x axis) and the library normalized with DSN after 22 hrs of renaturation (y axis) is expressed in RPKM. Sequences neither reduced nor enhanced as a result of DSN treatment align along the dotted gray line. The vast majority of coding sequences were at least as abundant in the normalized library as in the control library (for distribution and statistical significance, see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0055913#pone.0055913.s001" target="_blank">Figure S1</a>). Color-coding indicates % GC content. The few coding sequences that were reduced tended to have relatively high GC content (Student's t-test P-value = 0.0452).</p>

    Depletion or enrichment of gene sequences in genomic libraries is dependent on the number of copies of the gene in the genome.

    <p>Sixty million reads from each library, normalized using 0.5 M NaCl (blue bars) or 3 M TMAC (red bars), were mapped to a set of 25,857 lettuce transcriptome contigs with uninterrupted ORFs; the contig/gene sequences were then placed in bins based on estimates (see text) of the number of copies of each gene in the lettuce genome. A transition from enrichment of gene sequences present in the genome in fewer copies to depletion of sequences present in more than 100 copies was observed.</p>

    Mapped reads from control and normalized genomic libraries.

    <p>Number of mapped reads from control and normalized genomic libraries to simple repeats (1–3), centromeric (4), ribosomal RNA (5–6), retrotransposon <i>pol</i> consensus (7), and other uncharacterized genomic repeats (8–25). Twenty-eight million reads were analyzed from each library.</p>