Benchmark soil metagenome data sets for k-mer counting performance, taken from [11].
<p>Benchmark soil metagenome data sets for k-mer counting performance, taken from <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0101271#pone.0101271-Howe1" target="_blank">[11]</a>.</p>
Low-memory digital normalization.
<p><b>The results of digitally normalizing a 5-million-read <i>E. coli</i> data set (1.4 GB) to C = 20 with k = 20 at several memory usage levels and false positive rates. The false positive rate (column 1) is empirically determined. We measured the number of reads remaining, the number of “true” k-mers missing from the data at each step, and the total number of k-mers remaining. Note: at high false positive rates, reads are erroneously removed because k-mer counts are inflated.</b></p>
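The normalization step summarized above can be illustrated with a minimal sketch. It assumes the usual median-based rule for digital normalization: a read is kept only while the median count of its k-mers is still below the cutoff C, and its k-mers are counted only if it is kept. The function names `kmers` and `digital_normalize` are hypothetical, and the sketch uses an exact in-memory counter rather than a low-memory probabilistic one, so it has no false positives.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalize(reads, k=20, cutoff=20):
    """Keep a read only while the median count of its k-mers is below cutoff.

    Assumption: exact counting with a Counter. A probabilistic counter
    would use less memory but could overestimate counts.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, nothing to count
        # Counter returns 0 for k-mers it has not seen yet.
        if median(counts[km] for km in read_kmers) < cutoff:
            counts.update(read_kmers)  # only kept reads add to the counts
            kept.append(read)
    return kept


if __name__ == "__main__":
    reads = ["ACGTTGCA"] * 10 + ["GGGGCCCC"]
    print(digital_normalize(reads, k=4, cutoff=3))
```

If the exact counter were swapped for one that can overestimate counts, the median test would pass the cutoff earlier, which is consistent with the note that reads are erroneously removed at high false positive rates.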
Iterative low-memory k-mer trimming.
<p><b>The results of trimming reads at unique (erroneous) k-mers from a 5-million-read <i>E. coli</i> data set (1.4 GB) in under 30 MB of RAM. After each iteration, we measured the total number of distinct k-mers in the data set, the total number of unique (and likely erroneous) k-mers remaining, and the number of unique k-mers present at the 3' end of reads.</b></p>
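A single trimming pass of the kind described above might look like the following sketch. It is an illustration under stated assumptions, not the method used to produce the table: counts are exact (a `Counter`, not a structure that fits in 30 MB), a read is cut just before the last base of its first unique k-mer, and reads left shorter than k are dropped. The name `trim_at_unique_kmers` is hypothetical.

```python
from collections import Counter


def trim_at_unique_kmers(reads, k=20):
    """Truncate each read at its first k-mer that occurs exactly once.

    Assumptions: exact counts over all reads, and a cut placed just
    before the last base of the first unique k-mer.
    """
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    trimmed = []
    for read in reads:
        cut = len(read)
        for i in range(len(read) - k + 1):
            if counts[read[i:i + k]] == 1:
                cut = i + k - 1  # drop the base that completes the unique k-mer
                break
        if cut >= k:  # keep only reads that still contain at least one k-mer
            trimmed.append(read[:cut])
    return trimmed


if __name__ == "__main__":
    reads = ["ACGTTGCA"] * 3 + ["ACGTTGCC"]
    print(trim_at_unique_kmers(reads, k=4))
```

With a low-memory counter that can overestimate counts, some unique k-mers would look non-unique and survive a pass, which is one plausible reason to repeat the pass and re-measure after each iteration.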
Memory usage of k-mer counting tools when calculating k-mer abundance histograms, with maximum resident program size (y axis, in GB) plotted against the total number of distinct k-mers in the data set (x axis, billions of k-mers).
<p>Memory usage of k-mer counting tools when calculating k-mer abundance histograms, with maximum resident program size (y axis, in GB) plotted against the total number of distinct k-mers in the data set (x axis, billions of k-mers).</p>
Time for several k-mer counting tools to retrieve the counts of 9.7 million randomly chosen k-mers (y axis), plotted against the number of distinct k-mers in the data set being queried (x axis).
<p>BFCounter, DSK, Turtle, KAnalyze, and KMC do not support this functionality.</p>
Disk storage usage of different k-mer counting tools to calculate k-mer abundance histograms in GB (y axis), plotted against the number of distinct k-mers in the data set (x axis).
<p>Note that khmer does not use the disk during counting or retrieval, although its hash tables can be saved for reuse.</p>
<i>E. coli</i> genome assembly after low-memory digital normalization.
<p><b>A comparison of assemblies of reads digitally normalized with low memory and high false positive rates. The reads were digitally normalized to C = 20 (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0101271#pone.0101271-Brown1" target="_blank">[21]</a> for more information) and were assembled using Velvet. Using QUAST, we measured the total length of the assembly and the percentage of the true MG1655 genome covered by the assembly.</b></p>
Percent miscount (the amount by which a k-mer's count is incorrect relative to its true count; y axis) plotted against false positive rate (x axis) for five data sets.
<p>The five data sets are the same as in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0101271#pone-0101271-g005" target="_blank">Figure 5</a>.</p>