12 research outputs found

Data sets used for analyzing miscounts.

Author: Adina Chuang Howe (604431)
C. Titus Brown (98658)
Jason Pell (99648)
Qingpeng Zhang (274356)
Rosangela Canino-Koning (604430)

FigShare

Benchmark soil metagenome data sets for k-mer counting performance, taken from [11].

Author: Adina Chuang Howe (604431)
C. Titus Brown (98658)
Jason Pell (99648)
Qingpeng Zhang (274356)
Rosangela Canino-Koning (604430)

Source for [11]: http://www.plosone.org/article/info:doi/10.1371/journal.pone.0101271#pone.0101271-Howe1

FigShare

Low-memory digital normalization.

Author: Adina Chuang Howe (604431)
C. Titus Brown (98658)
Jason Pell (99648)
Qingpeng Zhang (274356)
Rosangela Canino-Koning (604430)

The results of digitally normalizing a 5 million read E. coli data set (1.4 GB) to C = 20 with k = 20 under several memory usage/false positive rates. The false positive rate (column 1) is empirically determined. We measured the number of reads remaining, the number of “true” k-mers missing from the data at each step, and the total number of k-mers remaining. Note: at high false positive rates, reads are erroneously removed because k-mer counts are inflated.

FigShare
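To make the normalization step in the entry above concrete, here is a minimal sketch, assuming the usual median-count formulation of digital normalization: a read is kept only if the median count of its k-mers, among reads kept so far, is still below the cutoff C. The function names are hypothetical, and the exact `Counter` stands in for the low-memory probabilistic counter the entry describes; this is not the khmer implementation.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping k-mer in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalize(reads, k=20, cutoff=20):
    """Keep a read only while its median k-mer count is below cutoff.

    Assumption: exact counting with a Counter. A probabilistic counter
    can only overestimate counts, so the median would reach the cutoff
    sooner and some reads would be discarded that exact counting keeps.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, nothing to count
        if median(counts[km] for km in read_kmers) < cutoff:
            counts.update(read_kmers)  # only kept reads add to the counts
            kept.append(read)
    return kept


# Five identical toy reads; with k=4 and cutoff=3 only the first two survive.
kept_reads = digital_normalize(["ACGTACGTAC"] * 5, k=4, cutoff=3)
print(len(kept_reads))
```

The note in the entry follows directly from the comparison against the cutoff: inflated counts make reads look more redundant than they are.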

Iterative low-memory k-mer trimming.

Author: Adina Chuang Howe (604431)
C. Titus Brown (98658)
Jason Pell (99648)
Qingpeng Zhang (274356)
Rosangela Canino-Koning (604430)

The results of trimming reads at unique (erroneous) k-mers from a 5 million read E. coli data set (1.4 GB) in under 30 MB of RAM. After each iteration, we measured the total number of distinct k-mers in the data set, the total number of unique (and likely erroneous) k-mers remaining, and the number of unique k-mers present at the 3' end of reads.

FigShare
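One pass of the trimming described in the entry above might look like the following sketch. It assumes that "unique" means a k-mer seen exactly once across all reads, and that a read is cut just before the last base of its first unique k-mer; both the function names and that cut position are illustrative assumptions, and exact counting replaces the low-memory counter.

```python
from collections import Counter


def kmers(seq, k):
    """Return every overlapping k-mer in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def trim_at_unique_kmers(reads, k=20):
    """Truncate each read at its first k-mer that occurs exactly once."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    trimmed = []
    for read in reads:
        keep = len(read)  # default: no unique k-mer, keep the whole read
        for i, km in enumerate(kmers(read, k)):
            if counts[km] == 1:
                # Drop the final base of the unique k-mer and everything
                # after it (assumed cut position).
                keep = i + k - 1
                break
        trimmed.append(read[:keep])
    return trimmed


# The trailing C in the second read creates the only unique 4-mer (AAAC).
print(trim_at_unique_kmers(["AAAAAA", "AAAAAC"], k=4))
```

Running the function again on its own output would correspond to a further iteration, since trimming changes which k-mers remain unique.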

Memory usage of k-mer counting tools when calculating k-mer abundance histograms, with maximum resident program size (y axis, in GB) plotted against the total number of distinct k-mers in the data set (x axis, billions of k-mers).

Author: Adina Chuang Howe (604431)
C. Titus Brown (98658)
Jason Pell (99648)
Qingpeng Zhang (274356)
Rosangela Canino-Koning (604430)

FigShare

Time for several k-mer counting tools to retrieve the counts of 9.7 million randomly chosen k-mers (y axis), plotted against the number of distinct k-mers in the data set being queried (x axis).

Author: Adina Chuang Howe (604431)
C. Titus Brown (98658)
Jason Pell (99648)
Qingpeng Zhang (638117)
Rosangela Canino-Koning (604430)

BFCounter, DSK, Turtle, KAnalyze, and KMC do not support this functionality.

FigShare

Disk storage usage of different k-mer counting tools to calculate k-mer abundance histograms in GB (y axis), plotted against the number of distinct k-mers in the data set (x axis).

Author: Adina Chuang Howe (604431)
C. Titus Brown (98658)
Jason Pell (99648)
Qingpeng Zhang (274356)
Rosangela Canino-Koning (604430)

Note that khmer does not use the disk during counting or retrieval, although its hash tables can be saved for reuse.

FigShare

E. coli genome assembly after low-memory digital normalization.

Author: Adina Chuang Howe (604431)
C. Titus Brown (98658)
Jason Pell (99648)
Qingpeng Zhang (638117)
Rosangela Canino-Koning (604430)

A comparison of assemblies of reads digitally normalized at low memory/high false positive rates. The reads were digitally normalized to C = 20 (see [21], http://www.plosone.org/article/info:doi/10.1371/journal.pone.0101271#pone.0101271-Brown1, for more information) and were assembled using Velvet. We measured the total length of the assembly, as well as the percent of the true MG1655 genome covered by the assembly, using QUAST.

FigShare

Iterative low-memory k-mer trimming.

Author: Qingpeng Zhang (274356)
Jason Pell (99648)
Rosangela Canino-Koning (604430)
Adina Chuang Howe (604431)
C. Titus Brown (98658)
Publication date: 25/07/2014

FigShare

Archivo Digital UPM

Relation between percent miscount (the amount by which the reported count for a k-mer differs from its true count, relative to that true count) on the y axis and false positive rate on the x axis, for five data sets.

Author: Adina Chuang Howe (604431)
C. Titus Brown (98658)
Jason Pell (99648)
Qingpeng Zhang (274356)
Rosangela Canino-Koning (604430)

The five data sets are the same as in Figure 5 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0101271#pone-0101271-g005).

FigShare
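The percent miscount in the entry above can be illustrated with a small sketch. It assumes a Count-Min-Sketch-style counter (several tables, one shared hash reduced modulo each table size, minimum taken on lookup) and averages the relative overcount across distinct k-mers; the class, the SHA-256 hash, and the averaging are illustrative assumptions rather than the method used for the figure.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Toy probabilistic counter: reported counts are never too low."""

    def __init__(self, table_sizes):
        self.tables = [[0] * size for size in table_sizes]

    def _slots(self, kmer):
        # One hash value, reduced modulo each table size (assumed scheme).
        digest = int.from_bytes(hashlib.sha256(kmer.encode()).digest(), "big")
        return [digest % len(table) for table in self.tables]

    def add(self, kmer):
        for table, slot in zip(self.tables, self._slots(kmer)):
            table[slot] += 1

    def get(self, kmer):
        # Collisions only inflate a slot, so the minimum is the best estimate.
        return min(t[s] for t, s in zip(self.tables, self._slots(kmer)))


def percent_miscount(true_counts, sketch):
    """Mean of 100 * (reported - true) / true over all distinct k-mers."""
    errors = [100.0 * (sketch.get(km) - n) / n for km, n in true_counts.items()]
    return sum(errors) / len(errors)


observed = ["AAAA", "AAAA", "CCCC"]
true_counts = Counter(observed)
tiny = CountMinSketch([1])  # one slot: every k-mer collides with every other
for km in observed:
    tiny.add(km)
print(percent_miscount(true_counts, tiny))
```

Shrinking the tables raises the collision rate, which is why miscount is expected to grow with the false positive rate.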