Bacterial reads (number of reads).

Abstract

<p>For each bacterial genome in a set of 747 genomes, we simulated reads at several lengths (50 nt, 75 nt, 100 nt, 150 nt, 200 nt, 250 nt) and several substitution error rates (0%, 1%, 5%, 10%). Each query used an independent sample of 5, 10, 25, 50, 100, 200, or 300 random reads, and we recorded the distribution of the rank of the correct reference in the list of hits; the best rank means that the correct reference was at the very top of the list. The list of hits has a maximum length of 25, and we count the reference as ‘not found’ if it is not present in the list. The percentage of test bacterial genomes whose correct reference was found in that list is shown in a bar plot nested on the right side of each panel. Increasing the number of reads in the random sample beyond 100 improves performance only very slightly, mostly for shorter read lengths and higher substitution rates. The substitution rate and the read length have much stronger effects on performance.</p>
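
The sampling and ranking procedure described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `simulate_reads` and `rank_of_reference` are hypothetical, substitutions are assumed to be independent and uniform over the three other bases, and ranks are assumed to be 1-based, which the caption does not confirm.

```python
import random

BASES = "ACGT"


def simulate_reads(genome, n_reads, read_length, sub_rate, rng):
    """Draw n_reads random substrings of genome and apply substitutions.

    Assumption: each position is substituted independently with
    probability sub_rate, by a base different from the original.
    """
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_length + 1)
        read = list(genome[start:start + read_length])
        for i, base in enumerate(read):
            if rng.random() < sub_rate:
                read[i] = rng.choice([b for b in BASES if b != base])
        reads.append("".join(read))
    return reads


def rank_of_reference(hits, correct, max_hits=25):
    """Return the rank of correct within the first max_hits hits.

    Assumption: ranks are 1-based. Returns None ('not found') when the
    correct reference is absent from the truncated list.
    """
    top = hits[:max_hits]
    return top.index(correct) + 1 if correct in top else None
```

A query for one condition would then call `simulate_reads` with one of the read lengths, substitution rates, and sample sizes listed above, pass the reads to whatever search tool produces the ordered list of hits, and record the value returned by `rank_of_reference`.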
