Bacterial reads.
- Publication date
- Publisher
Abstract
<p>For each bacterial genome in a set of 747 genomes, we simulated several read lengths (50 nt, 75 nt, 100 nt, 150 nt, 200 nt, 250 nt) and several substitution error rates (0%, 1%, 5%, 10%). 100 random reads were used in each query and the distribution of the rank of the correct references in the list recorded; a rank of means that the correct reference was at the very top of the list. The list of hits has a maximum length of 25 and we count the reference as ‘not found’ if not in the list at all. The percentage of correct test bacterial genomes present in the list is represented in a bar nested on the right side of each panel. The figure shows that, as expected, the performance degrades as the substitution rate increases, but also that reads of length 50 appear of little practical use for identification purposes. Increasing the read length beyond 100 nt brings only small improvements, and has a limited compensatory effect on the substitution rate. The figure suggests that current leading technology for sequencing possess sufficient length for an accurate identification, and should focus on sequence quality rather than increased read length.</p