Data from: De novo sequencing and variant calling with nanopores using PoreSeq
The accuracy of sequencing single DNA molecules with nanopores is continually improving, but de novo genome sequencing and assembly using only nanopore data remain challenging. Here we describe PoreSeq, an algorithm that identifies and corrects errors in nanopore sequencing data and improves the accuracy of de novo genome assembly with increasing coverage depth. The approach relies on modeling the possible sources of uncertainty that occur as DNA transits through the nanopore and finds the sequence that best explains multiple reads of the same region. PoreSeq increases nanopore sequencing read accuracy of M13 bacteriophage DNA from 85% to 99% at 100× coverage. We also use the algorithm to assemble Escherichia coli with 30× coverage and the λ genome at a range of coverages from 3× to 50×. Additionally, we classify sequence variants at an order of magnitude lower coverage than is possible with existing methods.