A FAST IMPLEMENTATION FOR CORRECTING ERRORS IN HIGH THROUGHPUT SEQUENCING DATA

Abstract

Next-generation DNA sequencing (NGS) technologies have produced a revolution in biological research, and new computational tools are needed to handle the huge amounts of data they output. Compared with Sanger sequencing, NGS reads are significantly shorter and have a higher per-base error rate. This makes analysis more difficult, and critical problems such as genome assembly are still not satisfactorily solved. Considerable effort has recently been devoted to software that improves the quality of NGS data by correcting errors. The most accurate such program to date is HiTEC, and our contribution is a completely new implementation, HiTEC2. The new program is many times faster and uses much less space, while correcting more errors in the same number of iterations. We have eliminated the need for the suffix array data structure, as well as the need to install complicated statistical libraries. This makes HiTEC2 not only more efficient but also more user-friendly.
