Motivation: Recent advances in sequencing technologies promise ultra-long
reads of ∼100 kilobases (kb) on average, full-length mRNA or cDNA reads
in high throughput and genomic contigs over 100 megabases (Mb) in length.
Existing alignment programs are either unable to process such data at scale
or do so inefficiently, which calls for the development of new alignment algorithms.
Results: Minimap2 is a general-purpose alignment program to map DNA or long
mRNA sequences against a large reference database. It works with accurate short
reads of ≥100 bp in length, ≥1 kb genomic reads at an error rate of ∼15%,
full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely
related full chromosomes of hundreds of megabases in length. Minimap2 performs
split-read alignment, employs a concave gap cost for long insertions and
deletions (INDELs) and introduces new heuristics to reduce spurious alignments.
It is 3-4 times faster than mainstream short-read mappers at comparable
accuracy and ≥30 times faster at higher accuracy for both genomic and mRNA
reads, surpassing most aligners specialized in one type of alignment.
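A concave gap cost means that each additional gap base costs no more than the previous one, so a single long INDEL is penalized less than under a plain affine model. The sketch below assumes a two-piece affine form, the minimum of two affine functions, as one way to approximate a concave cost. The function names and the parameter values (q=4, e=2, q2=24, e2=1) are illustrative assumptions and are not taken from this abstract.

```python
def affine_gap_cost(length: int, q: int, e: int) -> int:
    """Plain affine cost: one opening penalty q plus e per gap base."""
    if length <= 0:
        return 0
    return q + length * e


def two_piece_affine_gap_cost(length: int, q: int = 4, e: int = 2,
                              q2: int = 24, e2: int = 1) -> int:
    """Approximate concave cost as the minimum of two affine pieces.

    The default values are illustrative assumptions. Short gaps pay the
    steeper slope e; long gaps switch to the flatter slope e2 (< e).
    """
    if length <= 0:
        return 0
    return min(q + length * e, q2 + length * e2)


if __name__ == "__main__":
    for n in (1, 10, 20, 50, 100):
        print(n, affine_gap_cost(n, 4, 2), two_piece_affine_gap_cost(n))
```

With these assumed values the two pieces cross at a gap length of 20: shorter gaps pay the steeper per-base rate, while a 100 bp deletion costs 124 instead of the 204 charged by the plain affine model.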
Availability and implementation: https://github.com/lh3/minimap2
Contact: [email protected]
The final submitted versio