Nanopore sequencing is a widely-used high-throughput genome sequencing
technology that can sequence long fragments of a genome into raw electrical
signals at low cost. Nanopore sequencing requires two computationally-costly
processing steps for accurate downstream genome analysis. The first step,
basecalling, translates the raw electrical signals into nucleotide bases (i.e.,
A, C, G, T). The second step, read mapping, finds the correct location of a
read in a reference genome. In existing genome analysis pipelines, basecalling
and read mapping are executed separately. We observe in this work that such
separate execution of the two most time-consuming steps inherently leads to (1)
significant data movement and (2) redundant computations on the data, slowing
down the genome analysis pipeline. This paper proposes GenPIP, an in-memory
genome analysis accelerator that tightly integrates basecalling and read
mapping. GenPIP improves the performance of the genome analysis pipeline with
two key mechanisms: (1) in-memory fine-grained collaborative execution of the
major genome analysis steps in parallel; (2) a new technique for
early-rejection of low-quality and unmapped reads to timely stop the execution
of genome analysis for such reads, reducing inefficient computation. Our
experiments show that, for the execution of the genome analysis pipeline,
GenPIP provides 41.6X (8.4X) speedup and 32.8X (20.8X) energy savings with
negligible accuracy loss compared to the state-of-the-art software genome
analysis tools executed on a state-of-the-art CPU (GPU). Compared to a design
that combines state-of-the-art in-memory basecalling and read mapping
accelerators, GenPIP provides 1.39X speedup and 1.37X energy savings.Comment: 17 pages, 13 figure