6 research outputs found
RawHash2: Accurate and Fast Mapping of Raw Nanopore Signals using a Hash-based Seeding Mechanism
Summary: Raw nanopore signals can be analyzed while they are being generated,
a process known as real-time analysis. Real-time analysis of raw signals is
essential to utilize the unique features that nanopore sequencing provides,
enabling the early stopping of the sequencing of a read or the entire
sequencing run based on the analysis. The state-of-the-art mechanism, RawHash,
offers the first hash-based efficient and accurate similarity identification
between raw signals and a reference genome by quickly matching their hash
values. In this work, we introduce RawHash2, which provides major improvements
over RawHash, including a more sensitive chaining implementation, weighted
mapping decisions, frequency filters to reduce ambiguous seed hits, minimizers
for hash-based sketching, and support for the R10.4 flow cell version and
various data formats such as POD5. Compared to RawHash, RawHash2 provides
better F1 accuracy (on average by 3.44% and up to 10.32%) and better throughput
(on average by 2.3x and up to 5.4x) than RawHash.
Availability and Implementation: RawHash2 is available at
https://github.com/CMU-SAFARI/RawHash. We also provide the scripts to fully
reproduce our results on our GitHub page
TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering
Basecalling is an essential step in nanopore sequencing analysis where the
raw signals of nanopore sequencers are converted into nucleotide sequences,
i.e., reads. State-of-the-art basecallers employ complex deep learning models
to achieve high basecalling accuracy. This makes basecalling
computationally-inefficient and memory-hungry; bottlenecking the entire genome
analysis pipeline. However, for many applications, the majority of reads do no
match the reference genome of interest (i.e., target reference) and thus are
discarded in later steps in the genomics pipeline, wasting the basecalling
computation. To overcome this issue, we propose TargetCall, the first fast and
widely-applicable pre-basecalling filter to eliminate the wasted computation in
basecalling. TargetCall's key idea is to discard reads that will not match the
target reference (i.e., off-target reads) prior to basecalling. TargetCall
consists of two main components: (1) LightCall, a lightweight neural network
basecaller that produces noisy reads; and (2) Similarity Check, which labels
each of these noisy reads as on-target or off-target by matching them to the
target reference. TargetCall filters out all off-target reads before
basecalling; and the highly-accurate but slow basecalling is performed only on
the raw signals whose noisy reads are labeled as on-target. Our thorough
experimental evaluations using both real and simulated data show that
TargetCall 1) improves the end-to-end basecalling performance of the
state-of-the-art basecaller by 3.31x while maintaining high (98.88%)
sensitivity in keeping on-target reads, 2) maintains high accuracy in
downstream analysis, 3) precisely filters out up to 94.71% of off-target reads,
and 4) achieves better performance, sensitivity, and generality compared to
prior works. We freely open-source TargetCall at
https://github.com/CMU-SAFARI/TargetCall
Scrooge: a fast and memory-frugal genomic sequence aligner for CPUs, GPUs, and ASICs
Motivation: Pairwise sequence alignment is a very time-consuming step in common bioinformatics pipelines. Speeding up this step requires heuristics, efficient implementations, and/or hardware acceleration. A promising candidate for all of the above is the recently proposed GenASM algorithm. We identify and address three inefficiencies in the GenASM algorithm: it has a high amount of data movement, a large memory footprint, and does some unnecessary work. Results: We propose Scrooge, a fast and memory-frugal genomic sequence aligner. Scrooge includes three novel algorithmic improvements which reduce the data movement, memory footprint, and the number of operations in the GenASM algorithm. We provide efficient open-source implementations of the Scrooge algorithm for CPUs and GPUs, which demonstrate the significant benefits of our algorithmic improvements. For long reads, the CPU version of Scrooge achieves a 20.1x, 1.7x, and 2.1x speedup over KSW2, Edlib, and a CPU implementation of GenASM, respectively. The GPU version of Scrooge achieves a 4.0x, 80.4x, 6.8x, 12.6x, and 5.9x speedup over the CPU version of Scrooge, KSW2, Edlib, Darwin-GPU, and a GPU implementation of GenASM, respectively. We estimate an ASIC implementation of Scrooge to use 3.6x less chip area and 2.1x less power than a GenASM ASIC while maintaining the same throughput. Further, we systematically analyze the throughput and accuracy behavior of GenASM and Scrooge under various configurations. As the best configuration of Scrooge depends on the computing platform, we make several observations that can help guide future implementations of Scrooge.ISSN:1367-4803ISSN:1460-205
From molecules to genomic variations: Accelerating genome analysis via intelligent algorithms and architectures
We now need more than ever to make genome analysis more intelligent. We need to read, analyze, and interpret our genomes not only quickly, but also accurately and efficiently enough to scale the analysis to population level. There currently exist major computational bottlenecks and inefficiencies throughout the entire genome analysis pipeline, because state-of-the-art genome sequencing technologies are still not able to read a genome in its entirety. We describe the ongoing journey in significantly improving the performance, accuracy, and efficiency of genome analysis using intelligent algorithms and hardware architectures. We explain state-of-the-art algorithmic methods and hardware-based acceleration approaches for each step of the genome analysis pipeline and provide experimental evaluations. Algorithmic approaches exploit the structure of the genome as well as the structure of the underlying hardware. Hardware-based acceleration approaches exploit specialized microarchitectures or various execution paradigms (e.g., processing inside or near memory) along with algorithmic changes, leading to new hardware/software co-designed systems. We conclude with a foreshadowing of future challenges, benefits, and research directions triggered by the development of both very low cost yet highly error prone new sequencing technologies and specialized hardware chips for genomics. We hope that these efforts and the challenges we discuss provide a foundation for future work in making genome analysis more intelligent.ISSN:2001-037
ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis
Profile hidden Markov models (pHMMs) are widely used in many bioinformatics applications to accurately identify similarities between biological sequences (e.g., DNA or protein sequences). PHMMs use a commonly-adopted and highly-accurate method, called the Baum-Welch algorithm, to calculate these similarities. However, the Baum-Welch algorithm is computationally expensive, and existing works provide either software- or hardware-only solutions for a fixed pHMM design. When we analyze the state-of-the-art works, we find that there is a pressing need for a flexible, high-performant, and energy-efficient hardware-software co-design to efficiently and effectively solve all the major inefficiencies in the Baum-Welch algorithm for pHMMs. We propose ApHMM, the first flexible acceleration framework that can significantly reduce computational and energy overheads of the Baum-Welch algorithm for pHMMs. ApHMM leverages hardware-software co-design to solve the major inefficiencies in the Baum-Welch algorithm by 1) designing a flexible hardware to support different pHMMs designs, 2) exploiting the predictable data dependency pattern in an on-chip memory with memoization techniques, 3) quickly eliminating negligible computations with a hardware-based filter, and 4) minimizing the redundant computations. We implement our 1) hardware-software optimizations on a specialized hardware and 2) software optimizations for GPUs to provide the first flexible Baum-Welch accelerator for pHMMs. ApHMM provides significant speedups of 15.55x-260.03x, 1.83x-5.34x, and 27.97x compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms the state-of-the-art CPU implementations of three important bioinformatics applications, 1) error correction, 2) protein family search, and 3) multiple sequence alignment, by 1.29x-59.94x, 1.03x-1.75x, and 1.03x-1.95x, respectively