Shouji: A Fast and Efficient Pre-Alignment Filter for Sequence Alignment
Motivation: The ability to generate massive amounts of sequencing data
continues to overwhelm the processing capability of existing algorithms and
compute infrastructures. In this work, we explore the use of hardware/software
co-design and hardware acceleration to significantly reduce the execution time
of short sequence alignment, a crucial step in analyzing sequenced genomes. We
introduce Shouji, a highly-parallel and accurate pre-alignment filter that
remarkably reduces the need for computationally-costly dynamic programming
algorithms. The first key idea of our proposed pre-alignment filter is to
provide high filtering accuracy by correctly detecting all common subsequences
shared between two given sequences. The second key idea is to design a hardware
accelerator that adopts modern FPGA (Field-Programmable Gate Array)
architectures to further boost the performance of our algorithm.
Results: Shouji significantly improves the accuracy of pre-alignment
filtering by up to two orders of magnitude compared to the state-of-the-art
pre-alignment filters, GateKeeper and SHD. Our FPGA-based accelerator is up to
three orders of magnitude faster than the equivalent CPU implementation of
Shouji. Using a single FPGA chip, we benchmark the benefits of integrating
Shouji with five state-of-the-art sequence aligners, designed for different
computing platforms. The addition of Shouji as a pre-alignment step reduces the
execution time of the five state-of-the-art sequence aligners by up to 18.8x.
Shouji can be adapted for any bioinformatics pipeline that performs sequence
alignment for verification. Unlike most existing methods that aim to accelerate
sequence alignment, Shouji does not sacrifice any of the aligner capabilities,
as it does not modify or replace the alignment step.
Availability: https://github.com/CMU-SAFARI/Shouji
Comment: https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz234/5421509,
Bioinformatics Journal 201
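The filter-then-align arrangement described in the Shouji abstract can be illustrated with a minimal sketch. This is not the Shouji algorithm: the filter used here is a simple character-count lower bound on edit distance, chosen only because it is short. The names `count_lower_bound` and `filter_then_align` are hypothetical, and `align` stands in for whatever aligner a pipeline already uses.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def count_lower_bound(a: str, b: str) -> int:
    """Lower bound on edit distance from character counts alone.

    One edit can remove at most one surplus character from each side,
    so the larger surplus is a valid lower bound.
    """
    ca, cb = Counter(a), Counter(b)
    surplus_a = sum((ca - cb).values())  # characters a has in excess of b
    surplus_b = sum((cb - ca).values())  # characters b has in excess of a
    return max(surplus_a, surplus_b)


def filter_then_align(
    pairs: Iterable[Tuple[str, str]],
    max_edits: int,
    align: Callable[[str, str], int],
) -> List[Tuple[str, str, int]]:
    """Call the (expensive) aligner only on pairs the cheap filter cannot rule out."""
    results = []
    for read, ref in pairs:
        if count_lower_bound(read, ref) > max_edits:
            continue  # provably more than max_edits edits: skip alignment
        results.append((read, ref, align(read, ref)))
    return results
```

Because the aligner is passed in unchanged, the filter only reduces how often it is called, which mirrors the abstract's point that the alignment step itself is neither modified nor replaced.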
MAGNET: Understanding and Improving the Accuracy of Genome Pre-Alignment Filtering
In the era of high throughput DNA sequencing (HTS) technologies, calculating
the edit distance (i.e., the minimum number of substitutions, insertions, and
deletions between a pair of sequences) for billions of genomic sequences is the
computational bottleneck in today's read mappers. The shifted Hamming distance
(SHD) algorithm proposes a fast filtering strategy that can rapidly filter out
invalid mappings that have more edits than allowed. However, SHD filters
inaccurately: it lets invalid mappings pass as correct ones, which wastes
execution time and imposes a large computational
burden. In this work, we comprehensively investigate four sources that lead to
the filtering inaccuracy. We propose MAGNET, a new filtering strategy that
maintains high accuracy across different edit distance thresholds and data
sets. It significantly improves the accuracy of pre-alignment filtering by one
to two orders of magnitude. The MATLAB implementations of MAGNET and SHD are
open source and available at: https://github.com/BilkentCompGen/MAGNET.
Comment: 10 Pages, 13 Figure
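The edit distance defined in the MAGNET abstract, and the threshold check that pre-alignment filters try to avoid running on every pair, can be sketched as follows. This is the textbook dynamic programming recurrence, not MAGNET or SHD; the function names `edit_distance` and `within_threshold` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def within_threshold(read: str, ref: str, max_edits: int) -> bool:
    """A mapping is valid when it needs no more than max_edits edits."""
    return edit_distance(read, ref) <= max_edits
```

The nested loops take time proportional to the product of the two sequence lengths, which is why running this check on billions of pairs becomes the bottleneck the abstract describes.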
DESIGN AND CHARACTERIZATION OF LOW-POWER LOW-NOISE ALL-DIGITAL SERIAL LINK FOR POINT-TO-POINT COMMUNICATION IN SOC
The fully-digital implementation of serial links has recently emerged as a viable
alternative to its classical analogue counterpart. Reducing the analogue
content in favour of digital content is attractive because it
offers lower power consumption, lower sensitivity to noise, and better
scalability across multiple technologies and platforms with negligible
modifications. In addition, describing the circuit in a hardware description language
makes it possible to reprogram all design parameters in a very short time,
whereas analogue designs must be redesigned at transistor level
for any parameter change. This can radically reduce cost and time-to-market by
saving a significant amount of development time. However, besides these considerable
advantages, the fully-digital architecture poses several design challenges
SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs
Motivation: We introduce SneakySnake, a highly parallel and highly accurate
pre-alignment filter that remarkably reduces the need for computationally
costly sequence alignment. The key idea of SneakySnake is to reduce the
approximate string matching (ASM) problem to the single net routing (SNR)
problem in VLSI chip layout. In the SNR problem, we are interested in finding
the optimal path that connects two terminals with the least routing cost on a
special grid layout that contains obstacles. The SneakySnake algorithm quickly
solves the SNR problem and uses the found optimal path to decide whether or not
performing sequence alignment is necessary. Reducing the ASM problem to the SNR problem
also makes SneakySnake efficient to implement on CPUs, GPUs, and FPGAs.
Results: SneakySnake significantly improves the accuracy of pre-alignment
filtering by up to four orders of magnitude compared to the state-of-the-art
pre-alignment filters, Shouji, GateKeeper, and SHD. For short sequences,
SneakySnake accelerates Edlib (state-of-the-art implementation of Myers's
bit-vector algorithm) and Parasail (state-of-the-art sequence aligner with a
configurable scoring function), by up to 37.7x and 43.9x (>12x on average),
respectively, with its CPU implementation, and by up to 413x and 689x (>400x on
average), respectively, with FPGA and GPU acceleration. For long sequences, the
CPU implementation of SneakySnake accelerates Parasail and KSW2 (sequence
aligner of minimap2) by up to 979x (276.9x on average) and 91.7x (31.7x on
average), respectively. As SneakySnake does not replace sequence alignment,
users can still obtain all capabilities (e.g., configurable scoring functions)
of the aligner of their choice, unlike existing acceleration efforts that
sacrifice some aligner capabilities. Availability:
https://github.com/CMU-SAFARI/SneakySnake
Comment: To appear in Bioinformatic
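The grid-with-obstacles view described in the SneakySnake abstract can be illustrated with a rough sketch. It is loosely modeled on the idea in the abstract and is not the authors' implementation (see the repository above for that). Each diagonal offset between the read and the reference is treated as one row of the grid, a mismatching cell is an obstacle, and the path greedily follows the longest obstacle-free run before crossing one obstacle. The function names, the greedy traversal, and the choice to treat cells outside the reference as obstacles are all assumptions made for this example.

```python
def longest_free_run(read: str, ref: str, col: int, diagonal: int) -> int:
    """Count consecutive matching cells starting at column `col` on one diagonal."""
    run = 0
    while col + run < len(read):
        ref_idx = col + run + diagonal
        # Assumption: cells that fall outside the reference count as obstacles.
        if not (0 <= ref_idx < len(ref)) or read[col + run] != ref[ref_idx]:
            break
        run += 1
    return run


def passes_filter(read: str, ref: str, max_edits: int) -> bool:
    """Return False once the greedy path must cross more than max_edits obstacles."""
    col = 0
    obstacles = 0
    while col < len(read):
        # Follow the diagonal that gets furthest without hitting an obstacle.
        best = max(
            longest_free_run(read, ref, col, d)
            for d in range(-max_edits, max_edits + 1)
        )
        col += best
        if col < len(read):
            obstacles += 1  # the path has to cross one obstacle to continue
            if obstacles > max_edits:
                return False
            col += 1
    return True
```

For example, `passes_filter("ACGT", "AGGT", 1)` returns `True` and `passes_filter("AAAA", "TTTT", 2)` returns `False`. A pair that passes still has to be verified by an aligner, since the obstacle count is only an estimate of the number of edits.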
Accelerating Genome Analysis: A Primer on an Ongoing Journey
Genome analysis fundamentally starts with a process known as read mapping,
where sequenced fragments of an organism's genome are compared against a
reference genome. Read mapping is currently a major bottleneck in the entire
genome analysis pipeline, because state-of-the-art genome sequencing
technologies are able to sequence a genome much faster than the computational
techniques employed to analyze the genome. We describe the ongoing journey in
significantly improving the performance of read mapping. We explain
state-of-the-art algorithmic methods and hardware-based acceleration
approaches. Algorithmic approaches exploit the structure of the genome as well
as the structure of the underlying hardware. Hardware-based acceleration
approaches exploit specialized microarchitectures or various execution
paradigms (e.g., processing inside or near memory). We conclude with the
challenges of adopting these hardware-accelerated read mappers.
Comment: This is an extended and updated version of a paper published in IEEE
Micro, vol. 40, no. 5, pp. 65-75, 1 Sept.-Oct. 2020,
https://doi.org/10.1109/MM.2020.301372