255 research outputs found
FPGA acceleration of sequence analysis tools in bioinformatics
Thesis (Ph.D.)--Boston UniversityWith advances in biotechnology and computing power, biological data are being produced at an exceptional rate. The purpose of this study is to analyze the application of FPGAs to accelerate high impact production biosequence analysis tools. Compared with other alternatives, FPGAs offer huge compute power, lower power consumption, and reasonable flexibility.
BLAST has become the de facto standard in bioinformatic approximate string matching and so its acceleration is of fundamental importance. It is a complex highly-optimized system, consisting of tens of thousands of lines of code and a large number of heuristics. Our idea is to emulate the main phases of its algorithm on FPGA. Utilizing our FPGA engine, we quickly reduce the size of the database to a small fraction, and then use the original code to process the query. Using a standard FPGA-based system, we achieved 12x speedup over a highly optimized multithread reference code.
Multiple Sequence Alignment (MSA)--the extension of pairwise Sequence Alignment to multiple Sequences--is critical to solve many biological problems. Previous attempts to accelerate Clustal-W, the most commonly used MSA code, have directly mapped a portion of the code to the FPGA. We use a new approach: we apply prefiltering of the kind commonly used in BLAST to perform the initial all-pairs alignments. This results in a speedup of from 8Ox to 190x over the CPU code (8 cores). The quality is comparable to the original according to a commonly used benchmark suite evaluated with respect to multiple distance metrics.
The challenge in FPGA-based acceleration is finding a suitable application mapping. Unfortunately many software heuristics do not fall into this category and so other methods must be applied. One is restructuring: an entirely new algorithm is applied. Another is to analyze application utilization and develop accuracy/performance tradeoffs. Using our prefiltering approach and novel FPGA programming models we have achieved significant speedup over reference programs. We have applied approximation, seeding, and filtering to this end. The bulk of this study is to introduce the pros and cons of these acceleration models for biosequence analysis tools
Reconfigurable acceleration of genetic sequence alignment: A survey of two decades of efforts
Genetic sequence alignment has always been a computational challenge in bioinformatics. Depending on the problem size, software-based aligners can take multiple CPU-days to process the sequence data, creating a bottleneck point in bioinformatic analysis flow. Reconfigurable accelerator can achieve high performance for such computation by providing massive parallelism, but at the expense of programming flexibility and thus has not been commensurately used by practitioners. Therefore, this paper aims to provide a thorough survey of the proposed accelerators by giving a qualitative categorization based on their algorithms and speedup. A comprehensive comparison between work is also presented so as to guide selection for biologist, and to provide insight on future research direction for FPGA scientists
Design and Evaluation of a BLAST Ungapped Extension Accelerator, Master\u27s Thesis
The amount of biosequence data being produced each year is growing exponentially. Extracting useful information from this massive amount of data is becoming an increasingly difficult task. This thesis focuses on accelerating the most widely-used software tool for analyzing genomic data, BLAST. This thesis presents Mercury BLAST, a novel method for accelerating searches through massive DNA databases. Mercury BLAST takes a streaming approach to the BLAST computation by offloading the performance-critical sections onto reconfigurable hardware. This hardware is then used in combination with the processor of the host system to deliver BLAST results in a fraction of the time of the general-purpose processor alone. Mercury BLAST makes use of new algorithms combined with reconfigurable hardware to accelerate BLAST-like similarity search. An evaluation of this method for use in real BLAST-like searches is presented along with a characterization of the quality of results associated with using these new algorithms in specialized hardware. The primary focus of this thesis is the design of the ungapped extension stage of Mercury BLAST. The architecture of the ungapped extension stage is described along with the context of this stage within the Mercury BLAST system. The design is compact and performs over 20× faster than that of the standard software ungapped extension, yielding close to 50× speedup over the complete software BLAST application. The quality of Mercury BLAST results is essentially equivalent to the standard BLAST results
Parallelization of dynamic programming recurrences in computational biology
The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays: FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15-130x faster than a modern dual core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 3 GHz Intel Core 2 Duo processors. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms
SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs
Motivation: We introduce SneakySnake, a highly parallel and highly accurate
pre-alignment filter that remarkably reduces the need for computationally
costly sequence alignment. The key idea of SneakySnake is to reduce the
approximate string matching (ASM) problem to the single net routing (SNR)
problem in VLSI chip layout. In the SNR problem, we are interested in finding
the optimal path that connects two terminals with the least routing cost on a
special grid layout that contains obstacles. The SneakySnake algorithm quickly
solves the SNR problem and uses the found optimal path to decide whether or not
performing sequence alignment is necessary. Reducing the ASM problem into SNR
also makes SneakySnake efficient to implement on CPUs, GPUs, and FPGAs.
Results: SneakySnake significantly improves the accuracy of pre-alignment
filtering by up to four orders of magnitude compared to the state-of-the-art
pre-alignment filters, Shouji, GateKeeper, and SHD. For short sequences,
SneakySnake accelerates Edlib (state-of-the-art implementation of Myers's
bit-vector algorithm) and Parasail (state-of-the-art sequence aligner with a
configurable scoring function), by up to 37.7x and 43.9x (>12x on average),
respectively, with its CPU implementation, and by up to 413x and 689x (>400x on
average), respectively, with FPGA and GPU acceleration. For long sequences, the
CPU implementation of SneakySnake accelerates Parasail and KSW2 (sequence
aligner of minimap2) by up to 979x (276.9x on average) and 91.7x (31.7x on
average), respectively. As SneakySnake does not replace sequence alignment,
users can still obtain all capabilities (e.g., configurable scoring functions)
of the aligner of their choice, unlike existing acceleration efforts that
sacrifice some aligner capabilities. Availability:
https://github.com/CMU-SAFARI/SneakySnakeComment: To appear in Bioinformatic
FPGA acceleration of DNA sequence alignment: design analysis and optimization
Existing FPGA accelerators for short read mapping often fail to utilize the complete biological information in sequencing data for simple hardware design, leading to missed or incorrect alignment. In this work, we propose a runtime reconfigurable alignment pipeline that considers all information in sequencing data for the biologically accurate acceleration of short read mapping. We focus our efforts on accelerating two string matching techniques: FM-index and the Smith-Waterman algorithm with the affine-gap model which are commonly used in short read mapping. We further optimize the FPGA hardware using a design analyzer and merger to improve alignment performance. The contributions of this work are as follows.
1. We accelerate the exact-match and mismatch alignment by leveraging the FM-index technique. We optimize memory access by compressing the data structure and interleaving the access with multiple short reads. The FM-index hardware also considers complete information in the read data to maximize accuracy.
2. We propose a seed-and-extend model to accelerate alignment with indels. The FM-index hardware is extended to support the seeding stage while a Smith-Waterman implementation with the affine-gap model is developed on FPGA for the extension stage. This model can improve the efficiency of indel alignment with comparable accuracy versus state-of-the-art software.
3. We present an approach for merging multiple FPGA designs into a single hardware design, so that multiple place-and-route tasks can be replaced by a single task to speed up functional evaluation of designs. We first experiment with this approach to demonstrate its feasibility for different designs. Then we apply this approach to optimize one of the proposed FPGA aligners for better alignment performance.Open Acces
High performance reconfigurable architectures for biological sequence alignment
Bioinformatics and computational biology (BCB) is a rapidly developing
multidisciplinary field which encompasses a wide range of domains, including genomic
sequence alignments. It is a fundamental tool in molecular biology in searching for
homology between sequences. Sequence alignments are currently gaining close attention due
to their great impact on the quality aspects of life such as facilitating early disease diagnosis,
identifying the characteristics of a newly discovered sequence, and drug engineering. With
the vast growth of genomic data, searching for a sequence homology over huge databases
(often measured in gigabytes) is unable to produce results within a realistic time, hence the
need for acceleration. Since the exponential increase of biological databases as a result of the
human genome project (HGP), supercomputers and other parallel architectures such as the
special purpose Very Large Scale Integration (VLSI) chip, Graphic Processing Unit (GPUs)
and Field Programmable Gate Arrays (FPGAs) have become popular acceleration platforms.
Nevertheless, there are always trade-off between area, speed, power, cost, development time
and reusability when selecting an acceleration platform. FPGAs generally offer more
flexibility, higher performance and lower overheads. However, they suffer from a relatively
low level programming model as compared with off-the-shelf microprocessors such as
standard microprocessors and GPUs. Due to the aforementioned limitations, the need has
arisen for optimized FPGA core implementations which are crucial for this technology to
become viable in high performance computing (HPC).
This research proposes the use of state-of-the-art reprogrammable system-on-chip
technology on FPGAs to accelerate three widely-used sequence alignment algorithms; the
Smith-Waterman with affine gap penalty algorithm, the profile hidden Markov model
(HMM) algorithm and the Basic Local Alignment Search Tool (BLAST) algorithm. The
three novel aspects of this research are firstly that the algorithms are designed and
implemented in hardware, with each core achieving the highest performance compared to the
state-of-the-art. Secondly, an efficient scheduling strategy based on the double buffering
technique is adopted into the hardware architectures. Here, when the alignment matrix
computation task is overlapped with the PE configuration in a folded systolic array, the
overall throughput of the core is significantly increased. This is due to the bound PE
configuration time and the parallel PE configuration approach irrespective of the number of
PEs in a systolic array. In addition, the use of only two configuration elements in the PE optimizes hardware resources and enables the scalability of PE systolic arrays without
relying on restricted onboard memory resources. Finally, a new performance metric is
devised, which facilitates the effective comparison of design performance between different
FPGA devices and families. The normalized performance indicator (speed-up per area per
process technology) takes out advantages of the area and lithography technology of any
FPGA resulting in fairer comparisons.
The cores have been designed using Verilog HDL and prototyped on the Alpha Data
ADM-XRC-5LX card with the Virtex-5 XC5VLX110-3FF1153 FPGA. The implementation
results show that the proposed architectures achieved giga cell updates per second (GCUPS)
performances of 26.8, 29.5 and 24.2 respectively for the acceleration of the Smith-Waterman
with affine gap penalty algorithm, the profile HMM algorithm and the BLAST algorithm. In
terms of speed-up improvements, comparisons were made on performance of the designed
cores against their corresponding software and the reported FPGA implementations. In the
case of comparison with equivalent software execution, acceleration of the optimal
alignment algorithm in hardware yielded an average speed-up of 269x as compared to the
SSEARCH 35 software. For the profile HMM-based sequence alignment, the designed core
achieved speed-up of 103x and 8.3x against the HMMER 2.0 and the latest version of
HMMER (version 3.0) respectively. On the other hand, the implementation of the gapped
BLAST with the two-hit method in hardware achieved a greater than tenfold speed-up
compared to the latest NCBI BLAST software. In terms of comparison against other reported
FPGA implementations, the proposed normalized performance indicator was used to
evaluate the designed architectures fairly. The results showed that the first architecture
achieved more than 50 percent improvement, while acceleration of the profile HMM
sequence alignment in hardware gained a normalized speed-up of 1.34. In the case of the
gapped BLAST with the two-hit method, the designed core achieved 11x speed-up after
taking out advantages of the Virtex-5 FPGA. In addition, further analysis was conducted in
terms of cost and power performances; it was noted that, the core achieved 0.46 MCUPS per
dollar spent and 958.1 MCUPS per watt. This shows that FPGAs can be an attractive
platform for high performance computation with advantages of smaller area footprint as well
as represent economic ‘green’ solution compared to the other acceleration platforms. Higher
throughput can be achieved by redeploying the cores on newer, bigger and faster FPGAs
with minimal design effort
- …