12 research outputs found
Protein alignment HW/SW optimizations
Biosequence alignment recently received an amazing support from both commodity and dedicated hardware platforms. The limitless requirements of this application motivate the search for improved implementations to boost processing time and capabilities. We propose an unprecedented hardware improvement to the classic Smith-Waterman (S-W) algorithm based on a twofold approach: i) an on-the-fly gap-open/gap-extension selection that reduces the hardware implementation complexity; ii) a pre-selection filter that uses reduced amino-acid alphabets to screen out not-significant sequences and to shorten the S-Witerations on huge reference databases.We demonstrated the improvements w.r.t. a classic approach both from the point of view of algorithm efficiency and of HW performance (FPGA and ASIC post-synthesis analysis)
Dataflow acceleration of Smith-Waterman with Traceback for high throughput Next Generation Sequencing
Smith-Waterman algorithm is widely adopted bymost popular DNA sequence aligners. The inherent algorithmcomputational intensity and the vast amount of NGS input datait operates on, create a bottleneck in genomic analysis flows forshort-read alignment. FPGA architectures have been extensivelyleveraged to alleviate the problem, each one adopting a differentapproach. In existing solutions, effective co-design of the NGSshort-read alignment still remains an open issue, mainly due tonarrow view on real integration aspects, such as system widecommunication and accelerator call overheads. In this paper, wepropose a dataflow architecture for Smith-Waterman Matrix-filland Traceback alignment stages, to perform short-read alignmenton NGS data. The architectural decision of moving both stages onchip extinguishes the communication overhead, and coupled withradical software restructuring, allows for efficient integration intowidely-used Bowtie2 aligner. This approach delivers×18 speedupover the respective Bowtie2 standalone components, while our co-designed Bowtie2 demonstrates a 35% boost in performance
An FPGA accelerator of the wavefront algorithm for genomics pairwise alignment
In the last years, advances in next-generation sequencing technologies have enabled the proliferation of genomic applications that guide personalized medicine. These applications have an enormous computational cost due to the large amount of genomic data they process. The first step in many of these applications consists in aligning reads against a reference genome. Very recently, the wavefront alignment algorithm has been introduced, significantly reducing the execution time of the read alignment process. This paper presents the first FPGA- based hardware/software co-designed accelerator of such relevant algorithm. Compared to the reference WFA CPU-only implementation, the proposed FPGA accelerator achieves performance speedups of up to 13.5× while consuming up to 14.6× less energy.ed medicine. These applications have an enormous computational cost due to the large amount of genomic data they process. The first step in many of these applications consists in aligning reads against a reference genome. Very recently, the wavefront alignment algorithm has been introduced, significantly reducing the execution time of the read alignment process. This paper presents the first FPGA- based hardware/software co-designed accelerator of such relevant algorithm. Compared to the reference WFA CPU-only imple- mentation, the proposed FPGA accelerator achieves performance speedups of up to 13.5× while consuming up to 14.6× less energy.This work has been supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Science and Innovation (contract PID2019-107255GB-C21/AEI/10.13039/501100011033), by the Generalitat de Catalunya (contracts 2017-SGR-1414 and 2017-SGR-1328), by the IBM/BSC Deep Learning Center initiative, and by the DRAC project, which is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of total eligible cost. Ll. Alvarez has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under the Juan de la Cierva Formacion fellowship No. FJCI-2016-30984. M. Moreto has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship No. RYC-2016-21104.Peer ReviewedPostprint (author's final draft
SaLoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs
Sequence alignment forms an important backbone in many sequencing
applications. A commonly used strategy for sequence alignment is an approximate
string matching with a two-dimensional dynamic programming approach. Although
some prior work has been conducted on GPU acceleration of a sequence alignment,
we identify several shortcomings that limit exploiting the full computational
capability of modern GPUs. This paper presents SaLoBa, a GPU-accelerated
sequence alignment library focused on seed extension. Based on the analysis of
previous work with real-world sequencing data, we propose techniques to exploit
the data locality and improve workload balancing. The experimental results
reveal that SaLoBa significantly improves the seed extension kernel compared to
state-of-the-art GPU-based methods.Comment: Published at IPDPS'2
Decomposing Genomics Algorithms: Core Computations for Accelerating Genomics
Technological advances in genomic analyses and computing sciences has led to a burst in genomics data. With those advances, there has also been parallel growth in dedicated accelerators for specific genomic analyses. However, biologists are in need of a reconfigurable machine that can allow them to perform multiple analyses without needing to go for dedicated compute platforms for each analysis. This work addresses the first steps in the design of such a reconfigurable machine. We hypothesize that this machine design can consist of some accelerators of computations common across various genomic analyses. This work studies a subset of genomic analyses and identifies such core computations. We further investigate the possibility of further accelerating through a deeper analysis of the computation primitives.National Science Foundation (NSF CNS 13-37732); Infosys; IBM Faculty Award; Office of the Vice Chancellor for Research, University of Illinois at Urbana-ChampaignOpe
State-of-the-art in Smith-Waterman Protein Database Search on HPC Platforms
Searching biological sequence database is a common and repeated task in bioinformatics and molecular biology. The Smith–Waterman algorithm is the most accurate method for this kind of search. Unfortunately, this algorithm is computationally demanding and the situation gets worse due to the exponential growth of biological data in the last years. For that reason, the scientific community has made great efforts to accelerate Smith–Waterman biological database searches in a wide variety of hardware platforms. We give a survey of the state-of-the-art in Smith–Waterman protein database search, focusing on four hardware architectures: central processing units, graphics processing units, field programmable gate arrays and Xeon Phi coprocessors. After briefly describing each hardware platform, we analyse temporal evolution, contributions, limitations and experimental work and the results of each implementation. Additionally, as energy efficiency is becoming more important every day, we also survey performance/power consumption works. Finally, we give our view on the future of Smith–Waterman protein searches considering next generations of hardware architectures and its upcoming technologies.Instituto de Investigación en InformáticaUniversidad Complutense de Madri
High performance reconfigurable architectures for biological sequence alignment
Bioinformatics and computational biology (BCB) is a rapidly developing
multidisciplinary field which encompasses a wide range of domains, including genomic
sequence alignments. It is a fundamental tool in molecular biology in searching for
homology between sequences. Sequence alignments are currently gaining close attention due
to their great impact on the quality aspects of life such as facilitating early disease diagnosis,
identifying the characteristics of a newly discovered sequence, and drug engineering. With
the vast growth of genomic data, searching for a sequence homology over huge databases
(often measured in gigabytes) is unable to produce results within a realistic time, hence the
need for acceleration. Since the exponential increase of biological databases as a result of the
human genome project (HGP), supercomputers and other parallel architectures such as the
special purpose Very Large Scale Integration (VLSI) chip, Graphic Processing Unit (GPUs)
and Field Programmable Gate Arrays (FPGAs) have become popular acceleration platforms.
Nevertheless, there are always trade-off between area, speed, power, cost, development time
and reusability when selecting an acceleration platform. FPGAs generally offer more
flexibility, higher performance and lower overheads. However, they suffer from a relatively
low level programming model as compared with off-the-shelf microprocessors such as
standard microprocessors and GPUs. Due to the aforementioned limitations, the need has
arisen for optimized FPGA core implementations which are crucial for this technology to
become viable in high performance computing (HPC).
This research proposes the use of state-of-the-art reprogrammable system-on-chip
technology on FPGAs to accelerate three widely-used sequence alignment algorithms; the
Smith-Waterman with affine gap penalty algorithm, the profile hidden Markov model
(HMM) algorithm and the Basic Local Alignment Search Tool (BLAST) algorithm. The
three novel aspects of this research are firstly that the algorithms are designed and
implemented in hardware, with each core achieving the highest performance compared to the
state-of-the-art. Secondly, an efficient scheduling strategy based on the double buffering
technique is adopted into the hardware architectures. Here, when the alignment matrix
computation task is overlapped with the PE configuration in a folded systolic array, the
overall throughput of the core is significantly increased. This is due to the bound PE
configuration time and the parallel PE configuration approach irrespective of the number of
PEs in a systolic array. In addition, the use of only two configuration elements in the PE optimizes hardware resources and enables the scalability of PE systolic arrays without
relying on restricted onboard memory resources. Finally, a new performance metric is
devised, which facilitates the effective comparison of design performance between different
FPGA devices and families. The normalized performance indicator (speed-up per area per
process technology) takes out advantages of the area and lithography technology of any
FPGA resulting in fairer comparisons.
The cores have been designed using Verilog HDL and prototyped on the Alpha Data
ADM-XRC-5LX card with the Virtex-5 XC5VLX110-3FF1153 FPGA. The implementation
results show that the proposed architectures achieved giga cell updates per second (GCUPS)
performances of 26.8, 29.5 and 24.2 respectively for the acceleration of the Smith-Waterman
with affine gap penalty algorithm, the profile HMM algorithm and the BLAST algorithm. In
terms of speed-up improvements, comparisons were made on performance of the designed
cores against their corresponding software and the reported FPGA implementations. In the
case of comparison with equivalent software execution, acceleration of the optimal
alignment algorithm in hardware yielded an average speed-up of 269x as compared to the
SSEARCH 35 software. For the profile HMM-based sequence alignment, the designed core
achieved speed-up of 103x and 8.3x against the HMMER 2.0 and the latest version of
HMMER (version 3.0) respectively. On the other hand, the implementation of the gapped
BLAST with the two-hit method in hardware achieved a greater than tenfold speed-up
compared to the latest NCBI BLAST software. In terms of comparison against other reported
FPGA implementations, the proposed normalized performance indicator was used to
evaluate the designed architectures fairly. The results showed that the first architecture
achieved more than 50 percent improvement, while acceleration of the profile HMM
sequence alignment in hardware gained a normalized speed-up of 1.34. In the case of the
gapped BLAST with the two-hit method, the designed core achieved 11x speed-up after
taking out advantages of the Virtex-5 FPGA. In addition, further analysis was conducted in
terms of cost and power performances; it was noted that, the core achieved 0.46 MCUPS per
dollar spent and 958.1 MCUPS per watt. This shows that FPGAs can be an attractive
platform for high performance computation with advantages of smaller area footprint as well
as represent economic ‘green’ solution compared to the other acceleration platforms. Higher
throughput can be achieved by redeploying the cores on newer, bigger and faster FPGAs
with minimal design effort
FPGA acceleration of sequence analysis tools in bioinformatics
Thesis (Ph.D.)--Boston UniversityWith advances in biotechnology and computing power, biological data are being produced at an exceptional rate. The purpose of this study is to analyze the application of FPGAs to accelerate high impact production biosequence analysis tools. Compared with other alternatives, FPGAs offer huge compute power, lower power consumption, and reasonable flexibility.
BLAST has become the de facto standard in bioinformatic approximate string matching and so its acceleration is of fundamental importance. It is a complex highly-optimized system, consisting of tens of thousands of lines of code and a large number of heuristics. Our idea is to emulate the main phases of its algorithm on FPGA. Utilizing our FPGA engine, we quickly reduce the size of the database to a small fraction, and then use the original code to process the query. Using a standard FPGA-based system, we achieved 12x speedup over a highly optimized multithread reference code.
Multiple Sequence Alignment (MSA)--the extension of pairwise Sequence Alignment to multiple Sequences--is critical to solve many biological problems. Previous attempts to accelerate Clustal-W, the most commonly used MSA code, have directly mapped a portion of the code to the FPGA. We use a new approach: we apply prefiltering of the kind commonly used in BLAST to perform the initial all-pairs alignments. This results in a speedup of from 8Ox to 190x over the CPU code (8 cores). The quality is comparable to the original according to a commonly used benchmark suite evaluated with respect to multiple distance metrics.
The challenge in FPGA-based acceleration is finding a suitable application mapping. Unfortunately many software heuristics do not fall into this category and so other methods must be applied. One is restructuring: an entirely new algorithm is applied. Another is to analyze application utilization and develop accuracy/performance tradeoffs. Using our prefiltering approach and novel FPGA programming models we have achieved significant speedup over reference programs. We have applied approximation, seeding, and filtering to this end. The bulk of this study is to introduce the pros and cons of these acceleration models for biosequence analysis tools