
NORA - Norwegian Open Research Archives

SWAPHI: Smith-Waterman Protein Database Search on Xeon Phi Coprocessors

Authors: Liu Yongchao; Schmidt Bertil
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 17/04/2014

The maximal sensitivity of the Smith-Waterman (SW) algorithm has enabled its wide use in biological sequence database search. Unfortunately, the high sensitivity comes at the expense of quadratic time complexity, which makes the algorithm computationally demanding for big databases. In this paper, we present SWAPHI, the first parallelized algorithm employing Xeon Phi coprocessors to accelerate SW protein database search. SWAPHI is designed based on the scale-and-vectorize approach, i.e., it boosts alignment speed by effectively utilizing both the coarse-grained parallelism from the many co-processing cores (scale) and the fine-grained parallelism from the 512-bit wide single instruction, multiple data (SIMD) vectors within each core (vectorize). By searching against the large UniProtKB/TrEMBL protein database, SWAPHI achieves a performance of up to 58.8 billion cell updates per second (GCUPS) on one coprocessor and up to 228.4 GCUPS on four coprocessors. Furthermore, it demonstrates good parallel scalability on a varying number of coprocessors, and when using four coprocessors it is also superior to both SWIPE on 16 high-end CPU cores and BLAST+ on 8 cores, with maximum speedups of 1.52 and 1.86, respectively. SWAPHI is written in C++ (with a set of SIMD intrinsics) and is freely available at http://swaphi.sourceforge.net.

Comment: A short version of this paper has been accepted by the IEEE ASAP 2014 conference.
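As a rough illustration of the GCUPS figures quoted in this abstract, the sketch below computes the metric and the parallel efficiency implied by the one-coprocessor and four-coprocessor peaks. The function names `gcups` and `parallel_efficiency` are hypothetical, and the efficiency value is only indicative, since the abstract does not say whether both peaks were measured on the same query.

```python
def gcups(query_len: int, db_residues: int, seconds: float) -> float:
    """Billions of dynamic-programming cell updates per second.

    One query of length `query_len` aligned against a database holding
    `db_residues` residues in total fills query_len * db_residues cells.
    """
    return query_len * db_residues / seconds / 1e9


def parallel_efficiency(multi_rate: float, single_rate: float, devices: int) -> float:
    """Fraction of ideal linear speedup achieved on `devices` devices."""
    return multi_rate / (single_rate * devices)


# Peak figures quoted in the abstract: 58.8 GCUPS on one coprocessor,
# 228.4 GCUPS on four coprocessors.
efficiency = parallel_efficiency(228.4, 58.8, 4)
print(f"{efficiency:.3f}")  # prints 0.971
```

Under that assumption, four coprocessors deliver about 97% of ideal linear scaling, which is consistent with the "good parallel scalability" the abstract reports.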

arXiv.org e-Print Archive

University of Toronto Research Repository

160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA)

Authors: Li Isaac TS; Shum Warren; Truong Kevin
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: To infer homology and subsequently gene function, the Smith-Waterman (SW) algorithm is used to find the optimal local alignment between two sequences. When searching sequence databases that may contain hundreds of millions of sequences, this algorithm becomes computationally expensive. Results: In this paper, we focused on accelerating the Smith-Waterman algorithm by using FPGA-based hardware that implemented a module for computing the score of a single cell of the SW matrix. Then, using a grid of this module, the entire SW matrix was computed at the speed of field propagation through the FPGA circuit. These modifications dramatically accelerated the algorithm's computation time, by up to 160-fold compared to a pure software implementation running on the same FPGA with an Altera Nios II soft processor. Conclusion: This FPGA-accelerated hardware design offers a promising new direction for improving the computation of genomic database searches.
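One way to picture the grid of single-cell modules described in this abstract is as a wavefront over the SW matrix: each cell depends only on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal are independent of each other. The sketch below is a software illustration of that dependency pattern only, not the authors' hardware design. The function name `sw_matrix_wavefront` and the scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for the example.

```python
def sw_matrix_wavefront(a: str, b: str, match: int = 2,
                        mismatch: int = -1, gap: int = -2) -> list[list[int]]:
    """Fill the Smith-Waterman score matrix one anti-diagonal at a time.

    Cells with the same i + j never depend on each other, so a hardware
    grid (or a SIMD unit) could evaluate a whole anti-diagonal at once.
    Scoring parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(2, n + m + 1):                 # d = i + j
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # diagonal: (mis)match
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
    return H


H = sw_matrix_wavefront("ACGT", "ACGT")
print(max(max(row) for row in H))  # prints 8
```

The result is identical to a row-by-row fill; only the evaluation order changes, which is what makes the per-cell module replicable across a grid.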

Directory of Open Access Journals

Repository for Publications and Research Data

SWPS3 – fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and x86/SSE2

Authors: Dessimoz Christophe; Krähenbühl Philipp; Ledergerber Christian; Szalkowski Adam
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: We present swps3, a vectorized implementation of the Smith-Waterman local alignment algorithm optimized for both the Cell/BE and x86 architectures. The paper describes swps3 and compares its performance with several other implementations. Findings: Our benchmarking results show that swps3 is currently the fastest implementation of a vectorized Smith-Waterman on the Cell/BE, outperforming the only other known implementation by a factor of at least 4: on a PlayStation 3, it achieves up to 8.0 billion cell updates per second (GCUPS). Using the SSE2 instruction set, a quad-core Intel Pentium can reach 15.7 GCUPS. We also show that swps3 on this CPU is faster than a recent GPU implementation. Finally, we note that under some circumstances, alignments are computed at roughly the same speed as BLAST, a heuristic method. Conclusion: The Cell/BE can be a powerful platform to align biological sequences. Moreover, the performance gap between exact and heuristic methods has almost disappeared, especially for long protein sequences.

Directory of Open Access Journals

Serveur académique lausannois

INRIA a CCSD electronic archive server

UCL Discovery

Fine-Grained Parallel Genomic Sequence Comparison

Author: Dominique Lavenier
Publication venue: IntechOpen
Publication date: 01/01/2010

IntechOpen

HAL-CentraleSupelec

HAL-Rennes 1

CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment

Authors: A Bairoch; Giorgio Valle; KM Chao; M Farrar; M Gribskov; O Gotoh; S Henikoff; SB Needleman; SF Altschul; Svetlin A Manavski; T Rognes; TF Smith; W Liu; W Pearson; WR Pearson
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Searching for similarities in protein and DNA databases has become a routine procedure in molecular biology. The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two sequences; as a result, it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the lengths of the two sequences. Furthermore, the exponential growth of protein and DNA databases makes the Smith-Waterman algorithm unrealistic for searching similarities in large sets of sequences. For these reasons, heuristic approaches such as those implemented in FASTA and BLAST tend to be preferred, allowing faster execution times at the cost of reduced sensitivity. The main motivation of our work is to exploit the huge computational power of commonly available graphics cards to develop high-performance solutions for sequence alignment. Results: In this paper we present what we believe is the fastest solution of the exact Smith-Waterman algorithm running on commodity hardware. It is implemented in the recently released CUDA programming environment by NVIDIA. CUDA allows direct access to the hardware primitives of the latest-generation G80 Graphics Processing Units (GPUs). Speeds of more than 3.5 GCUPS (Giga Cell Updates Per Second) are achieved on a workstation running two GeForce 8800 GTX cards. Exhaustive tests have been done to compare our implementation to SSEARCH and BLAST, running on a 3 GHz Intel Pentium IV processor. Our solution was also compared to a recently published GPU implementation and to a Single Instruction Multiple Data (SIMD) solution. These tests show that our implementation performs from 2 to 30 times faster than any previous attempt available on commodity hardware. Conclusions: The results show that graphics cards are now sufficiently advanced to be used as efficient hardware accelerators for sequence alignment. Their performance is better than any alternative available on commodity hardware platforms. The solution presented in this paper allows large-scale alignments to be performed at low cost, using the exact Smith-Waterman algorithm instead of the widely adopted heuristic approaches.
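The dynamic programming approach and its cost, proportional to the product of the two sequence lengths, can be sketched in a few lines. This is a minimal, score-only illustration under stated assumptions, not the CUDA implementation the abstract describes: the function name `smith_waterman_score` is hypothetical, and the simple match/mismatch values with a linear gap penalty stand in for the substitution matrices and affine gap penalties typically used in protein database search.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the optimal local alignment score of a and b.

    The nested loops visit len(a) * len(b) cells, which is the quadratic
    cost the abstract refers to. Scoring values are illustrative
    assumptions (linear gap penalty, no substitution matrix).
    """
    prev = [0] * (len(b) + 1)          # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)      # current row; column 0 stays 0
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0,                   # start a new local alignment
                          prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                          prev[j] + gap,       # gap in b
                          curr[j - 1] + gap)   # gap in a
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # prints 6
```

Only the shared substring `ACG` contributes here (three matches at 2 points each); the floor at 0 is what makes the alignment local rather than global.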

Archivio istituzionale della ricerca - Università di Padova

BLVector: Fast BLAST-Like Algorithm for Manycore CPU With Vectorization

Authors: Agostini Federico; Caselli Javier; Dorado Gabriel; Gálvez Rojas Sergio; Hernandez Agustina Pilar
Publication venue: Frontiers Media SA
Publication date: 01/02/2021

New High-Performance Computing architectures have recently been developed for commercial central processing units (CPUs). Yet, that has not improved the execution time of widely used bioinformatics applications, like BLAST+. This is due to a lack of optimization between the bases of the existing algorithms and the internals of the hardware that would allow taking full advantage of the available CPU cores. To exploit the new architectures, algorithms must be revised and redesigned, and usually rewritten from scratch. BLVector adapts the high-level concepts of BLAST+ to x86 architectures with AVX-512, to harness their capabilities. A comprehensive study has been carried out to optimize the approach, with a significant reduction in execution time. BLVector reduces the execution time of BLAST+ when aligning up to mid-size protein sequences (∼750 amino acids). The gain in real-scenario cases is 3.2-fold. When applied to longer proteins, BLVector consumes more time than BLAST+, but retrieves a much larger set of results. BLVector and BLAST+ are fine-tuned heuristics. Therefore, the relevant results returned by both are the same, although they behave differently, especially when performing alignments with low scores. Hence, they can be considered complementary bioinformatics tools.

Affiliation: Gálvez Rojas, Sergio. Universidad de Malaga. Escuela Técnica Superior de Ingeniería Informática. Departamento de Lenguajes y Ciencias de la Computacion; Spain
Affiliation: Agostini, Federico. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste. Instituto de Botánica del Nordeste. Universidad Nacional del Nordeste. Facultad de Ciencias Agrarias. Instituto de Botánica del Nordeste; Argentina
Affiliation: Caselli, Javier. Universidad de Malaga. Escuela Técnica Superior de Ingeniería Informática. Departamento de Lenguajes y Ciencias de la Computacion; Spain
Affiliation: Hernandez, Agustina Pilar. Consejo Superior de Investigaciones Científicas; Spain
Affiliation: Dorado, Gabriel. Universidad de Córdoba; Spain

CONICET Digital

PLAST: parallel local alignment search tool for database comparison

Authors: A Jacob; D Lavenier; Dominique Lavenier; GM Amdahl; H Zhang; Hoa Van Nguyen; KM Chao; M Farrar; M Gertz; M Pop; M Roytberg; N Firasta; S Karlin; SF Altschul; T Rognes; TF Smith; V Sachdeva; W Hu; W Liu; WR Pearson; X Fei; YK Yu
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Sequence similarity searching is an important and challenging task in molecular biology, and next-generation sequencing should further strengthen the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is tending towards more parallelism, leading to the use of chips with two, four, or more cores integrated on the same die. The main purpose of this work was to design an effective algorithm that fits the parallel capabilities of modern microprocessors. Results: A parallel algorithm for comparing large genomic banks and targeting middle-range computers has been developed and implemented in the PLAST software. The algorithm exploits two key parallel features of existing and future microprocessors: the SIMD programming model (SSE instruction set) and the multithreading concept (multicore). Compared to multithreaded BLAST software, tests performed on an 8-processor server have shown speedups ranging from 3 to 6 with a similar level of accuracy. Conclusions: A parallel algorithmic approach driven by knowledge of the internal microprocessor architecture allows significant speedup to be obtained while preserving standard sensitivity for similarity search problems.

HAL-CentraleSupelec

CiteSeerX

Directory of Open Access Journals

INRIA a CCSD electronic archive server