1,574 research outputs found
High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP
This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high performance efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM’s Cell Broadband Engine (Cell BE), in the design and implementation of the widely-used Smith-Waterman pairwise sequence alignment algorithm, with general purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on performance per watt criterion and perform better than all other platforms on performance per dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both performance per watt and performance per dollar criteria. In general, in order to outperform other technologies on performance per dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs
Dynamic Multigrain Parallelization on the Cell Broadband Engine
This paper addresses the problem of orchestrating and scheduling
parallelism at multiple levels of granularity on heterogeneous
multicore processors. We present policies and mechanisms for adaptive
exploitation and scheduling of multiple layers of parallelism on the
Cell Broadband Engine. Our policies combine event-driven task
scheduling with malleable loop-level parallelism, which is exposed
from the runtime system whenever task-level parallelism leaves cores
idle. We present a runtime system for scheduling applications with
layered parallelism on Cell and investigate its potential with RAxML,
a computational biology application which infers large phylogenetic
trees, using the Maximum Likelihood (ML) method. Our experiments show
that the Cell benefits significantly from dynamic parallelization
methods, that selectively exploit the layers of parallelism in the
system, in response to workload characteristics. Our runtime
environment outperforms naive parallelization and scheduling based on
MPI and Linux by up to a factor of 2.6. We are able to execute RAxML
on one Cell four times faster than on a dual-processor system with
Hyperthreaded Xeon processors, and 5--10\% faster than on a
single-processor system with a dual-core, quad-thread IBM Power5
processor
RAxML-Cell: Parallel Phylogenetic Tree Inference on the Cell Broadband Engine
Phylogenetic tree reconstruction is one of the grand challenge
problems in Bioinformatics. The search for a best-scoring tree with 50
organisms, under a reasonable optimality criterion, creates a
topological search space which is as large as the number of atoms in
the universe. Computational phylogeny is challenging even for the most
powerful supercomputers. It is also an ideal candidate for
benchmarking emerging multiprocessor architectures, because it
exhibits various levels of fine and coarse-grain parallelism. In this
paper, we present the porting, optimization, and evaluation of RAxML
on the Cell Broadband Engine. RAxML is a provably efficient, hill
climbing algorithm for computing phylogenetic trees based on the
Maximum Likelihood (ML) method. The algorithm uses an embarrassingly
parallel search method, which also exhibits data-level parallelism and
control parallelism in the computation of the likelihood functions.
We present the optimization of one of the currently fastest tree
search algorithms, on a real Cell blade prototype. We also
investigate problems and present solutions pertaining to the
optimization of floating point code, control flow, communication,
scheduling, and multi-level parallelization on the Cell
Exploring New Search Algorithms and Hardware for Phylogenetics: RAxML Meets the IBM Cell
Phylogenetic inference is considered to be one of the grand challenges in Bioinformatics due to the immense computational requirements. RAxML is currently among the fastest and most accurate programs for phylogenetic tree inference under the Maximum Likelihood (ML) criterion. First, we introduce new tree search heuristics that accelerate RAxML by a factor of 2.43 while returning equally good trees. The performance of the new search algorithm has been assessed on 18 real-world datasets comprising 148 up to 4,843 DNA sequences. We then present the implementation, optimization, and evaluation of RAxML on the IBM Cell Broadband Engine. We address the problems and provide solutions pertaining to the optimization of floating point code, control flow, communication, and scheduling of multi-level parallelism on the Cel
Revisiting the Speed-versus-Sensitivity Tradeoff in Pairwise Sequence Search
The Smith-Waterman algorithm is a dynamic programming method for determining optimal local alignments between nucleotide or protein sequences. However, it suffers from quadratic time and space complexity. As a result, many algorithmic and architectural enhancements have been proposed to solve this problem, but at the cost of reduced sensitivity in the algorithms or significant expense in hardware, respectively. Hence, there exists a need to evaluate the tradeoffs between the different solutions. This motivation, coupled with the lack of an evaluation metric to quantify these tradeoffs leads us to formally define and quantify the sensitivity of homology search methods so that tradeoffs between sequence-search solutions can be evaluated in a quantitative manner. As an example, though the BLAST algorithm executes significantly faster than Smith-Waterman, we find that BLAST misses 80% of the significant sequence alignments. This paper then presents a highly efficient parallelization of the Smith-Waterman algorithm on the Cell Broadband Engine, a novel hybrid multicore architecture that drives the PlayStation 3 (PS3) game consoles, and emulates BLAST by repeatedly executing the parallelized Smith-Waterman algorithm to search for a query in a given sequence database. Through an innovative mapping of the optimal Smith-Waterman algorithm onto a cluster of PlayStation 3 nodes, our implementation delivers a 10-fold speed-up over a high-end multicore architecture and an 88-fold speed-up over a non-accelerated PS3. Finally, we compare the performance of our implementation of the Smith-Waterman algorithm to that of BLAST and the canonical Smith-Waterman implementation, based on a combination of three factors — execution time (speed), sensitivity, and the actual cost of de-ploying each solution. In the end, our parallelized Smith-Waterman algorithm approaches the speed of BLAST while maintaining ideal sensitivity and achieving low cost through the use of PlayStation 3 game consoles
PLAST: parallel local alignment search tool for database comparison
Background: Sequence similarity searching is an important and challenging task in molecular biology and next-generation sequencing should further strengthen the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is tending towards more parallelism, leading to the use of chips with two, four and more cores integrated on the same die. The main purpose of this work was to design an effective algorithm to fit with the parallel capabilities of modern microprocessors. Results: A parallel algorithm for comparing large genomic banks and targeting middle-range computers has been developed and implemented in PLAST software. The algorithm exploits two key parallel features of existing and future microprocessors: the SIMD programming model (SSE instruction set) and the multithreading concept (multicore). Compared to multithreaded BLAST software, tests performed on an 8-processor server have shown speedup ranging from 3 to 6 with a similar level of accuracy. Conclusions: A parallel algorithmic approach driven by the knowledge of the internal microprocessor architecture allows significant speedup to be obtained while preserving standard sensitivity for similarity search problems.
Accelerated large-scale multiple sequence alignment
<p>Abstract</p> <p>Background</p> <p>Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations according to Amdahl's Law. This work is the first known to accelerate the third stage of progressive alignment on reconfigurable hardware.</p> <p>Results</p> <p>We reduce subgroups of aligned sequences into discrete profiles before they are pairwise aligned on the accelerator. Using an FPGA accelerator, an overall speedup of up to 150 has been demonstrated on a large data set when compared to a 2.4 GHz Core2 processor.</p> <p>Conclusions</p> <p>Our parallel algorithm and architecture accelerates large-scale MSA with reconfigurable computing and allows researchers to solve the larger problems that confront biologists today. Program source is available from <url>http://dna.cs.byu.edu/msa/</url>.</p
- …