6 research outputs found
Parallelization of dynamic programming recurrences in computational biology
The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays: FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15-130x faster than a modern dual core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 3 GHz Intel Core 2 Duo processors. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms
Recommended from our members
Mapping of recursive algorithms onto multi-rate arrays
In this dissertation, multi-rate array (MRA) architecture and its synthesis are proposed
and developed. Using multi-coordinate systems (MCS), a unified theory for mapping
algorithms from their original algorithmic specifications onto multi-rate arrays is
developed.
A multi-rate array is a grid of processors in which each interconnection may have its
own clock rate; operations with different complexities run at their own clock rate, thus
increasing the throughput and efficiency.
A class of algorithms named directional affine recurrence equations (DARE) is
defined. The dependence space of a DARE can be decomposed into uniform and non-uniform
subspaces. When projected along the non-uniform subspace, the resultant array
structure is regular. Limitations and restrictions of this approach are investigated and a
procedure for mapping DARE onto MRA is developed.
To generalize this approach, synthesis theory is developed with initial specification
as affine direct input output (ADIO) which aims at removing redundancies from algorithms.
Most ADIO specifications are the original algorithmic specifications. A multi-coordinate
systems (MCS) is used to present an algorithm's dependence structures. In a
MCS system, the index spaces of the variables in an algorithm are defined relative to their own coordinate systems. Most traditionally considered irregular algorithms present regular dependence structures under MCS technique. Procedures are provided for transforming algorithms from original algorithmic specifications to their regular specifications.
Multi-rate schedules and multi-rate timing functions are studied. The solution for multi-rate timing functions can be formulated as linear programming problems. Procedures are provided for mapping ADIOs onto multi-rate VLSI systems. Examples are provided to illustrate the synthesis of MRAs from DAREs and ADIOs.
The first major contribution of this dissertation is the development of the concrete, executable MRA architectures. The second is the introduction of MCS system and its application in the development of the theory for synthesizing MRAs from original algorithmic specifications