6 research outputs found

Designing parallel programs of parameterized granularity

Author: Struik P.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/1992

Repository TU/e

Pure OAI Repository

Formal process for systolic array design using recurrences

Author: Puddicombe Jonathan
Publication venue: The University of Edinburgh
Publication date: 01/01/1992

Edinburgh Research Archive

Formal synthesis of control signals for systolic arrays

Author: Xue Jingling
Publication venue: The University of Edinburgh
Publication date: 01/01/1992

Edinburgh Research Archive

Parallelization of dynamic programming recurrences in computational biology

Author: Jacob Arpith
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2010

The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years the DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays (FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15 and 130 times faster than a modern dual-core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 Intel Core 2 Duo processors running at 3 GHz. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms.
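
For readers unfamiliar with the Nussinov recurrence named in the abstract above, the following is a minimal software sketch of the textbook base-pair maximization variant. It is not the accelerator design described in the thesis. The function names `can_pair` and `nussinov`, the allowed pairs (including the G-U wobble pair), and the absence of a minimum loop length are all assumptions made for illustration. The table is filled span by span; every cell with the same span depends only on cells with smaller spans, which is one standard way to see where fine-grained parallelism in this kind of kernel comes from.

```python
def can_pair(a, b):
    """Return True if bases a and b may pair (assumed set: A-U, G-C, G-U)."""
    return {a, b} in ({"A", "U"}, {"G", "C"}, {"G", "U"})


def nussinov(seq):
    """Maximum number of nested base pairs in seq (textbook Nussinov recurrence).

    Illustrative assumption: no minimum hairpin loop length is enforced.
    """
    n = len(seq)
    if n == 0:
        return 0
    # score[i][j] holds the best pair count for the subsequence seq[i..j].
    score = [[0] * n for _ in range(n)]
    # Fill by increasing span j - i. Cells sharing a span are mutually
    # independent, so this inner loop could in principle run in parallel.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: leave base i or base j unpaired.
            best = max(score[i + 1][j], score[i][j - 1])
            # Case 2: pair base i with base j.
            if can_pair(seq[i], seq[j]):
                inner = score[i + 1][j - 1] if span > 1 else 0
                best = max(best, inner + 1)
            # Case 3: split the subsequence into two independent parts.
            for k in range(i + 1, j):
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best
    return score[0][n - 1]
```

Calling `nussinov("GGGAAAUCC")` returns the pair count for that sequence under the assumptions above; a production folding code would typically add a minimum loop length and a traceback to recover the structure.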

Washington University St. Louis: Open Scholarship

Mapping of recursive algorithms onto multi-rate arrays

Author: Zheng Yue-Peng
Publication venue: Oregon State University

In this dissertation, the multi-rate array (MRA) architecture and its synthesis are proposed and developed. Using multi-coordinate systems (MCS), a unified theory for mapping algorithms from their original algorithmic specifications onto multi-rate arrays is developed. A multi-rate array is a grid of processors in which each interconnection may have its own clock rate; operations with different complexities run at their own clock rate, thus increasing throughput and efficiency. A class of algorithms named directional affine recurrence equations (DARE) is defined. The dependence space of a DARE can be decomposed into uniform and non-uniform subspaces. When projected along the non-uniform subspace, the resultant array structure is regular. Limitations and restrictions of this approach are investigated, and a procedure for mapping DAREs onto MRAs is developed. To generalize this approach, a synthesis theory is developed whose initial specification is affine direct input output (ADIO), which aims at removing redundancies from algorithms. Most ADIO specifications are the original algorithmic specifications. A multi-coordinate system (MCS) is used to present an algorithm's dependence structures. In an MCS, the index spaces of the variables in an algorithm are defined relative to their own coordinate systems. Most algorithms traditionally considered irregular present regular dependence structures under the MCS technique. Procedures are provided for transforming algorithms from their original algorithmic specifications to their regular specifications. Multi-rate schedules and multi-rate timing functions are studied. The solution for multi-rate timing functions can be formulated as linear programming problems. Procedures are provided for mapping ADIOs onto multi-rate VLSI systems. Examples are provided to illustrate the synthesis of MRAs from DAREs and ADIOs. The first major contribution of this dissertation is the development of concrete, executable MRA architectures. The second is the introduction of the MCS and its application in the development of the theory for synthesizing MRAs from original algorithmic specifications.

ScholarsArchive@OSU