1,030 research outputs found
Recommended from our members
Parallel data compression
Data compression schemes remove data redundancy in communicated and stored data and increase the effective capacities of communication and storage devices. Parallel algorithms and implementations for textual data compression are surveyed. Related concepts from parallel computation and information theory are briefly discussed. Static and dynamic methods for codeword construction and transmission on various models of parallel computation are described. Included are parallel methods which boost system speed by coding data concurrently, and approaches which employ multiple compression techniques to improve compression ratios. Theoretical and empirical comparisons are reported and areas for future research are suggested
Parallelization of dynamic programming recurrences in computational biology
The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays: FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15-130x faster than a modern dual core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 3 GHz Intel Core 2 Duo processors. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms
Sequence information signal processor for local and global string comparisons
A sequence information signal processing integrated circuit chip designed to perform high speed calculation of a dynamic programming algorithm based upon the algorithm defined by Waterman and Smith. The signal processing chip of the present invention is designed to be a building block of a linear systolic array, the performance of which can be increased by connecting additional sequence information signal processing chips to the array. The chip provides a high speed, low cost linear array processor that can locate highly similar global sequences or segments thereof such as contiguous subsequences from two different DNA or protein sequences. The chip is implemented in a preferred embodiment using CMOS VLSI technology to provide the equivalent of about 400,000 transistors or 100,000 gates. Each chip provides 16 processing elements, and is designed to provide 16 bit, two's compliment operation for maximum score precision of between -32,768 and +32,767. It is designed to provide a comparison between sequences as long as 4,194,304 elements without external software and between sequences of unlimited numbers of elements with the aid of external software. Each sequence can be assigned different deletion and insertion weight functions. Each processor is provided with a similarity measure device which is independently variable. Thus, each processor can contribute to maximum value score calculation using a different similarity measure
The instruction of systolic array (ISA) and simulation of parallel algorithms
Systolic arrays have proved to be well suited for Very Large
Scale Integrated technology (VLSI) since they:
-Consist of a regular network of simple processing cells,
-Use local communication between the processing cells only,
-Exploit a maximal degree of parallelism.
However, systolic arrays have one main disadvantage compared with
other parallel computer architectures: they are special purpose
architectures only capable of executing one algorithm, e.g., a
systolic array designed for sorting cannot be used to form matrix
multiplication.
Several approaches have been made to make systolic arrays more
flexible, in order to be able to handle different problems on a
single systolic array.
In this thesis an alternative concept to a VLSI-architecture
the Soft-Systolic Simulation System (SSSS), is introduced and
developed as a working model of virtual machine with the power to
simulate hard systolic arrays and more general forms of concurrency
such as the SIMD and MIMD models of computation.
The virtual machine includes a processing element consisting of
a soft-systolic processor implemented in the virtual.machine language.
The processing element considered here was a very general element
which allows the choice of a wide range of arithmetic and logical
operators and allows the simulation of a wide class of algorithms
but in principle extra processing cells can be added making a library
and this library be tailored to individual needs.
The virtual machine chosen for this implementation is the
Instruction Systolic Array (ISA). The ISA has a number of interesting
features, firstly it has been used to simulate all SIMD algorithms
and many MIMD algorithms by a simple program transformation technique,
further, the ISA can also simulate the so-called wavefront processor
algorithms, as well as many hard systolic algorithms. The ISA removes
the need for the broadcasting of data which is a feature of SIMD
algorithms (limiting the size of the machine and its cycle time) and also presents a fairly simple communication structure for MIMD
algorithms.
The model of systolic computation developed from the VLSI
approach to systolic arrays is such that the processing surface is
fixed, as are the processing elements or cells by virtue of their
being embedded in the processing surface.
The VLSI approach therefore freezes instructions and hardware
relative to the movement of data with the virtual machine and softsystolic
programming retaining the constructions of VLSI for array
design features such as regularity, simplicity and local communication,
allowing the movement of instructions with respect to data. Data can
be frozen into the structure with instructions moving systolically.
Alternatively both the data and instructions can move systolically
around the virtual processors, (which are deemed fixed relative to
the underlying architecture).
The ISA is implemented in OCCAM programs whose execution and
output implicitly confirm the correctness of the design.
The soft-systolic preparation comprises of the usual operating
system facilities for the creation and modification of files during
the development of new programs and ISA processor elements. We allow
any concurrent high level language to be used to model the softsystolic
program. Consequently the Replicating Instruction Systolic
Array Language (RI SAL) was devised to provide a very primitive program
environment to the ISA but adequate for testing. RI SAL accepts
instructions in an assembler-like form, but is fairly permissive
about the format of statements, subject of course to syntax.
The RI SAL compiler is adopted to transform the soft-systolic
program description (RISAL) into a form suitable for the virtual
machine (simulating the algorithm) to run.
Finally we conclude that the principles mentioned here can form
the basis for a soft-systolic simulator using an orthogonally
connected mesh of processors. The wide range of algorithms which the
ISA can simulate make it suitable for a virtual simulating grid
- …