Search CORE

266 research outputs found

Systolic arrays and stack decoding

Author: Shahshahani M.
Publication venue
Publication date
Field of study

The application of systolic priority queues to the sequential stack decoding algorithm is discussed in a review of the work of K. Yao and C. Y. Chang. Using a systolic array architecture, one can significantly improve the performance of such algorithms at high signal-to-noise ratios

NASA Technical Reports Server

A survey of the state-of-the-art and focused research in range systems

Author: Balakrishnan A. V.
Kung Yao
Publication venue
Publication date
Field of study

In this one-year renewal of NASA Contract No. 2-304, basic research, development, and implementation in the areas of modern estimation algorithms and digital communication systems have been performed. In the first area, basic study on the conversion of general classes of practical signal processing algorithms into systolic array algorithms is considered, producing four publications. Also studied were the finite word length effects and convergence rates of lattice algorithms, producing two publications. In the second area of study, the use of efficient importance sampling simulation technique for the evaluation of digital communication system performances were studied, producing two publications

NASA Technical Reports Server

A VLSI design for a systolic Viterbi decoder

Author: Reed I. S.
Satorius E.
Shih M. T.
Truong T. K.
Publication venue
Publication date
Field of study

A systolic Viterbi decoder for convolutional codes is developed. This decoder uses the trace-back method to reduce the amount of data needed to be stored in registers. It is shown that this new algorithm requires a smaller chip size and achieves a faster decoding time than other existing methods

NASA Technical Reports Server

A parallel Viterbi decoder for block cyclic and convolution codes

Author: Amarasinghe Kosala
Reeve Jeffrey
Publication venue
Publication date: 01/01/2006
Field of study

We present a parallel version of Viterbi's decoding procedure, for which we are able to demonstrate that the resultant task graph has restricted complexity in that the number of communications to or from any processor cannot exceed 4 for BCH codes. The resulting algorithm works in lock step making it suitable for implementation on a systolic processor array, which we have implemented on a field programmable gate array and demonstrate the perfect scaling of the algorithm for two exemplar BCH codes. The parallelisation strategy is applicable to all cyclic codes and convolution codes. We also present a novel method for generating the state transition diagrams for these codes

CiteSeerX

Southampton (e-Prints Soton)

A Viterbi decoder with low-power trace-back memory structure for wireless pervasive communications

Author: Israsena P.
Israsena P.
Kale I.
Kale I.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

This paper presents a new trace-back memory structure for Viterbi decoders that reduces power consumption by 63% compared to the conventional RAM based design. Instead of the intensive read and write operations as required in RAM based designs, the new memory is based on an array of registers connected with trace-back signals that decode the output bits on the fly. The structure is used together with appropriate clock and power-aware control signals. Based on a 0.35 /spl mu/m CMOS implementation the trace-back back memory consumes energy of 182 pJ

Crossref

WestminsterResearch

A protein sequence analysis hardware accelerator based on divergences

Author: Eusse Juan Fernando
Jacobi Ricardo Pezzuol
Melo Alba Cristina Magalhães Alves de
Moreano Nahri
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2012
Field of study

The Viterbi algorithm is one of the most used dynamic programming algorithms for protein comparison and identification, based on hidden markov Models (HMMs). Most of the works in the literature focus on the implementation of hardware accelerators that act as a prefilter stage in the comparison process. This stage discards poorly aligned sequences with a low similarity score and forwards sequences with good similarity scores to software, where they are reprocessed to generate the sequence alignment. In order to reduce the software reprocessing time, this work proposes a hardware accelerator for the Viterbi algorithm which includes the concept of divergence, in which the region of interest of the dynamic programming matrices is delimited. We obtained gains of up to 182x when compared to unaccelerated software. The performance measurement methodology adopted in this work takes into account not only the acceleration achieved by the hardware but also the reprocessing software stage required to generate the alignment

Repositório Institucional da Universidade de Brasília

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Directory of Open Access Journals

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Silicon Technologies for Speaker Independent Speech Processing and Recognition Systems in Noisy Environments

Author: Arun Selvaraj
Karthikeyan Natarajan
Mala John
Publication venue: 'IntechOpen'
Publication date: 01/11/2008
Field of study

IntechOpen

Crossref

Author index

Author
Publication venue: Published by Elsevier Ltd.
Publication date
Field of study

Elsevier - Publisher Connector

Acceleration of Profile-HMM Search for Protein Sequences in Reconfigurable Hardware - Master\u27s Thesis, May 2006

Author: Maddimsetty Rahul Pratap
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2006
Field of study

Profile Hidden Markov models are highly expressive representations of functional units, or motifs, conserved across protein sequences. Profile-HMM search is a powerful computational technique that is used to annotate new sequences by identifying occurrences of known motifs in them. With the exponential growth of protein databases, there is an increasing demand for acceleration of such techniques. We describe an accelerator for the Viterbi algorithm using a two-stage pipelined design in which the first stage is implemented in parallel reconfigurable hardware for greater speedup. To this end, we identify algorithmic modifications that expose a high level of parallelism and characterize their impact on the accuracy and performance relative to a standard software implementation. We develop a performance model to evaluate any accelerator design and propose two alternative architectures that recover the accuracy lost by a basic architecture. We compare the performance of the two architectures to show that speedups of up to 3 orders of magnitude may be achieved. We also investigate the use of the Forward algorithm in the first pipeline stage of the accelerator using floating-point arithmetic and report its accuracy and performance

Washington University St. Louis: Open Scholarship

Parallelization of dynamic programming recurrences in computational biology

Author: Jacob Arpith
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2010
Field of study

The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays: FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15-130x faster than a modern dual core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 3 GHz Intel Core 2 Duo processors. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms

Washington University St. Louis: Open Scholarship