342 research outputs found
VLSI Architectures for WIMAX Channel Decoders
This chapter describes the main architectures proposed in the literature to
implement the channel decoders required by the WiMax standard, namely
convolutional codes, turbo codes (both block and convolutional) and LDPC. Then
it shows a complete design of a convolutional turbo code encoder/decoder system
for WiMax.Comment: To appear in the book "WIMAX, New Developments", M. Upena, D. Dalal,
Y. Kosta (Ed.), ISBN978-953-7619-53-
An Approach for Effective Design Space Exploration of Hard-Decision Viterbi Decoder: Algorithm and VLSI Implementation
Viterbi algorithmic rule is usually used as a cryptography technique for convolutional codes, bit detection technique, Trellis in storage devices. The design space for VLSI implementation of Viterbi decoders is massive, involving selections of turnout, latency, area and power. Even for a set of parameters like constraint length, encoder polynomials and trace-back depth, the task of de-signing a Viterbi decoder is kind of troublesome and needs important effort. Sometimes, as a result of incomplete style area exploration or incorrect analysis, a suboptimal style is chosen. This work analyzes the planning complexness by applying most of the identified VLSI implementation techniques for hard-decision Viterbi cryptography to a distinct set of code parameters. The conclusions square measure supported real styles that actual synthesis and layouts were obtained. In authors’ read, as a result of the depth lined, it is the foremost comprehensive analysis of the subject revealed to this point
A hardware spinal decoder
Spinal codes are a recently proposed capacity-achieving rateless code. While hardware encoding of spinal codes is straightforward, the design of an efficient, high-speed hardware decoder poses significant challenges. We present the first such decoder. By relaxing data dependencies inherent in the classic M-algorithm decoder, we obtain area and throughput competitive with 3GPP turbo codes as well as greatly reduced latency and complexity. The enabling architectural feature is a novel alpha-beta incremental approximate selection algorithm. We also present a method for obtaining hints which anticipate successful or failed decoding, permitting early termination and/or feedback-driven adaptation of the decoding parameters.
We have validated our implementation in FPGA with on-air testing. Provisional hardware synthesis suggests that a near-capacity implementation of spinal codes can achieve a throughput of 12.5 Mbps in a 65 nm technology while using substantially less area than competitive 3GPP turbo code implementations.Irwin Mark Jacobs and Joan Klein Jacobs Presidential FellowshipIntel Corporation (Fellowship)Claude E. Shannon Research Assistantshi
Ultra low-power, high-performance accelerator for speech recognition
Automatic Speech Recognition (ASR) is undoubtedly one of the most important and interesting applications in the cutting-edge era of Deep-learning deployment, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost, requiring huge memory storage and computational power, which is not affordable for the tiny power budget of mobile devices.
Hardware acceleration can reduce power consumption of ASR systems as well as reducing its memory pressure, while delivering high-performance.
In this thesis, we present a customized accelerator for large-vocabulary, speaker-independent, continuous speech recognition. A state-of-the-art ASR system consists of two major components: acoustic-scoring using DNN and speech-graph decoding using Viterbi search. As the first step, we focus on the Viterbi search algorithm, that represents the main bottleneck in the ASR system.
The accelerator includes some innovative techniques to improve the memory subsystem, which is the main bottleneck for performance and power, such as a prefetching scheme and a novel bandwidth saving technique tailored to the needs of ASR.
Furthermore, as the speech graph is vast taking more than 1-Gigabyte memory space, we propose to change its representation by partitioning it into several sub-graphs and perform an on-the-fly composition during the Viterbi run-time. This approach together with some simple yet efficient compression techniques result in 31x memory footprint reduction, providing 155x real-time speedup and orders of magnitude power and energy saving compared to CPUs and GPUs.
In the next step, we propose a novel hardware-based ASR system that effectively integrates a DNN accelerator for the pruned/quantized models with the Viterbi accelerator. We show that, when either pruning or quantizing the DNN model used for acoustic scoring, ASR accuracy is maintained but the execution time of the ASR system is increased by 33%. Although pruning and quantization improves the efficiency of the DNN, they result in a huge increase of activity in the Viterbi search since the output scores of the pruned model are less reliable. In order to avoid the aforementioned increase in Viterbi search workload, our system loosely selects the N-best hypotheses at every time step, exploring only the N most likely paths. Our final solution manages to efficiently combine both DNN and Viterbi accelerators using all their optimizations, delivering 222x real-time ASR with a small power budget of 1.26 Watt, small memory footprint of 41 MB, and a peak memory bandwidth of 381 MB/s, being amenable for low-power mobile platforms.Los sistemas de reconocimiento automático del habla (ASR por sus siglas en inglés, Automatic Speech Recognition) son sin lugar a dudas una de las aplicaciones más relevantes en el área emergente de aprendizaje profundo (Deep Learning), specialmente en el segmento de los dispositivos móviles. Realizar el reconocimiento del habla de forma rápida y precisa tiene un elevado coste en energÃa, requiere de gran capacidad de memoria y de cómputo, lo cual no es deseable en sistemas móviles que tienen severas restricciones de consumo energético y disipación de potencia. El uso de arquitecturas especÃficas en forma de aceleradores hardware permite reducir el consumo energético de los sistemas de reconocimiento del habla, al tiempo que mejora el rendimiento y reduce la presión en el sistema de memoria. En esta tesis presentamos un acelerador especÃficamente diseñado para sistemas de reconocimiento del habla de gran vocabulario, independientes del orador y que funcionan en tiempo real. Un sistema de reconocimiento del habla estado del arte consiste principalmente en dos componentes: el modelo acústico basado en una red neuronal profunda (DNN, Deep Neural Network) y la búsqueda de Viterbi basada en un grafo que representa el lenguaje. Como primer objetivo nos centramos en la búsqueda de Viterbi, ya que representa el principal cuello de botella en los sistemas ASR. El acelerador para el algoritmo de Viterbi incluye técnicas innovadoras para mejorar el sistema de memoria, que es el mayor cuello de botella en rendimiento y energÃa, incluyendo técnicas de pre-búsqueda y una nueva técnica de ahorro de ancho de banda a memoria principal especÃficamente diseñada para sistemas ASR. Además, como el grafo que representa el lenguaje requiere de gran capacidad de almacenamiento en memoria (más de 1 GB), proponemos cambiar su representación y dividirlo en distintos grafos que se componen en tiempo de ejecución durante la búsqueda de Viterbi. De esta forma conseguimos reducir el almacenamiento en memoria principal en un factor de 31x, alcanzar un rendimiento 155 veces superior a tiempo real y reducir el consumo energético y la disipación de potencia en varios órdenes de magnitud comparado con las CPUs y las GPUs. En el siguiente paso, proponemos un novedoso sistema hardware para reconocimiento del habla que integra de forma efectiva un acelerador para DNNs podadas y cuantizadas con el acelerador de Viterbi. Nuestros resultados muestran que podar y/o cuantizar el DNN para el modelo acústico permite mantener la precisión pero causa un incremento en el tiempo de ejecución del sistema completo de hasta el 33%. Aunque podar/cuantizar mejora la eficiencia del DNN, éstas técnicas producen un gran incremento en la carga de trabajo de la búsqueda de Viterbi ya que las probabilidades calculadas por el DNN son menos fiables, es decir, se reduce la confianza en las predicciones del modelo acústico. Con el fin de evitar un incremento inaceptable en la carga de trabajo de la búsqueda de Viterbi, nuestro sistema restringe la búsqueda a las N hipótesis más probables en cada paso de la búsqueda. Nuestra solución permite combinar de forma efectiva un acelerador de DNNs con un acelerador de Viterbi incluyendo todas las optimizaciones de poda/cuantización. Nuestro resultados experimentales muestran que dicho sistema alcanza un rendimiento 222 veces superior a tiempo real con una disipación de potencia de 1.26 vatios, unos requisitos de memoria modestos de 41 MB y un uso de ancho de banda a memoria principal de, como máximo, 381 MB/s, ofreciendo una solución adecuada para dispositivos móviles
Hybrid ARQ with parallel and serial concatenated convolutional codes for next generation wireless communications
This research focuses on evaluating the currently used FEC encoding-decoding schemes and improving the performance of error control systems by incorporating these schemes in a hybrid FEC-ARQ environment. Beginning with an overview of wireless communications and the various ARQ protocols, the thesis provides an in-depth explanation of convolutional encoding and Viterbi decoding, turbo (PCCC) and serial concatenated convolutional (SCCC) encoding with their respective MAP decoding strategies.;A type-II hybrid ARQ scheme with SCCCs is proposed for the first time and is a major contribution of this thesis. A vast improvement is seen in the BER performance of the successive individual FEC schemes discussed above. Also, very high throughputs can be achieved when these schemes are incorporated in an adaptive type-II hybrid ARQ system.;Finally, the thesis discusses the equivalence of the PCCCs and the SCCCs and proposes a technique to generate a hybrid code using both schemes
DESIGN OF HIGH SPEED &LOW POWER VITERBI DECODER FOR TREELIS CODED MODULATION(TCM)
High-speed, low-power design of Viterbi decoders for trellis coded modulation (TCM) systems is presented in this paper. It is well known that the Viterbi decoder (VD) is the dominant module determining the overall power consumption of trellis coded modulation decoders. We propose a pre-computation architecture incorporated with T-algorithm for Viterbi decoder, which can effectively reduce the power consumption without degrading the decoding speed much. A general solution to derive the optimal pre-computation steps is also given in the paper. Implementation result of a VD for a rate-3/4 convolutional code used in a TCM system shows that compared with the full trellis VD, the pre-computation architecture reduces the power consumption by as much as 70% without performance loss, while the degradation in clock speed is negligible
Recommended from our members
Efficient decoder design for error correction codes
Error correction codes (ECCs) have been widely used in communication systems and storage devices. Nowadays, the rapid development of integrated circuit technologies makes feasible the implementation of powerful ECCs such as turbo code and low-density parity-check (LDPC) code. However, these high-performance codes require complex decoding algorithms, resulting in large hardware area and high power consumption. Furthermore, some of these decoders require an iterative decoding process, which leads to a long decoding latency. Therefore, low-complexity, low-power and high-speed very-large-scale integration (VLSI) architecture design for the ECC decoder is of great importance. This dissertation focuses on efficient VLSI implementation for the decoders of convolutional codes and two advanced coding schemes based on convolutional code: trellis-coded modulation (TCM) and convolutional turbo code (CTC).
The first part of this dissertation is dedicated to low-complexity, low-power decoders design for a 4-dimensional, 8-ary phase-shift keying (4-D 8PSK) TCM system. We propose a low-complexity architecture for the transition-metric unit (TMU) to reduce the hardware area without performance loss. Then, a power-efficient scheme by applying T-algorithm on branch metrics (BMs) is proposed for the Viterbi decoder (VD) embedded in the 4-D 8PSK TCM decoder. Unlike the conventional T-algorithm, the proposed scheme does not affect the clock speed of the decoder. Finally, a hybrid T-algorithm is developed by applying T-algorithm on both BMs and path metrics (PMs), which reduces significantly more computations than the conventional T-algorithm applied on PMs.
The VLSI design for VDs has been an active research area for decades. In the second part of the dissertation, we extend our research to a more general topic of VDs, where novel architectures are explored to efficiently reduce the power consumption, while still maintaining a high decoding speed and a low decoding latency.
CTCs are constructed from parallel convolutional encoding of the same message in different sequences and have the error-correcting capability near the Shannon bound. Practical decoding schemes normally require an iterative decoding process employing the soft-in soft-out (SISO) decoder. The third part of this dissertation is focused on the SISO decoder design for double-binary (DB) CTCs. We propose a low-complexity, memory-reduced architecture by partitioning BMs into two independent portions: information metrics and parity metrics. Furthermore, high-speed recursion architectures for logarithm domain maximum a posteriori probability (log-MAP) algorithm are proposed to increase the decoding speed by algorithmic approximation and bit-level optimization
Recommended from our members
Modified VLSI designs for error correction codes
Nowadays, error correction codes have become an integral part in almost all the modern digital communication and storage systems. With the continuously increasing demands for higher speed and lower power communication systems, efficient VLSI implementations of those error correction codes have great importance for practical applications. In this thesis, several VLSI design issues for Viterbi decoder and Low-Density Parity-Check (LDPC) codes decoder will be discussed. We propose a low-power memory-efficient Viterbi decoder to reduce the memory read operations in the survivor memory unit (SMU) and the memory size of SMU. We develop a parallel Viterbi decoder for high throughput applications. We also propose an efficient early stopping scheme to reduce the number of decoding iterations for LDPC codes decoding
- …