
    A low-power, high-performance speech recognition accelerator

    Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost that the tiny power budgets of mobile devices cannot afford. Hardware acceleration reduces the energy consumption of ASR systems while delivering high performance. In this paper, we present an accelerator for large-vocabulary, speaker-independent, continuous speech recognition. It focuses on the Viterbi search algorithm, which represents the main bottleneck in an ASR system. The proposed design incorporates innovative techniques to improve the memory subsystem, since memory is the main bottleneck for performance and power in these accelerators. It includes a prefetching scheme tailored to the needs of ASR systems that hides main memory latency for a large fraction of the memory accesses with negligible impact on area. Additionally, we introduce a novel bandwidth-saving technique that reduces off-chip memory accesses by 20 percent. Finally, we present a power-saving technique that significantly reduces the leakage power of the accelerator's scratchpad memories, providing between 8.5 and 29.2 percent reduction in overall power dissipation. Overall, the proposed design outperforms implementations running on the CPU by orders of magnitude, and achieves speedups between 1.7x and 5.9x for different speech decoders over a highly optimized CUDA implementation running on a GeForce GTX 980 GPU, while reducing energy by 123-454x.
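    Since both this entry and the "ultra low-power" entry below accelerate the same time-synchronous Viterbi search, a minimal Python sketch of that search may help fix ideas. It is an illustrative sketch only: the graph layout and the names `arcs`, `beam_width`, and `acoustic_scores` are assumptions, not the paper's data structures.

```python
def viterbi_search(arcs, start_state, acoustic_scores, beam_width):
    """Time-synchronous Viterbi beam search (illustrative sketch).

    arcs: dict mapping state -> list of (next_state, transition_logprob, label).
    acoustic_scores: iterable of per-frame dicts mapping label -> log-likelihood.
    """
    active = {start_state: 0.0}        # state -> best log score so far
    backptrs = []                      # per-frame back-pointers for trace-back
    for frame in acoustic_scores:
        nxt, ptrs = {}, {}
        for state, score in active.items():
            for dst, trans_lp, label in arcs.get(state, []):
                s = score + trans_lp + frame.get(label, float("-inf"))
                if dst not in nxt or s > nxt[dst]:
                    nxt[dst], ptrs[dst] = s, (state, label)
        # Beam pruning: keep only hypotheses within beam_width of the best.
        best = max(nxt.values())
        active = {st: sc for st, sc in nxt.items() if sc >= best - beam_width}
        backptrs.append({st: ptrs[st] for st in active})
    # Trace back the best-scoring final hypothesis.
    state = max(active, key=active.get)
    best_score = active[state]
    labels = []
    for ptrs in reversed(backptrs):
        state, label = ptrs[state]
        labels.append(label)
    return labels[::-1], best_score
```

    The irregular, data-dependent accesses to `arcs` in the inner loop are exactly the kind of memory traffic the paper's prefetching and bandwidth-saving techniques target.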

    VLSI Architecture for Configurable and Low-Complexity Design of Hard-Decision Viterbi Decoding Algorithm

    Convolutional encoding and decoding are fundamental processes in convolutional error correction. One of the most popular decoding methods is the Viterbi algorithm, which is extensively implemented in many digital communication applications. Its VLSI design challenges concern area, speed, power, complexity, and configurability. In this research, we propose a VLSI architecture for a configurable, low-complexity design of a hard-decision Viterbi decoding algorithm. The configurable, low-complexity design is achieved by designing a generic VLSI architecture, optimizing each processing element (PE) at the logical-operation level, and designing a conditional adapter. The proposed design can be configured for any predefined number of trace-backs simply by changing the trace-back parameter value. Its computational process needs only N + 2 clock cycles of latency, where N is the number of trace-backs. Its configurability has been proven for N = 8, N = 16, N = 32, and N = 64. Furthermore, the proposed design was synthesized and evaluated on Xilinx and Altera FPGA target boards for area consumption and speed performance.
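    For readers unfamiliar with hard-decision Viterbi decoding with a configurable trace-back depth, the following Python sketch shows the idea in software. The specifics (a rate-1/2, constraint-length-3 code with generators 7 and 5 octal, and block-wise trace-back) are assumed for illustration and are not the paper's architecture.

```python
G0, G1 = 0b111, 0b101      # generator polynomials (7, 5 octal), K = 3
N_STATES = 4               # 2**(K-1) encoder states

def parity(x):
    return bin(x).count("1") & 1

def expected_pair(state, bit):
    """Output pair the encoder would emit from `state` on input `bit`."""
    reg = (bit << 2) | state               # shift-register contents
    return parity(reg & G0), parity(reg & G1)

def viterbi_decode(received_pairs, traceback_depth):
    """Hard-decision Viterbi decoding with a block-wise trace-back every
    `traceback_depth` steps (a simplification of sliding trace-back)."""
    metrics = [0] + [1 << 20] * (N_STATES - 1)   # start in state 0
    history, decoded = [], []
    for r0, r1 in received_pairs:
        new_metrics = [1 << 20] * N_STATES
        survivors = [None] * N_STATES
        for state in range(N_STATES):
            for bit in (0, 1):
                nxt = ((bit << 2) | state) >> 1   # next encoder state
                e0, e1 = expected_pair(state, bit)
                m = metrics[state] + (e0 != r0) + (e1 != r1)  # Hamming metric
                if m < new_metrics[nxt]:
                    new_metrics[nxt] = m
                    survivors[nxt] = (state, bit)
        metrics = new_metrics
        history.append(survivors)
        if len(history) == traceback_depth:       # trace back one block
            state = min(range(N_STATES), key=metrics.__getitem__)
            bits = []
            for survivors_t in reversed(history):
                state, bit = survivors_t[state]
                bits.append(bit)
            decoded.extend(reversed(bits))
            history.clear()
    return decoded            # any tail shorter than N is dropped in this sketch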

    An ultra low-power hardware accelerator for automatic speech recognition

    Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost that is not affordable for the tiny power budget of mobile devices. Hardware acceleration can reduce the power consumption of ASR systems while delivering high performance. In this paper, we present an accelerator for large-vocabulary, speaker-independent, continuous speech recognition. It focuses on the Viterbi search algorithm, which represents the main bottleneck in an ASR system. The proposed design includes innovative techniques to improve the memory subsystem, since memory is identified as the main bottleneck for performance and power in the design of these accelerators. We propose a prefetching scheme tailored to the needs of an ASR system that hides main memory latency for a large fraction of the memory accesses with a negligible impact on area. In addition, we introduce a novel bandwidth-saving technique that removes 20% of the off-chip memory accesses issued during the Viterbi search. The proposed design outperforms software implementations running on the CPU by orders of magnitude and achieves a 1.7x speedup over a highly optimized CUDA implementation running on a high-end GeForce GTX 980 GPU, while reducing the energy required to convert speech into text by two orders of magnitude (287x).

    Reconfigurable architectures for beyond 3G wireless communication systems


    Optimization and implementation of a Viterbi decoder under flexibility constraints

    This paper discusses the impact of flexibility when designing a Viterbi decoder for both convolutional and TCM codes. Different trade-offs have to be considered in choosing the right architecture for the processing blocks, and the resulting hardware penalty is evaluated. We study the impact of symbol quantization, which degrades performance and affects the wordlength of the rate-flexible trellis datapath. A radix-2-based architecture for this datapath substantially relaxes the hardware requirements on the branch metric and survivor path blocks. The cost of flexibility in terms of cell area and power consumption is explored through an investigation of synthesized designs that provide different transmission rates. Two designs are fabricated in a digital 0.13-µm CMOS process. Based on post-layout simulations, a symbol rate of 168 Mbaud is achieved in TCM mode, equivalent to a maximum throughput of 840 Mbit/s using a 64-QAM constellation.
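    The radix-2 datapath mentioned above is built around the add-compare-select (ACS) operation. As a rough illustration (the state indexing and tie-breaking choice here are assumptions, not the fabricated design), one radix-2 ACS step looks like this:

```python
def acs_butterfly(pm_a, pm_b, bm_a, bm_b):
    """One radix-2 add-compare-select step (illustrative sketch).

    Two predecessor path metrics (pm_a, pm_b), extended by their branch
    metrics (bm_a, bm_b), compete for one successor state. Returns the
    surviving path metric and a decision bit for the survivor-path memory.
    """
    cand_a = pm_a + bm_a
    cand_b = pm_b + bm_b
    if cand_a <= cand_b:               # tie broken toward predecessor A
        return cand_a, 0
    return cand_b, 1
```

    The decision bits produced by each ACS step feed the survivor path block, and the wordlength of the path and branch metrics is where the symbol-quantization study described in the abstract comes into play.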

    KNN-Based ML Model for Symbol Prediction in a Trellis Coded Modulation (TCM) Decoder

    Machine learning is a booming technology today: a model is trained on a set of training data and then predicts outputs, with training carried out by computer programs known as ML algorithms. This work presents a new machine-learning-based Transition Metric Unit (TMU) for a 4D-8PSK Trellis Coded Modulation (TCM) decoder. TCM combines 8-PSK modulation with an error correcting code (ECC), and its decoder is essentially a Viterbi decoder: just as the branch metric is computed in a plain Viterbi decoder, the TCM decoder performs this computation in the TMU. The TMU is one of the most complex units of the decoder; the large constraint length and the vast number of encoder states make it challenging to implement and cause it to consume considerable dynamic power. The proposed algorithm develops an innovative KNN (k-nearest-neighbours) based ML model. It is a supervised learning model in which both inputs and outputs (the labels) are provided during training, so that when new data arrives the model produces an output based on its previously learned experience. We use this ML model for symbol prediction at the receiver end of the TCM decoder. Using the proposed innovation, the paper optimizes the TCM decoder, reducing hardware requirements and latency, which results in lower power consumption.
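    As a concrete illustration of KNN-based symbol prediction of the kind the abstract describes, the sketch below trains scikit-learn's KNeighborsClassifier on noisy 8-PSK samples and classifies fresh received points. The whole setup (AWGN noise level, k = 5, I/Q features) is an assumption for illustration; the paper's actual model and training data may differ.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Ideal 8-PSK constellation: one complex point per 3-bit symbol.
ideal = np.exp(1j * 2 * np.pi * np.arange(8) / 8)

def noisy_rx(symbols, sigma=0.15):
    """Transmit `symbols` over an assumed AWGN channel, return I/Q features."""
    n = len(symbols)
    noise = rng.normal(scale=sigma, size=n) + 1j * rng.normal(scale=sigma, size=n)
    rx = ideal[symbols] + noise
    return np.column_stack([rx.real, rx.imag])

# Supervised training: noisy observations plus their true symbols (labels).
train_syms = rng.integers(0, 8, size=4000)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(noisy_rx(train_syms), train_syms)

# Prediction on fresh received samples.
test_syms = rng.integers(0, 8, size=500)
accuracy = (knn.predict(noisy_rx(test_syms)) == test_syms).mean()
print(f"symbol prediction accuracy: {accuracy:.3f}")
```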

    Characterization of Speech Recognition Systems on GPU Architectures

    This master thesis characterizes the performance and energy bottlenecks of speech recognition systems running on modern GPUs, with the aim of providing useful information for designing future GPU architectures, as well as proposing a GPU configuration better suited to speech recognition.