Search CORE

4,132 research outputs found

An ultra low-power hardware accelerator for automatic speech recognition

Authors: Arnau Montañés, José María; González Colás, Antonio María; Segura Salvador, Albert; Yazdani Aminabadi, Reza
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost that the tiny power budgets of mobile devices cannot afford. Hardware acceleration can reduce the power consumption of ASR systems while delivering high performance. In this paper, we present an accelerator for large-vocabulary, speaker-independent, continuous speech recognition. It focuses on the Viterbi search algorithm, which represents the main bottleneck in an ASR system. The proposed design includes innovative techniques to improve the memory subsystem, since memory is identified as the main bottleneck for performance and power in the design of these accelerators. We propose a prefetching scheme tailored to the needs of an ASR system that hides main memory latency for a large fraction of the memory accesses with a negligible impact on area. In addition, we introduce a novel bandwidth-saving technique that removes 20% of the off-chip memory accesses issued during the Viterbi search. The proposed design outperforms software implementations running on the CPU by orders of magnitude and achieves a 1.7x speedup over a highly optimized CUDA implementation running on a high-end GeForce GTX 980 GPU, while reducing the energy required to convert speech into text by two orders of magnitude (287x). Peer Reviewed. Postprint (author's final draft).
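The Viterbi search that the accelerator targets can be illustrated in a few lines of software. This is a minimal sketch, assuming a hypothetical two-state toy model with made-up probabilities rather than the paper's acoustic model or decoding graph: at each frame it keeps the best log-probability per state, prunes hypotheses that fall outside a beam, and finally traces back the best path. A large-vocabulary decoder performs the same kind of search over a vastly larger graph, which is why its memory traffic dominates.

```python
import math


def viterbi_beam(obs, states, log_start, log_trans, log_emit, beam=10.0):
    """Return (best_path, best_log_prob) for a sequence of observations.

    All model scores are natural-log probabilities. At every frame,
    hypotheses scoring more than `beam` below the current best are
    pruned before they are extended (beam pruning).
    """
    # scores[s]: best log-probability of any path ending in state s.
    scores = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backptrs = []  # one {state: best predecessor} map per later frame
    for o in obs[1:]:
        best = max(scores.values())
        active = {s: v for s, v in scores.items() if v >= best - beam}
        new_scores, bp = {}, {}
        for t in states:
            # Best surviving predecessor of state t.
            prev, score = max(
                ((s, v + log_trans[s][t]) for s, v in active.items()),
                key=lambda pair: pair[1],
            )
            new_scores[t] = score + log_emit[t][o]
            bp[t] = prev
        scores = new_scores
        backptrs.append(bp)
    # Trace back from the best-scoring final state.
    last = max(scores, key=scores.get)
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return path, scores[last]


# Hypothetical two-state toy model (not an acoustic model from the paper).
log = math.log
states = ["A", "B"]
log_start = {"A": log(0.6), "B": log(0.4)}
log_trans = {"A": {"A": log(0.7), "B": log(0.3)},
             "B": {"A": log(0.4), "B": log(0.6)}}
log_emit = {"A": {"x": log(0.5), "y": log(0.4), "z": log(0.1)},
            "B": {"x": log(0.1), "y": log(0.3), "z": log(0.6)}}

path, best_lp = viterbi_beam(["x", "y", "z"], states, log_start, log_trans, log_emit)
print(path, round(math.exp(best_lp), 5))  # ['A', 'A', 'B'] 0.01512
```

Working in log space turns products of probabilities into sums, which avoids numerical underflow on long utterances; the beam bounds how many hypotheses survive each frame.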

UPCommons. Open Knowledge Portal of the UPC

Privacy Leakages in Approximate Adders

Authors: Holcomb, Daniel; Keshavarz, Shahrzad
Publication date: 24/02/2018

Approximate computing has recently emerged as a promising method to meet the low-power requirements of digital designs. The erroneous outputs produced in approximate computing can depend partly on each chip's process variation. We show that, in such schemes, the erroneous outputs produced on each chip instance can reveal the identity of the chip that performed the computation, possibly jeopardizing user privacy. In this work, we perform simulation experiments on 32-bit Ripple Carry Adders, Carry Lookahead Adders, and Han-Carlson Adders running at over-scaled operating points. Our results show that identification is possible; we also contrast the identifiability of each type of adder and quantify how the success of identification varies with the extent of over-scaling and noise. Our results are the first to show that approximate digital computations may compromise privacy. Designers of future approximate computing systems should be aware of these possible privacy leakages and decide whether mitigation is warranted in their application. Comment: 2017 IEEE International Symposium on Circuits and Systems (ISCAS).
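To make the identification argument concrete, here is a minimal Python sketch of the general idea, not the authors' simulation setup. It assumes a simplified timing model: per-bit carry delays drawn from a Gaussian stand in for process variation, and any carry that arrives after the over-scaled clock period is read as 0. The helper names, the challenge inputs (chosen to force long carry chains), the clock period, and the noise level are all hypothetical. A chip's erroneous outputs on the challenges act as its signature, and an observation is attributed to the chip whose signature is nearest in Hamming distance.

```python
import random

WIDTH = 32
MASK = (1 << WIDTH) - 1


def make_chip(seed, sigma=0.05):
    """Hypothetical chip model: one carry-stage delay per bit position,
    in units of the nominal stage delay, perturbed by Gaussian process
    variation (clamped so that no delay is unrealistically small)."""
    rng = random.Random(seed)
    return [max(0.5, rng.gauss(1.0, sigma)) for _ in range(WIDTH)]


def overscaled_rca(a, b, delays, period):
    """Ripple-carry addition in which a carry that arrives after `period`
    is sampled as 0 (an assumed, simplified timing-error model)."""
    carry, t_carry, out = 0, 0.0, 0
    for i in range(WIDTH):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p, g = ai ^ bi, ai & bi          # propagate and generate signals
        sampled = carry if t_carry <= period else 0  # late carry is missed
        out |= (p ^ sampled) << i
        # The true carry keeps rippling even if this bit sampled too early.
        if g:
            carry, t_carry = 1, delays[i]
        elif p and carry:
            carry, t_carry = 1, t_carry + delays[i]
        else:
            carry, t_carry = 0, 0.0
    return out & MASK


# Challenge inputs: a carry generated at bit s that must ripple k positions.
CHALLENGES = [(((1 << k) - 1) << s, 1 << s)
              for k in range(8, 17) for s in range(WIDTH - k)]


def signature(chip, period):
    """A chip's (possibly erroneous) outputs on every challenge."""
    return [overscaled_rca(a, b, chip, period) for a, b in CHALLENGES]


def hamming(xs, ys):
    return sum(bin(x ^ y).count("1") for x, y in zip(xs, ys))


def identify(observed, references):
    """Index of the reference signature nearest in total Hamming distance."""
    return min(range(len(references)),
               key=lambda i: hamming(observed, references[i]))


PERIOD = 12.0  # over-scaled clock: about 12 nominal stage delays per cycle
chips = [make_chip(seed) for seed in range(8)]
refs = [signature(chip, PERIOD) for chip in chips]

# Observe chip 3 through a noisy channel that flips each bit with p = 0.01.
noise = random.Random(123)
observed = [out ^ sum(1 << i for i in range(WIDTH) if noise.random() < 0.01)
            for out in refs[3]]
print("best match:", identify(observed, refs))
```

With an unlimited clock period the model reduces to an exact 32-bit adder; shrinking `PERIOD` makes long carry chains fail at chip-specific bit positions, which is what makes the outputs identifying.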

arXiv.org e-Print Archive

Crossref

Swirling combustor energy converter: H2/air simulations of separated chambers

Authors: Minotti, Angelo; Teofilatto, Paolo
Publication venue: MDPI AG
Publication date: 01/01/2015

This work reports results related to the “EU-FP7-HRC-Power” project, which aims to develop micro-meso hybrid sources of power. One of the project's goals is to achieve surface temperatures above 1000 K, with a ΔT ≤ 100 K, so as to be compatible with thermal-to-electrical conversion by thermo-photovoltaic cells. The authors investigate how to reach that goal by adopting swirling chambers integrated in a thermally conductive and emitting element. The converter consists of a small parallelepiped brick containing two separated swirling meso-combustion chambers, which heat up the parallelepiped's emitting material through the combustion of H2 and air at ambient pressure. The overall dimensions are of the order of centimeters. Nine combustion simulations were carried out with detailed chemistry, covering three length/diameter ratios (Z/D = 3, 5 and 11) and three equivalence ratios (0.4, 0.7 and 1), all at 400 W of injected chemical power. The most important results include the converter surface temperatures, the heat loads provided to the environment, and the chemical efficiency. The high chemical efficiency (> 99.9%) is due to the relatively long average gas residence time, coupled with the fairly good mixing provided by the swirl motion and by the impinging air/fuel jets, which supply heat and radicals to the flame.
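The nine-case test matrix and the fixed 400 W input translate directly into reactant feed rates. The sketch below enumerates the Z/D × equivalence-ratio combinations and converts power and equivalence ratio into H2 and air mass flows. It assumes the chemical power is based on the lower heating value of H2 (about 120 MJ/kg) and treats air as 21% O2 and 79% N2 by mole; neither assumption is stated in the abstract, and the function names are hypothetical.

```python
from itertools import product

# Assumed property values; the abstract does not state them.
LHV_H2 = 120.0e6                 # J/kg, approximate lower heating value of H2
M_H2, M_O2, M_N2 = 2.016, 31.998, 28.014  # g/mol
X_O2 = 0.21                      # O2 mole fraction in air (rest treated as N2)


def stoich_air_fuel_ratio():
    """Mass of air per mass of H2 for H2 + 0.5 O2 -> H2O."""
    mol_air_per_mol_h2 = 0.5 / X_O2
    molar_mass_air = X_O2 * M_O2 + (1.0 - X_O2) * M_N2
    return mol_air_per_mol_h2 * molar_mass_air / M_H2


AFR_ST = stoich_air_fuel_ratio()


def feed_rates(power_w, phi):
    """H2 and air mass flows (kg/s) for a chemical power and equivalence ratio.

    phi = (fuel/air) / (fuel/air)_stoich, so a leaner mixture (phi < 1)
    needs proportionally more air for the same fuel flow."""
    m_h2 = power_w / LHV_H2
    m_air = m_h2 * AFR_ST / phi
    return m_h2, m_air


# The nine cases: every Z/D paired with every equivalence ratio.
cases = list(product((3, 5, 11), (0.4, 0.7, 1.0)))
for zd, phi in cases:
    m_h2, m_air = feed_rates(400.0, phi)
    print(f"Z/D={zd:>2}  phi={phi:.1f}  "
          f"H2={m_h2 * 1e6:.3f} mg/s  air={m_air * 1e6:.1f} mg/s")
```

Under these assumptions the hydrogen flow is the same in all nine cases (about 3.3 mg/s), because it is fixed by the 400 W input; only the air flow changes with the equivalence ratio, while Z/D changes the chamber geometry rather than the feed rates.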

Directory of Open Access Journals

Research Archive - Università di Roma La Sapienza

A low-power, high-performance speech recognition accelerator

Authors: Arnau Montañés, José María; González Colás, Antonio María; Yazdani, Reza
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost that mobile devices, with their tiny power budgets, cannot afford. Hardware acceleration reduces the energy consumption of ASR systems while delivering high performance. In this paper, we present an accelerator for large-vocabulary, speaker-independent, continuous speech recognition. It focuses on the Viterbi search algorithm, which represents the main bottleneck in an ASR system. The proposed design includes innovative techniques to improve the memory subsystem, since memory is the main bottleneck for performance and power in the design of these accelerators. Among them is a prefetching scheme tailored to the needs of ASR systems that hides main memory latency for a large fraction of the memory accesses with a negligible impact on area. Additionally, we introduce a novel bandwidth-saving technique that reduces off-chip memory accesses by 20 percent. Finally, we present a power-saving technique that significantly reduces the leakage power of the accelerator's scratchpad memories, cutting total power dissipation by 8.5 to 29.2 percent. Overall, the proposed design outperforms implementations running on the CPU by orders of magnitude and achieves speedups between 1.7x and 5.9x for different speech decoders over a highly optimized CUDA implementation running on a GeForce GTX 980 GPU, while reducing energy by 123x to 454x. Peer Reviewed. Postprint (author's final draft).
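Why prefetching hides memory latency can be seen with a simple timing model. The sketch below is a generic software model of prefetching with a fixed lookahead distance, not the accelerator's actual prefetching scheme, whose details the abstract does not give. It assumes a fixed memory latency, a fixed amount of work per item, and no limit on outstanding requests; `simulate` and its parameters are hypothetical.

```python
def simulate(n_items, latency, work, distance):
    """Total cycles to process `n_items` items whose data comes from memory.

    Assumed model: every fetch takes `latency` cycles, processing an item
    takes `work` cycles, any number of requests may be in flight, and the
    request for item i + distance is issued when item i starts. With
    distance=0 every item is fetched on demand, so its latency is exposed.
    """
    issue = [None] * n_items
    for j in range(min(distance, n_items)):
        issue[j] = 0                     # initial prefetches at time 0
    t = 0                                # time the previous item finished
    for i in range(n_items):
        if issue[i] is None:             # not prefetched: demand fetch now
            issue[i] = t
        start = max(t, issue[i] + latency)   # stall until the data arrives
        if distance and i + distance < n_items:
            issue[i + distance] = start  # run ahead by `distance` items
        t = start + work
    return t


for d in (0, 5, 10):
    print(f"distance={d:>2}: {simulate(1000, 100, 10, d)} cycles")
# distance= 0: 110000 cycles
# distance= 5: 20050 cycles
# distance=10: 10100 cycles
```

With 100 cycles of latency and 10 cycles of work per item, a lookahead of 10 items covers the whole latency (10 × 10 ≥ 100), so only the first fetch is exposed; a lookahead of 5 hides only part of it, and on-demand fetching pays the full latency on every access.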

LAReferencia - Federated Network of Institutional Repositories of Latin American Scientific Publications

UPCommons. Open Knowledge Portal of the UPC