A programmable accelerator for streaming automatic speech recognition on edge devices

Abstract

Automatic Speech Recognition (ASR) is quickly becoming a mainstream technology, mainly driven by the outstanding accuracy achieved by modern systems based on machine learning. However, these systems often require billions of arithmetic operations to decode a second of audio, and relying on cloud services for ASR is usually inconvenient. Even though deployment of ASR systems directly on the edge is highly desirable, the requirements for high performance and low energy consumption, combined with the fast pace of evolution and heterogeneity of existing ASR systems, make effective deployment of ASR on edge devices challenging. In this work, we propose a programmable accelerator to efficiently support a variety of ASR implementations. We estimate the performance of our system by implementing a recently proposed streaming ASR system and show that it can perform real-time streaming decoding within a tight power budget and with a low area footprint, while offering the flexibility to implement a variety of different models.

This work has been supported by the CoCoUnit ERC Advanced Grant of the EU’s Horizon 2020 program (grant No. 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, the ICREA Academia program and the Spanish MICINN Ministry under grant BES-2017-080605.

Peer Reviewed

Postprint (published version)
