AdaStreamLite: Environment-adaptive Streaming Speech Recognition on Mobile Devices

Authors: Du Junzhao, Liu Hui, Pan Jiangtao, Wei Yuheng, Xiong Jie, Yu Yingtao
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2024

Streaming speech recognition aims to transcribe speech to text in a streaming manner, providing real-time speech interaction for smartphone users. However, developing a high-performance streaming speech recognition system that runs purely on mobile platforms is not trivial, owing to complex real-world acoustic environments and the limited computational resources of smartphones. Most existing solutions generalize poorly to unseen environments and have difficulty working with streaming speech. In this paper, we design AdaStreamLite, an environment-adaptive streaming speech recognition tool for smartphones. AdaStreamLite interacts with its surroundings to capture the characteristics of the current acoustic environment and thereby improves robustness against ambient noise in a lightweight manner. We design an environment representation extractor to model acoustic environments with compact feature vectors, and construct a representation lookup table to improve the generalization of AdaStreamLite to unseen environments. We train our system on large, publicly available speech datasets covering different languages. We conduct experiments in a wide range of real acoustic environments with different smartphones. The results show that AdaStreamLite outperforms state-of-the-art methods in terms of recognition accuracy, computational resource consumption, and robustness against unseen environments.
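
The abstract does not say how the representation lookup table is queried. As a rough illustration only, the sketch below assumes the table maps environment labels to compact feature vectors and that an unseen environment is matched to its nearest stored entry by cosine similarity. The function names, the labels, and the three-dimensional vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_environment(query, table):
    """Return the label of the stored vector most similar to the query.

    Assumption: nearest-neighbour matching by cosine similarity; the paper's
    actual lookup rule is not described in the abstract.
    """
    return max(table, key=lambda label: cosine_similarity(query, table[label]))


# Hypothetical table of environment representations.
table = {
    "street": [0.9, 0.1, 0.0],
    "cafe": [0.1, 0.8, 0.3],
}
match = nearest_environment([0.2, 0.7, 0.4], table)
```

Under this assumption, a representation extracted in a previously unseen environment is mapped to the closest known one, which is one plausible way a fixed table could help generalization.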

On the Compression of Recurrent Neural Networks with an Application to LVCSR acoustic modeling for Embedded Speech Recognition

Authors: Alsharif Ouais, Bruguier Antoine, McGraw Ian, Prabhavalkar Rohit
Publication date: 02/05/2016

We study the problem of compressing recurrent neural networks (RNNs). In particular, we focus on the compression of RNN acoustic models, motivated by the goal of building compact and accurate speech recognition systems that can run efficiently on mobile devices. In this work, we present a technique for general recurrent model compression that jointly compresses both recurrent and non-recurrent inter-layer weight matrices. We find that the proposed technique allows us to reduce the size of our Long Short-Term Memory (LSTM) acoustic model to a third of its original size with negligible loss in accuracy. Comment: Accepted in ICASSP 201

arXiv.org e-Print Archive

On the efficient representation and execution of deep acoustic models

Authors: Alvarez Raziel, Bakhtin Anton, Prabhavalkar Rohit
Publication date: 16/12/2016

In this paper, we present a simple and computationally efficient quantization scheme that reduces the resolution of a neural network's parameters from 32-bit floating-point values to 8-bit integer values. The proposed quantization scheme leads to significant memory savings and enables the use of optimized hardware instructions for integer arithmetic, thus significantly reducing the cost of inference. Finally, we propose a "quantization aware" training process that applies the proposed scheme during network training, and find that it allows us to recover most of the loss in accuracy introduced by quantization. We validate the proposed techniques by applying them to a long short-term memory-based acoustic model on an open-ended large vocabulary speech recognition task. Comment: Accepted conference paper: "The Annual Conference of the International Speech Communication Association (Interspeech), 2016"
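
As a minimal sketch of the kind of float-to-8-bit mapping the abstract describes, the Python below applies a uniform linear quantizer over the observed value range. The function names and the min/max range selection are assumptions made for illustration. The abstract does not give the authors' exact scheme, and this sketch does not cover quantization-aware training.

```python
def quantize(values, num_bits=8):
    """Uniformly map floats to integers in [0, 2**num_bits - 1].

    Assumption: the range is taken from the min and max of `values`.
    Returns the integer codes plus the scale and offset needed to invert them.
    """
    levels = 2 ** num_bits - 1
    v_min, v_max = min(values), max(values)
    # Guard against a constant input, where the range would be zero.
    scale = (v_max - v_min) / levels if v_max > v_min else 1.0
    codes = [round((v - v_min) / scale) for v in values]
    return codes, scale, v_min


def dequantize(codes, scale, v_min):
    """Recover approximate float values from integer codes."""
    return [c * scale + v_min for c in codes]


# Hypothetical weights, not taken from the paper.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, v_min = quantize(weights)
restored = dequantize(codes, scale, v_min)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), while every parameter is stored in one byte instead of four.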

arXiv.org e-Print Archive