On the efficient representation and execution of deep acoustic models
In this paper we present a simple and computationally efficient quantization
scheme that enables us to reduce the resolution of the parameters of a neural
network from 32-bit floating point values to 8-bit integer values. The proposed
quantization scheme leads to significant memory savings and enables the use of
optimized hardware instructions for integer arithmetic, thus significantly
reducing the cost of inference. Finally, we propose a "quantization aware"
training process that applies the proposed scheme during network training and
find that it allows us to recover most of the loss in accuracy introduced by
quantization. We validate the proposed techniques by applying them to a long
short-term memory-based acoustic model on an open-ended large vocabulary speech
recognition task.
Comment: Accepted conference paper: The Annual Conference of the International Speech Communication Association (Interspeech), 2016.
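The quantization step described in the abstract can be sketched as a simple per-tensor linear quantizer. The symmetric mapping and the single per-tensor scale factor below are illustrative assumptions, not necessarily the exact scheme used in the paper:

```python
import numpy as np

def quantize_int8(w):
    """Linearly map a float32 tensor onto signed 8-bit integers (symmetric, per-tensor)."""
    scale = np.max(np.abs(w)) / 127.0           # one scale per tensor (simplifying assumption)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 values from int8 codes and the stored scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
err = np.max(np.abs(w - w_hat))                 # bounded by half a quantization step
```

With such a scheme, matrix multiplications can run in int8 with a single float rescale at the end, which is what enables the optimized integer hardware instructions the abstract refers to.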
Android Based Smart Speech Recognition Application to Perform Various Tasks
A smart speech recognition application is proposed in this paper. The user can control a variety of applications on an Android-based platform with voice commands, including native applications as well as user-installed applications. Supported tasks include calling, texting, switching sensors (Wi-Fi, GPS, Bluetooth) on and off, and setting alarms. The application provides online as well as offline services. It also applies machine learning concepts to identify usage patterns and create an environment that anticipates user requirements; tasks that are performed repetitively are automated. Services such as activity recognition and recognizing nearby friends via Bluetooth are also provided. The importance of the project is that it offers visually challenged people, as well as the general population, an alternative and very easy way to control applications on Android smartphones.
Intrinsic sparse LSTM using structured targeted dropout for efficient hardware inference
Recurrent Neural Networks (RNNs) are useful for speech recognition, but their fully-connected structure leads to a large memory footprint, making them difficult to deploy on resource-constrained embedded systems. Previous structured RNN pruning methods can effectively reduce RNN size; however, it is difficult to find a good balance between high sparsity and high task accuracy, and the pruned models often achieve only moderate speedups on custom hardware accelerators. This work proposes a novel structured pruning method, Structured Targeted Dropout (STD) on Intrinsic Sparse Structures (ISS), that stochastically drops grouped rows and columns of the weight matrices during training. The compressed networks are equivalent to a smaller dense network, which can be processed efficiently by Graphics Processing Units (GPUs). STD-ISS is evaluated on the TIMIT phone recognition task using Long Short-Term Memory (LSTM) RNNs, where it outperforms previous state-of-the-art hardware-friendly methods on both accuracy and compression ratio. STD-ISS achieves a size compression ratio of up to 50× with <1% accuracy loss, leading to a 19.1× speedup on the embedded Jetson Xavier NX GPU platform.
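The core idea of dropping grouped rows and their matching columns can be sketched on a single square recurrent weight matrix. The group size, drop probability, and the restriction to one square matrix are illustrative assumptions; the paper applies the idea across an LSTM's gate matrices:

```python
import numpy as np

def structured_dropout(w, group_size, drop_prob, rng):
    """Stochastically zero whole groups of rows and the matching columns of a
    square weight matrix, so the surviving groups form a smaller dense network."""
    assert w.shape[0] == w.shape[1] and w.shape[0] % group_size == 0
    n_groups = w.shape[0] // group_size
    keep = rng.random(n_groups) >= drop_prob          # one keep/drop decision per group
    mask = np.repeat(keep, group_size).astype(w.dtype)
    # Masking rows and columns jointly removes entire hidden-unit groups.
    return w * mask[:, None] * mask[None, :]

rng = np.random.default_rng(42)
w = rng.standard_normal((8, 8)).astype(np.float32)
w_drop = structured_dropout(w, group_size=2, drop_prob=0.5, rng=rng)
```

Because a dropped group zeroes both its rows and its columns, the nonzero entries of `w_drop` can be gathered into a strictly smaller dense matrix, which is why the compressed network maps well onto dense GPU kernels.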