Precision Scaling of Neural Networks for Efficient Audio Processing
While deep neural networks have shown powerful performance in many audio
applications, their large computational and memory demands are a challenge
for real-time processing. In this paper, we study the impact of scaling the
precision of neural networks on the performance of two common audio processing
tasks, namely, voice-activity detection and single-channel speech enhancement.
We determine the optimal weight/neuron bit-precision pair by exploring its
impact on both performance and processing time. Through experiments
conducted with real user data, we demonstrate that deep neural networks that
use lower bit precision significantly reduce the processing time (up to 30x).
However, the accompanying performance degradation is small (< 3.14%) only in
the case of classification tasks such as those present in voice-activity detection.
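The abstract does not specify the quantization scheme, but the idea of trading weight bit precision against accuracy can be sketched with plain uniform quantization; the function name and the 8-bit/2-bit comparison below are illustrative assumptions, not the paper's method:

```python
import numpy as np

def quantize_uniform(x, bits):
    """Uniformly quantize an array to signed `bits`-bit levels and
    dequantize back, so the rounding error can be inspected directly.
    (A generic post-training scheme; the paper's exact method is unspecified.)"""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax    # step size between levels
    q = np.round(x / scale).astype(np.int32)  # integer codes
    return q * scale                    # dequantized approximation

np.random.seed(0)
w = np.random.randn(256, 256).astype(np.float32)
w8 = quantize_uniform(w, 8)  # 8-bit weights: small rounding error
w2 = quantize_uniform(w, 2)  # 2-bit weights: much coarser approximation
```

Lower bit widths shrink storage and speed up integer arithmetic, at the cost of a larger worst-case rounding error (bounded by half the step size).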
A study on speech enhancement using exponent-only floating point quantized neural network (EOFP-QNN)
Numerous studies have investigated the effectiveness of neural network
quantization on pattern classification tasks. The present study, for the first
time, investigated the performance of speech enhancement (a regression task in
speech processing) using a novel exponent-only floating-point quantized neural
network (EOFP-QNN). The proposed EOFP-QNN consists of two stages:
mantissa-quantization and exponent-quantization. In the mantissa-quantization
stage, EOFP-QNN learns how to quantize the mantissa bits of the model
parameters while preserving the regression accuracy using the least mantissa
precision. In the exponent-quantization stage, the exponent part of the
parameters is further quantized without causing any additional performance
degradation. We evaluated the proposed EOFP quantization technique on two types
of neural networks, namely, bidirectional long short-term memory (BLSTM) and
fully convolutional neural network (FCN), on a speech enhancement task.
Experimental results showed that the model sizes can be significantly reduced
(the model sizes of the quantized BLSTM and FCN models were only 18.75% and
21.89%, respectively, compared to those of the original models) while
maintaining satisfactory speech-enhancement performance.
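The mantissa-quantization idea can be illustrated by bit-masking an IEEE 754 float32, whose 23-bit mantissa can be truncated while the sign and exponent are kept intact; the function below is a minimal sketch under that assumption (EOFP-QNN instead *learns* how many mantissa bits to keep, and quantizes the exponent in a second stage):

```python
import numpy as np

def truncate_mantissa(x, keep_bits):
    """Zero all but the top `keep_bits` of the 23-bit float32 mantissa.
    With keep_bits=0 only sign and exponent survive, so every value
    collapses to a signed power of two (the 'exponent-only' extreme)."""
    raw = x.astype(np.float32).view(np.uint32)       # reinterpret bits
    mask = np.uint32(0xFFFFFFFF & ~((1 << (23 - keep_bits)) - 1))
    return (raw & mask).view(np.float32)             # back to float

# pi = 1.5707...b x 2^1; dropping the whole mantissa leaves 1.0 x 2^1 = 2.0
vals = np.array([np.pi, 3.0], dtype=np.float32)
```

Because only the retained bits need storing, fewer mantissa bits directly shrink the model size, which matches the abstract's reported reductions.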
Faster Convolution Inference Through Using Pre-Calculated Lookup Tables
Low-cardinality activations permit an algorithm based on fetching the
inference values from pre-calculated lookup tables instead of calculating them
every time. This algorithm can have extensions, some of which offer abilities
beyond those of the currently used algorithms. It also allows for simpler and
more effective CNN-specialized hardware.
Comment: 11 pages, 7 figures