Deep neural networks (DNNs) have achieved impressive success in multiple
domains. Over the years, the accuracy of these models has increased with the
proliferation of deeper and more complex architectures. Thus, state-of-the-art
solutions are often computationally expensive, which makes them unsuitable for
deployment on edge computing platforms. To mitigate the high computation,
memory, and power requirements of convolutional neural network (CNN) inference,
we propose power-of-two quantization, which
quantizes continuous parameters into low-bit power-of-two values. This reduces
computational complexity by replacing expensive multiplication operations with
bit shifts and by using low-bit weights. ResNet is adopted as the building
block of our solution, and the proposed model is evaluated on a spoken language
(SLU) task. Experimental results show improved performance for shift neural
network architectures, with our low-bit quantization achieving 98.76\% on the
test set, performance comparable to that of its full-precision counterpart and
of state-of-the-art solutions.

Comment: Accepted at INTERSPEECH 202
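
As an illustration of the idea described above, the sketch below rounds weights
to signed powers of two and evaluates a dot product using only shifts, sign
flips, and additions. It is a minimal example under stated assumptions, not the
paper's implementation: the 4-bit budget, the exponent clipping range, and the
zero handling are all illustrative choices.

```python
import numpy as np

def quantize_pow2(w, bits=4):
    """Round each weight to the nearest signed power of two.

    Each weight becomes sign(w) * 2**e, with the exponent clipped so the
    codebook fits the given bit budget. Assumption: weights are normalized
    to |w| <= 1, so the largest exponent is 0.
    """
    emax = 0
    emin = emax - 2 ** (bits - 1) + 1          # e.g. bits=4 -> emin = -7
    sign = np.sign(w)                           # 0 weights stay 0
    exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), emin, emax)
    return sign * np.exp2(exp), sign.astype(np.int8), exp.astype(np.int8)

def shift_mac(x_int, signs, exps):
    """Multiply-accumulate with power-of-two weights, using no multiplier.

    For an integer activation x and a weight s * 2**e (with e <= 0), the
    product is s * (x >> -e): one arithmetic right shift plus a sign flip.
    Shifts floor intermediate terms, so results can differ slightly from
    the floating-point dot product.
    """
    acc = 0
    for x, s, e in zip(x_int, signs, exps):
        acc += int(s) * (int(x) >> int(-e))
    return acc

# Tiny demo: quantize a weight vector, then run a shift-based dot product.
w = np.array([0.31, -0.07, 0.55, -0.9])
wq, signs, exps = quantize_pow2(w, bits=4)
x = np.array([12, 40, -7, 25])                  # integer activations
print(wq)                                       # [ 0.25 -0.0625 0.5 -1. ]
print(shift_mac(x, signs, exps), float(x @ wq)) # -28  -28.0
```

Because every quantized weight is of the form ±2^e, each product in the
accumulation collapses to a single arithmetic shift, which is far cheaper in
hardware than a full multiplier; this is what makes shift-based architectures
attractive on edge platforms.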