Scalable accelerator for nonuniform multi-word log-quantized neural network

Abstract

Logarithmic quantization has many hardware-friendly features, but its lower accuracy under certain conditions has prevented wider adoption. Recently, modified schemes have been proposed that solve the accuracy problem without compromising hardware efficiency by selectively employing multiple words. This, however, causes variable-latency multiplication, demanding a new hardware architecture that supports efficient mapping of large neural network layers as well as various convolution types such as depthwise separable convolution. In this paper we present a novel hardware architecture for nonuniform multi-word log-quantized neural networks that is scalable with the number of processing elements (PEs) while maximizing data reuse. Our architecture supports depthwise and pointwise convolution as well as 3D convolution, all of which are important for recent mobile-friendly networks. We also propose a hardware-software cooperative optimization that reduces the impact of variable-latency multiplication on performance. Experimental results on various convolution layers from MobileNetV2 demonstrate the speed advantage of our architecture and its high scalability with the number of PEs compared with previous architectures for depthwise separable convolution or log quantization. The results also show that our optimization is highly effective in improving performance.
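To make the core idea concrete, the sketch below illustrates multi-word log quantization in plain Python: a weight is approximated as a signed sum of up to a few powers of two, so multiplication reduces to shifts and adds, and the latency varies with the number of words actually used. All names and parameters here are our own illustration, not the paper's implementation.

```python
import math

def log_quantize(w, num_words=2, bits=4):
    """Approximate w as a signed sum of up to num_words powers of two.

    Returns a list of (sign, exponent) terms; fewer terms mean a
    shorter (faster) shift-add multiplication. Illustrative only.
    """
    terms = []
    residual = w
    for _ in range(num_words):
        if residual == 0.0:
            break
        # Nearest power-of-two exponent, clipped to the exponent range.
        e = round(math.log2(abs(residual)))
        e = max(min(e, 2 ** (bits - 1) - 1), -(2 ** (bits - 1)))
        sign = 1 if residual > 0 else -1
        terms.append((sign, e))
        residual -= sign * 2.0 ** e
    return terms

def log_mac(x, terms):
    """Multiply x by the quantized weight using only shift-add terms.

    Latency in hardware would be proportional to len(terms),
    hence variable-latency multiplication.
    """
    return sum(s * x * 2.0 ** e for s, e in terms)
```

For example, 0.75 needs two words (2^0 - 2^-2), while 0.5 needs only one (2^-1), so multiplying by 0.5 would finish earlier, which is exactly the variable-latency behavior the architecture must accommodate.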