2 research outputs found

    Compression of Deep Convolutional Neural Networks under Joint Sparsity Constraints

    We consider the optimization of deep convolutional neural networks (CNNs) such that they provide good performance while having reduced complexity if deployed on either conventional systems utilizing spatial-domain convolution or lower-complexity systems designed for Winograd convolution. Furthermore, we explore the universal quantization and compression of these networks. In particular, the proposed framework produces one compressed model whose convolutional filters can be made sparse either in the spatial domain or in the Winograd domain. Hence, one compressed model can be deployed universally on any platform, without the need for re-training on the deployed platform, and the sparsity of its convolutional filters can be exploited for further complexity reduction in either domain. To get a better compression ratio, the sparse model is compressed in the spatial domain, which has fewer parameters. From our experiments, we obtain 24.2×, 47.7× and 35.4× compressed models for ResNet-18, AlexNet and CT-SRCNN, while their computational cost is also reduced by 4.5×, 5.1× and 23.5×, respectively.
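    The abstract contrasts sparsity in the spatial domain with sparsity in the Winograd domain. The short NumPy sketch below (not the authors' framework) illustrates the distinction for the standard F(2x2, 3x3) Winograd filter transform U = G g G^T; the pruning threshold and helper names are hypothetical, for illustration only.

    ```python
    # Minimal sketch, assuming the standard Winograd F(2x2, 3x3) filter transform.
    import numpy as np

    # Filter transform matrix G for Winograd F(2x2, 3x3).
    G = np.array([[1.0,  0.0, 0.0],
                  [0.5,  0.5, 0.5],
                  [0.5, -0.5, 0.5],
                  [0.0,  0.0, 1.0]])

    def winograd_domain(filt_3x3: np.ndarray) -> np.ndarray:
        """Map a 3x3 spatial-domain filter to its 4x4 Winograd-domain form U = G g G^T."""
        return G @ filt_3x3 @ G.T

    def sparsity(x: np.ndarray, thresh: float = 1e-3) -> float:
        """Fraction of entries whose magnitude falls below a (hypothetical) pruning threshold."""
        return float(np.mean(np.abs(x) < thresh))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        g = rng.standard_normal((3, 3))
        g[np.abs(g) < 0.5] = 0.0          # crude spatial-domain pruning, for illustration
        u = winograd_domain(g)
        print("spatial-domain sparsity :", sparsity(g))
        print("Winograd-domain sparsity:", sparsity(u))
        # Zeros in the spatial domain generally do not survive the transform,
        # which is why obtaining sparsity in both domains requires joint constraints
        # during training, as the paper proposes.
    ```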

    Sparse Winograd Convolutional neural networks on small-scale systolic arrays

    The reconfigurability, energy efficiency, and massive parallelism of FPGAs make them one of the best choices for implementing efficient deep learning accelerators. However, state-of-the-art implementations seldom consider the balance between high computational throughput and the ability of the memory subsystem to sustain it. In this paper, we implement an accelerator on FPGA by combining sparse Winograd convolution, clusters of small-scale systolic arrays, and a tailored memory layout design. We also provide an analytical model of the general Winograd convolution algorithm as a design reference. Experimental results on VGG16 show that the accelerator achieves very high computational resource utilization, 20x~30x energy efficiency, and more than 5x speedup compared with the dense implementation.
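    The per-tile Winograd computation that such an accelerator maps onto systolic arrays can be summarized in a few lines. The sketch below (NumPy, not the paper's FPGA design) shows the F(2x2, 3x3) tile pipeline; the element-wise product U * V is the stage where a sparse Winograd-domain filter allows multiplications to be skipped. Matrix constants are the standard Lavin-Gray transforms; everything else is illustrative.

    ```python
    # Hedged sketch of one Winograd F(2x2, 3x3) output tile, verified against
    # direct cross-correlation. Not the paper's implementation.
    import numpy as np

    B_T = np.array([[1,  0, -1,  0],
                    [0,  1,  1,  0],
                    [0, -1,  1,  0],
                    [0,  1,  0, -1]], dtype=float)
    G   = np.array([[1.0,  0.0, 0.0],
                    [0.5,  0.5, 0.5],
                    [0.5, -0.5, 0.5],
                    [0.0,  0.0, 1.0]])
    A_T = np.array([[1, 1,  1,  0],
                    [0, 1, -1, -1]], dtype=float)

    def winograd_tile(d: np.ndarray, g: np.ndarray) -> np.ndarray:
        """Compute a 2x2 output tile from a 4x4 input tile d and a 3x3 filter g."""
        U = G @ g @ G.T          # filter transform (precomputed once per filter)
        V = B_T @ d @ B_T.T      # input transform
        M = U * V                # element-wise multiply: 16 MACs, fewer if U is sparse
        return A_T @ M @ A_T.T   # inverse transform back to the spatial domain

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        d, g = rng.standard_normal((4, 4)), rng.standard_normal((3, 3))
        # Reference: direct "valid" cross-correlation over the same tile.
        ref = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)]
                        for i in range(2)])
        print(np.allclose(winograd_tile(d, g), ref))  # True
    ```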