Compression of Deep Convolutional Neural Networks under Joint Sparsity Constraints
We consider the optimization of deep convolutional neural networks (CNNs)
such that they provide good performance while having reduced complexity if
deployed on either conventional systems utilizing spatial-domain convolution or
lower complexity systems designed for Winograd convolution. Furthermore, we
explore the universal quantization and compression of these networks. In
particular, the proposed framework produces one compressed model whose
convolutional filters can be made sparse either in the spatial domain or in the
Winograd domain. Hence, one compressed model can be deployed universally on any
platform, without need for re-training on the deployed platform, and the
sparsity of its convolutional filters can be exploited for further complexity
reduction in either domain. To get a better compression ratio, the sparse model
is compressed in the spatial domain, which has fewer parameters. From
our experiments, we obtain compressed models for ResNet-18, AlexNet and
CT-SRCNN, with their computational cost also reduced.
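To make the two domains concrete, here is a minimal sketch of the filter transform used in Winograd convolution. It assumes the common F(2x2, 3x3) variant and its usual filter-transform matrix G. The abstract does not say which variant the paper uses. The helper names and the example filter are hypothetical. The sketch shows that a 3x3 spatial filter becomes a 4x4 Winograd-domain filter (16 values instead of 9), and that a filter that is sparse in one domain need not be sparse in the other.

```python
from fractions import Fraction as F

# Filter-transform matrix G for Winograd F(2x2, 3x3).
# Assumption: the paper's exact variant is not stated in the abstract.
G = [
    [F(1), F(0), F(0)],
    [F(1, 2), F(1, 2), F(1, 2)],
    [F(1, 2), F(-1, 2), F(1, 2)],
    [F(0), F(0), F(1)],
]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(row) for row in zip(*m)]

def to_winograd(g):
    """Map a 3x3 spatial filter g to its 4x4 Winograd-domain form G g G^T."""
    return matmul(matmul(G, g), transpose(G))

def count_nonzero(m):
    return sum(1 for row in m for v in row if v != 0)

# Hypothetical sparse spatial filter: a single non-zero tap in one corner.
spatial = [
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
]
wino = to_winograd(spatial)

print("spatial non-zeros:", count_nonzero(spatial), "of 9")
print("Winograd non-zeros:", count_nonzero(wino), "of 16")
```

Because sparsity in one domain does not carry over to the other, a model intended for both kinds of platform has to be constrained in both domains, and the smaller spatial-domain form is the natural one to store.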
Sparse Winograd Convolutional neural networks on small-scale systolic arrays
The reconfigurability, energy-efficiency, and massive parallelism on FPGAs
make them one of the best choices for implementing efficient deep learning
accelerators. However, state-of-the-art implementations seldom consider the balance
between high computational throughput and the ability of the memory
subsystem to support it. In this paper, we implement an accelerator on FPGA by
combining the sparse Winograd convolution, clusters of small-scale systolic
arrays, and a tailored memory layout. We also provide an analytical
model of the general Winograd convolution algorithm as a design
reference. Experimental results on VGG16 show that it achieves very high
computational resource utilization, 20x to 30x higher energy efficiency, and more than
5x speedup compared with the dense implementation.
Comment: submitted to FPGA 201
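As a rough illustration of what an analytical view of general Winograd convolution involves, the sketch below counts element-wise multiplications per output tile for 2D Winograd F(m x m, r x r) and for direct convolution. This is the standard textbook count, not the paper's model. It ignores the cost of the input, filter and output transforms, as well as memory traffic and sparsity. The function names are hypothetical.

```python
def winograd_mults(m, r):
    """Element-wise multiplications per m x m output tile with an r x r filter."""
    return (m + r - 1) ** 2

def direct_mults(m, r):
    """Multiplications for the same tile with direct convolution:
    m*m outputs, each needing r*r multiplications."""
    return (m * r) ** 2

def reduction(m, r):
    """Ratio of direct to Winograd multiplications (transform costs excluded)."""
    return direct_mults(m, r) / winograd_mults(m, r)

for m in (2, 4, 6):
    print(f"F({m}x{m}, 3x3): {direct_mults(m, 3)} -> {winograd_mults(m, 3)} "
          f"multiplications, {reduction(m, 3):.2f}x fewer")
```

Larger tiles reduce the multiplication count further but need larger transforms, which is one of the trade-offs a design reference for an accelerator has to weigh.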