Efficient Hardware Realization of Convolutional Neural Networks using Intra-Kernel Regular Pruning
The recent trend toward increasingly deep convolutional neural networks
(CNNs) leads to a higher demand for computational power and memory storage.
Consequently, the deployment of CNNs in hardware has become more challenging.
In this paper, we propose an Intra-Kernel Regular (IKR) pruning scheme to
reduce the size and computational complexity of CNNs by removing redundant
weights at a fine-grained level. Unlike other pruning methods such as
Fine-Grained pruning, IKR pruning maintains regular kernel structures that are
exploitable in a hardware accelerator. Experimental results demonstrate up to
10x parameter reduction and 7x computational reduction at a cost of less than
1% degradation in accuracy versus the un-pruned case.
Comment: 6 pages, 8 figures, ISMVL 201
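The abstract does not spell out the pruning rule itself, only that pruned kernels keep a regular, hardware-friendly structure. A minimal sketch of one plausible interpretation, under the assumption that "regular" means every kernel shares the same intra-kernel sparsity pattern (the function name and saliency criterion below are illustrative, not the paper's):

```python
import numpy as np

def ikr_prune(weights, keep=4):
    """Illustrative intra-kernel regular pruning: within every kernel,
    keep the `keep` positions whose summed magnitude across all kernels
    is largest, so every kernel ends up with the same sparsity pattern
    (an assumption about what 'regular' means here)."""
    out_c, in_c, kh, kw = weights.shape
    flat = weights.reshape(out_c * in_c, kh * kw)
    # rank kernel positions by total magnitude across all kernels
    saliency = np.abs(flat).sum(axis=0)
    keep_idx = np.argsort(saliency)[-keep:]
    mask = np.zeros(kh * kw, dtype=bool)
    mask[keep_idx] = True
    pruned = flat * mask  # broadcast the shared mask to every kernel
    return pruned.reshape(weights.shape), mask.reshape(kh, kw)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4, 3, 3))
pw, mask = ikr_prune(w, keep=4)
```

Because every kernel carries the same pattern, an accelerator can hard-wire the nonzero positions instead of storing per-weight indices, which is the exploitable regularity the abstract refers to.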
PCNN: Pattern-based Fine-Grained Regular Pruning towards Optimizing CNN Accelerators
Weight pruning is a powerful technique to realize model compression. We
propose PCNN, a fine-grained regular 1D pruning method. A novel index format
called Sparsity Pattern Mask (SPM) is presented to encode the sparsity in PCNN.
Leveraging SPM with limited pruning patterns and non-zero sequences with equal
length, PCNN can be efficiently employed in hardware. Evaluated on VGG-16 and
ResNet-18, our PCNN achieves a compression rate of up to 8.4X with only 0.2%
accuracy loss. We also implement a pattern-aware architecture in 55nm process,
achieving up to 9.0X speedup and 28.39 TOPS/W efficiency with only 3.1% on-chip
memory overhead for indices.
Comment: 6 pages, DAC 2020 accepted paper
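The abstract describes the SPM idea only at a high level: a small dictionary of allowed pruning patterns, a per-kernel pattern index, and nonzero sequences of equal length. A sketch under those assumptions (the pattern dictionary, matching rule, and function names below are hypothetical, not taken from the paper):

```python
import numpy as np

def spm_encode(weights, patterns):
    """Sketch of a Sparsity Pattern Mask (SPM) encoding: each kernel is
    assigned the candidate pattern that preserves the most weight
    magnitude; storage is one pattern index per kernel plus a
    fixed-length list of surviving weights (assumed scheme)."""
    out_c, in_c, kh, kw = weights.shape
    flat = weights.reshape(-1, kh * kw)
    pat = np.asarray(patterns, dtype=bool)          # (n_patterns, kh*kw)
    # magnitude retained by each candidate pattern, per kernel
    scores = np.abs(flat) @ pat.T.astype(float)
    spm = scores.argmax(axis=1)                     # pattern index per kernel
    values = np.stack([flat[i][pat[spm[i]]] for i in range(flat.shape[0])])
    return spm, values

def spm_decode(spm, values, patterns, shape):
    """Rebuild the dense (pruned) weight tensor from SPM indices and
    the equal-length nonzero sequences."""
    pat = np.asarray(patterns, dtype=bool)
    flat = np.zeros((values.shape[0], pat.shape[1]))
    for i, p in enumerate(spm):
        flat[i][pat[p]] = values[i]
    return flat.reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4, 3, 3))
# hypothetical pattern dictionary: every pattern keeps exactly 4 of 9 positions
patterns = [
    [1, 0, 1, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0, 0, 0, 1, 1],
]
spm, values = spm_encode(w, patterns)
restored = spm_decode(spm, values, patterns, w.shape)
```

Because every pattern keeps the same number of weights, the nonzero sequences all have equal length and the index side-channel is just one small pattern ID per kernel, which is what keeps the on-chip index overhead low in a pattern-aware accelerator.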