RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices
Mobile devices are becoming an important platform for deep learning tasks, as
they are increasingly equipped with powerful, high-end mobile CPUs and GPUs.
However, executing 3D Convolutional Neural Networks (CNNs) in real time while
maintaining high inference accuracy remains challenging: the more complex model
structure and higher dimensionality of 3D CNNs overwhelm the computation and
storage resources available on mobile devices. A natural remedy is deep
learning weight pruning. However, directly generalizing existing 2D CNN weight
pruning methods to 3D CNNs fails to fully exploit mobile parallelism while
achieving high inference accuracy.
This paper proposes RT3D, a model compression and mobile acceleration
framework for 3D CNNs that seamlessly integrates neural network weight pruning
with compiler code generation techniques. We propose and investigate two
mobile-acceleration-friendly structured sparsity schemes: vanilla structured
sparsity and kernel group structured (KGS) sparsity. Vanilla sparsity removes
whole kernel groups, while KGS sparsity is a finer-grained structured sparsity
that offers higher flexibility while still exploiting full on-device
parallelism. We propose a reweighted regularization pruning algorithm to
achieve the proposed sparsity schemes. The inference speedup due to sparsity
approaches the pruning rate of the whole model's FLOPs (floating-point
operations). RT3D demonstrates up to 29.1x speedup in end-to-end inference
time compared with current mobile frameworks supporting 3D CNNs, with a
moderate 1%-1.5% accuracy loss. The end-to-end inference time for 16 video
frames can be within 150 ms when executing representative C3D and R(2+1)D
models on a cellphone. For the first time, real-time execution of 3D CNNs is
achieved on off-the-shelf mobile devices.

Comment: To appear in Proceedings of the 35th AAAI Conference on Artificial
Intelligence (AAAI-21).
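The structured pruning the abstract describes (removing whole kernels rather than individual weights) can be illustrated with a toy mask over a 3D convolution tensor. The tensor shape, the L2-norm scoring of kernels, and the median threshold below are illustrative assumptions for the sketch, not RT3D's actual pruning criterion or the KGS grouping.

```python
import numpy as np

# Toy 3D conv weight: (out_channels, in_channels, kT, kH, kW)
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4, 3, 3, 3))

# Structured (kernel-level) sparsity sketch: score each (out, in)
# kernel by its L2 norm and zero out the weaker half. The scoring
# rule and 50% rate are assumptions for illustration only.
norms = np.linalg.norm(W.reshape(8, 4, -1), axis=-1)   # (8, 4)
thresh = np.median(norms)
mask = (norms >= thresh)[:, :, None, None, None]        # keep/drop whole kernels
W_pruned = W * mask

sparsity = 1.0 - mask.mean()
print(f"kernel-level sparsity: {sparsity:.2f}")
```

Because entire kernels are zeroed, the surviving computation keeps a regular layout, which is what makes such schemes friendly to parallel execution on mobile hardware, unlike irregular element-wise sparsity.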
Gaining the Sparse Rewards by Exploring Binary Lottery Tickets in Spiking Neural Network
Spiking Neural Networks (SNNs), a brain-inspired strategy, receive much
attention because of the high sparsity and low power consumption derived from
their inherently spiking information state. To further improve SNN efficiency,
some works claim that the Lottery Tickets (LTs) Hypothesis, which states that
an Artificial Neural Network (ANN) contains a subnetwork that matches the
performance of the original network, also holds for SNNs. Moreover, the
spiking information handled by SNNs has a natural similarity and affinity with
binarization in sparsification. Therefore, to further explore SNN efficiency,
this paper focuses on (1) the presence or absence of LTs in binary SNNs, and
(2) whether the spiking mechanism is a superior strategy for handling binary
information compared to simple model binarization. To verify these
assumptions, a sparse training method is proposed to find Binary Weights
Spiking Lottery Tickets (BinW-SLT) under different network structures. Through
comprehensive evaluations, we show that BinW-SLT attains up to +5.86% and
+3.17% improvement on CIFAR-10 and CIFAR-100 compared with binary LTs, and
achieves 1.86x and 8.92x energy savings compared with full-precision SNN and
ANN, respectively.

Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
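The lottery-ticket procedure this abstract builds on can be sketched in a few lines: keep the largest-magnitude weights of a trained layer as a binary mask, then rewind the surviving weights to their initial values. The layer shape, the 80% pruning rate, the stand-in for training, and the one-shot (non-iterative) pruning below are all illustrative assumptions; this is the generic LT recipe, not the paper's BinW-SLT sparse training method.

```python
import numpy as np

rng = np.random.default_rng(1)
w_init = rng.standard_normal((64, 32))                     # weights at initialization
w_trained = w_init + 0.1 * rng.standard_normal((64, 32))   # stand-in for a training run

# One-shot magnitude pruning: drop the 80% of weights with the
# smallest trained magnitude (rate is an illustrative assumption).
prune_rate = 0.8
k = int(prune_rate * w_trained.size)
cut = np.sort(np.abs(w_trained), axis=None)[k]
mask = (np.abs(w_trained) >= cut).astype(w_init.dtype)

# "Winning ticket": the binary mask applied to the *initial* weights,
# which would then be retrained from scratch under that mask.
ticket = w_init * mask
print(f"ticket density: {mask.mean():.2f}")
```

In the binary-weight setting the abstract studies, the remaining weights would additionally be constrained to binary values during retraining, so the mask (which weights survive) and the binarization (what values they take) become closely related, which is the affinity the authors exploit.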