Learning Instance-wise Sparsity for Accelerating Deep Models
Exploring deep convolutional neural networks with high efficiency and low
memory usage is essential for a wide variety of machine learning tasks.
Most existing approaches accelerate deep models by manipulating
parameters or filters independently of the data, e.g., pruning and decomposition. In
contrast, we study this problem from a different perspective by respecting the
differences between data instances. An instance-wise feature pruning method is developed by
identifying informative features for different instances. Specifically, by
investigating a feature decay regularization, we expect intermediate feature
maps of each instance in deep neural networks to be sparse while preserving the
overall network performance. During online inference, subtle features of input
images extracted by intermediate layers of a well-trained neural network can be
eliminated to accelerate the subsequent calculations. We further take
the coefficient of variation as a measure to select the layers that are
appropriate for acceleration. Extensive experiments conducted on benchmark
datasets and networks demonstrate the effectiveness of the proposed method.
Comment: Accepted by IJCAI 201
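The abstract points at two concrete ingredients: a feature decay regularization that pushes intermediate feature maps toward instance-wise sparsity, and the coefficient of variation used to pick layers that are worth accelerating. A minimal PyTorch-style sketch of both is given below; the L1 form of the penalty, the `decay` weight, and the per-channel statistic are illustrative assumptions rather than the paper's exact formulation.

```python
import torch

def feature_decay_loss(feature_maps, decay=1e-4):
    """L1-style feature decay penalty summed over intermediate feature maps.

    feature_maps: list of tensors produced by intermediate layers for one batch.
    """
    return decay * sum(f.abs().mean() for f in feature_maps)

def coefficient_of_variation(feature_map, eps=1e-8):
    """Coefficient of variation of per-channel activation magnitudes.

    feature_map: tensor of shape (N, C, H, W). A large value means a few
    channels dominate for each instance, suggesting the layer is a good
    candidate for instance-wise feature pruning.
    """
    per_channel = feature_map.abs().mean(dim=(2, 3))  # (N, C)
    mean = per_channel.mean(dim=1)                    # (N,)
    std = per_channel.std(dim=1)                      # (N,)
    return (std / (mean + eps)).mean()                # averaged over the batch
```
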
Kernel Based Progressive Distillation for Adder Neural Networks
Adder Neural Networks (ANNs) which only contain additions bring us a new way
of developing deep neural networks with low energy consumption. Unfortunately,
there is an accuracy drop when replacing all convolution filters with adder
filters. The main reason is the optimization difficulty of ANNs under the
$\ell_1$-norm, in which the estimation of the gradient during back-propagation
is inaccurate. In this paper, we present a progressive kernel based knowledge
distillation (PKKD) method for further improving the performance of ANNs
without increasing the number of trainable parameters. A convolutional
neural network (CNN) with the same architecture is simultaneously initialized
and trained as a teacher network; the features and weights of the ANN and the
CNN are transformed into a new space to eliminate the accuracy drop. The
similarity is measured in a higher-dimensional space, using a kernel based
method to disentangle the difference between their distributions. Finally, the
desired ANN is learned progressively from information provided by both the
ground truth and the teacher.
The effectiveness of the proposed method for learning ANNs with higher
performance is verified on several benchmarks. For instance, the ANN-50 trained
with the proposed PKKD method obtains a 76.8\% top-1 accuracy on the ImageNet
dataset, which is 0.6\% higher than that of ResNet-50.
Comment: Accepted by NeurIPS 202
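As a rough illustration of the recipe described above, the snippet below combines a ground-truth cross-entropy term, a soft-label term from the CNN teacher, and a kernel similarity between student and teacher feature maps, so that the comparison effectively happens in a higher-dimensional space. The Gaussian (RBF) kernel, the temperature, and the loss weights are assumptions and not the paper's exact PKKD loss.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel_loss(f_student, f_teacher, sigma=1.0):
    """1 - k(f_s, f_t) with an RBF kernel on flattened, normalized feature maps."""
    fs = F.normalize(f_student.flatten(1), dim=1)
    ft = F.normalize(f_teacher.flatten(1), dim=1)
    sq_dist = (fs - ft).pow(2).sum(dim=1)
    return (1.0 - torch.exp(-sq_dist / (2 * sigma ** 2))).mean()

def pkkd_style_loss(logits_s, logits_t, feats_s, feats_t, targets,
                    alpha=0.5, beta=1.0, temperature=4.0):
    """Ground-truth CE + soft-label distillation + kernel feature similarity."""
    ce = F.cross_entropy(logits_s, targets)
    kd = F.kl_div(F.log_softmax(logits_s / temperature, dim=1),
                  F.softmax(logits_t / temperature, dim=1),
                  reduction="batchmean") * temperature ** 2
    feat = sum(gaussian_kernel_loss(s, t) for s, t in zip(feats_s, feats_t))
    return ce + alpha * kd + beta * feat
```
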
GhostNet: More Features from Cheap Operations
Deploying convolutional neural networks (CNNs) on embedded devices is
difficult due to the limited memory and computation resources. The redundancy
in feature maps is an important characteristic of those successful CNNs, but
has rarely been investigated in neural architecture design. This paper proposes
a novel Ghost module to generate more feature maps from cheap operations. Based
on a set of intrinsic feature maps, we apply a series of cheap linear
transformations to generate many ghost feature maps that could fully reveal the
information underlying the intrinsic features. The proposed Ghost module can be
taken as a plug-and-play component to upgrade existing convolutional neural
networks. Ghost bottlenecks are designed to stack Ghost modules, and then the
lightweight GhostNet can be easily established. Experiments conducted on
benchmarks demonstrate that the proposed Ghost module is an impressive
alternative to convolution layers in baseline models, and our GhostNet can
achieve higher recognition performance (e.g., top-1 accuracy) than
MobileNetV3 with similar computational cost on the ImageNet ILSVRC-2012
classification dataset. Code is available at
https://github.com/huawei-noah/ghostnet
Comment: CVPR 2020. Code is available at https://github.com/huawei-noah/ghostnet
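The Ghost module itself is simple enough to sketch: an ordinary convolution produces a small set of intrinsic feature maps, cheap depthwise convolutions play the role of the linear transformations that generate ghost feature maps, and the two groups are concatenated. The kernel sizes and reduction ratio below follow common defaults and are assumptions; the repository linked above holds the reference implementation.

```python
import torch
import torch.nn as nn

class GhostModuleSketch(nn.Module):
    """Intrinsic features from a primary conv + ghost features from cheap depthwise convs."""

    def __init__(self, in_ch, out_ch, ratio=2, kernel_size=1, dw_size=3):
        super().__init__()
        assert out_ch % ratio == 0, "out_ch should be divisible by the ratio"
        intrinsic = out_ch // ratio            # channels from the primary convolution
        ghost = out_ch - intrinsic             # channels produced by cheap operations
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, intrinsic, kernel_size,
                      padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(intrinsic),
            nn.ReLU(inplace=True),
        )
        # Depthwise convolution: one cheap linear transform per intrinsic feature map.
        self.cheap = nn.Sequential(
            nn.Conv2d(intrinsic, ghost, dw_size, padding=dw_size // 2,
                      groups=intrinsic, bias=False),
            nn.BatchNorm2d(ghost),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        intrinsic = self.primary(x)
        ghost = self.cheap(intrinsic)
        return torch.cat([intrinsic, ghost], dim=1)
```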