Towards thinner convolutional neural networks through Gradually Global Pruning
Deep network pruning is an effective method to reduce the storage and
computation cost of deep neural networks when applying them to resource-limited
devices. Among many pruning granularities, neuron-level pruning removes
redundant neurons and filters from the model and results in thinner networks. In
this paper, we propose a gradually global pruning scheme for neuron-level
pruning. In each pruning step, a small percentage of neurons is selected and
dropped across all layers in the model. We also propose a simple method to
eliminate the biases in evaluating the importance of neurons, making the scheme
feasible. Compared with layer-wise pruning schemes, our scheme avoids the
difficulty of determining the redundancy in each layer and is more effective
for deep networks. Our scheme automatically finds a thinner sub-network within the
original network under a given performance requirement.
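As an illustration of the per-step selection described above, here is a minimal Python/NumPy sketch of one gradually global pruning step. The layer-wise score normalization stands in for the paper's bias-elimination method, and all names and constants are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def global_pruning_step(importance, prune_frac=0.02):
    """One step of gradually global pruning (illustrative sketch).

    importance: dict mapping layer name -> 1-D array of per-neuron scores.
    prune_frac: fraction of all remaining neurons to drop in this step.
    Returns a dict of boolean keep-masks, one per layer.
    """
    # Normalize scores within each layer so that layers with systematically
    # larger raw scores are not unfairly protected (a stand-in for the
    # paper's bias-elimination method; assumption, not their exact rule).
    normalized = {name: s / (s.mean() + 1e-12) for name, s in importance.items()}

    # Pool all neurons across layers and find a single global threshold.
    all_scores = np.concatenate(list(normalized.values()))
    k = int(len(all_scores) * prune_frac)
    threshold = np.partition(all_scores, k)[k] if k > 0 else -np.inf

    # Neurons below the threshold are dropped regardless of which layer they
    # belong to, yielding a slightly thinner network after each step.
    return {name: s > threshold for name, s in normalized.items()}
```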
Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
Deep neural networks have evolved remarkably over the past few years and they
are currently the fundamental tools of many intelligent systems. At the same
time, the computational complexity and resource consumption of these networks
also continue to increase. This will pose a significant challenge to the
deployment of such networks, especially in real-time applications or on
resource-limited devices. Thus, network acceleration has become a hot topic
within the deep learning community. As for hardware implementations of deep
neural networks, a number of FPGA/ASIC-based accelerators have been proposed
in recent years. In this paper, we provide a comprehensive survey of recent
advances in network acceleration, compression and accelerator design from both
algorithm and hardware points of view. Specifically, we provide a thorough
analysis of each of the following topics: network pruning, low-rank
approximation, network quantization, teacher-student networks, compact network
design and hardware accelerators. Finally, we introduce and discuss a few
possible future directions. Comment: 14 pages, 3 figures
Progressive Learning of Low-Precision Networks
Recent years have witnessed the great advance of deep learning in a variety
of vision tasks. Many state-of-the-art deep neural networks suffer from large
size and high complexity, which makes them difficult to deploy on
resource-limited platforms such as mobile devices.
To this end, low-precision neural networks are widely studied which quantize
weights or activations into the low-bit format.
Though being efficient, low-precision networks are usually hard to train and
encounter severe accuracy degradation.
In this paper, we propose a new training strategy through expanding
low-precision networks during training and removing the expanded parts for
network inference.
First, we equip each low-precision convolutional layer with an ancillary
full-precision convolutional layer based on a low-precision network structure,
which could guide the network to good local minima.
Second, a decay method is introduced to gradually reduce the output of the added
full-precision convolution, which keeps the resulting topology
identical to the original low-precision one.
Experiments on the SVHN, CIFAR and ILSVRC-2012 datasets show that the proposed
method brings faster convergence and higher accuracy for low-precision
neural networks. Comment: 10 pages, 8 figures
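A minimal PyTorch-style sketch of the expand-then-decay idea described above, assuming a toy uniform quantizer and a scalar decay factor on the ancillary full-precision branch; the `quantize` function, module names and schedule are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

def quantize(w, bits=2):
    # Placeholder uniform quantizer (assumption). Real low-precision training
    # would use a straight-through estimator; rounding here is illustrative only.
    scale = w.abs().max() / (2 ** (bits - 1) - 1) + 1e-12
    return torch.round(w / scale).clamp_(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale

class ExpandedConv(nn.Module):
    """Low-precision conv expanded with an ancillary full-precision conv."""
    def __init__(self, in_ch, out_ch, k=3, bits=2):
        super().__init__()
        self.lp = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.fp = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.bits = bits
        self.alpha = 1.0  # decayed toward 0 during training

    def forward(self, x):
        w_q = quantize(self.lp.weight, self.bits)
        out_lp = nn.functional.conv2d(x, w_q, padding=self.lp.padding)
        # The full-precision branch guides optimization early on and is
        # gradually faded out, so once alpha reaches zero the layer matches
        # the plain low-precision topology and the branch can be removed.
        return out_lp + self.alpha * self.fp(x)

    def decay(self, rate=0.98):
        self.alpha *= rate
```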
Prune the Convolutional Neural Networks with Sparse Shrink
Nowadays, it is still difficult to adapt Convolutional Neural Network (CNN)
based models for deployment on embedded devices. The heavy computation and
large memory footprint of CNN models remain the main burden in real
applications. In this paper, we propose a "Sparse Shrink" algorithm to prune an
existing CNN model. By analyzing the importance of each channel via sparse
reconstruction, the algorithm is able to prune redundant feature maps
accordingly. The resulting pruned model thus directly saves computational
resources. We have evaluated our algorithm on CIFAR-100. As shown in our
experiments, we can reduce parameters by 56.77% and multiplications by 73.84% in total
with only a minor decrease in accuracy. These results demonstrate the
effectiveness of our "Sparse Shrink" algorithm.
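The abstract does not spell out the sparse reconstruction step, but channel selection of this kind is often posed as a Lasso problem; the sketch below (scikit-learn) shows one plausible way to derive per-channel importance scores under that assumption, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def channel_importance(features, target, alpha=1e-3):
    """Score channels by how much they contribute to reconstructing a target
    response (illustrative Lasso-based sketch).

    features: array of shape (n_samples, n_channels), per-channel responses.
    target:   array of shape (n_samples,), quantity to reconstruct.
    Returns per-channel scores; channels with near-zero scores are candidates
    for pruning.
    """
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    lasso.fit(features, target)
    return np.abs(lasso.coef_)

# Usage sketch: prune the channels whose coefficients the sparse
# reconstruction drives to (near) zero.
# scores = channel_importance(features, target)
# keep = scores > 1e-6
```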
Structured Pruning for Efficient ConvNets via Incremental Regularization
Parameter pruning is a promising approach for CNN compression and
acceleration by eliminating redundant model parameters with tolerable
performance degradation. Despite its effectiveness, existing regularization-based
parameter pruning methods usually drive weights towards zero with large and
constant regularization factors, which neglects the fragility of the
expressiveness of CNNs and thus calls for a more gentle regularization scheme
so that the networks can adapt during pruning. To achieve this, we propose a
novel regularization-based pruning method, named IncReg, to
incrementally assign different regularization factors to different weights
based on their relative importance. Empirical analysis on CIFAR-10 dataset
verifies the merits of IncReg. Further extensive experiments with popular CNNs
on CIFAR-10 and ImageNet datasets show that IncReg achieves comparable or even
better results than state-of-the-art methods. Our source code and trained
models are available here: https://github.com/mingsun-tse/caffe_increg. Comment: IJCNN 201
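A minimal sketch of incremental, importance-dependent regularization, assuming L1-norm filter importance and a linear increment schedule; the constants and the exact form of the penalty are assumptions, not the published IncReg rule.

```python
import torch

def update_reg_factors(weights, reg, step=1e-4, reg_max=1.0):
    """Incrementally grow per-filter regularization factors (sketch).

    weights: tensor of shape (out_channels, ...), one filter per row.
    reg:     tensor of shape (out_channels,), current per-filter factors.
    Less important filters (smaller L1 norm) receive a larger increment,
    so they are pushed toward zero gradually rather than all at once.
    """
    importance = weights.abs().flatten(1).sum(dim=1)
    ranks = importance.argsort().argsort().float()        # 0 = least important
    increments = step * (1.0 - ranks / (len(ranks) - 1))  # larger for low ranks
    return torch.clamp(reg + increments, max=reg_max)

def reg_loss(weights, reg):
    # Group (squared L2) penalty on each filter, weighted by its own factor.
    return (reg * weights.flatten(1).pow(2).sum(dim=1)).sum()
```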
Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure
Redundancy is widely recognized in Convolutional Neural Networks (CNNs),
which makes it possible to remove unimportant filters from convolutional layers so as to
slim the network with an acceptable performance drop. Inspired by the linear and
combinational properties of convolution, we seek to make some filters
increasingly close and eventually identical for network slimming. To this end,
we propose Centripetal SGD (C-SGD), a novel optimization method, which can
train several filters to collapse into a single point in the parameter
hyperspace. When the training is completed, the removal of the identical
filters can trim the network with NO performance loss, thus no finetuning is
needed. By doing so, we have partly solved an open problem of constrained
filter pruning on CNNs with complicated structure, where some layers must be
pruned following others. Our experimental results on CIFAR-10 and ImageNet have
justified the effectiveness of C-SGD-based filter pruning. Moreover, we have
provided empirical evidence for the assumption that the redundancy in deep
neural networks helps the convergence of training by showing that a redundant
CNN trained using C-SGD outperforms a normally trained counterpart of
equivalent width. Comment: CVPR 201
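A minimal sketch of a centripetal update, assuming filters are grouped into clusters whose members should collapse: the ordinary gradients within a cluster are averaged and an extra term pulls each filter toward the cluster mean. The hyperparameters and the clustering itself are illustrative assumptions.

```python
import torch

def centripetal_sgd_step(weight, grad, clusters, lr=0.01, eps=3e-3):
    """One illustrative C-SGD-style update on a conv weight of shape (out_ch, ...).

    clusters: list of lists of filter indices that should collapse together.
    lr, eps:  learning rate and centripetal strength (illustrative values).
    """
    new_w = weight.clone()
    for group in clusters:
        idx = torch.tensor(group)
        g_mean = grad[idx].mean(dim=0, keepdim=True)    # shared objective gradient
        w_mean = weight[idx].mean(dim=0, keepdim=True)  # cluster center
        # The averaged gradient keeps filters in a cluster moving identically,
        # while the centripetal term shrinks their remaining differences, so
        # they eventually become identical and all but one can be removed.
        new_w[idx] = weight[idx] - lr * g_mean - eps * (weight[idx] - w_mean)
    return new_w
```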
Training convolutional neural networks with cheap convolutions and online distillation
The large memory and computation consumption in convolutional neural networks
(CNNs) has been one of the main barriers for deploying them on resource-limited
systems. To this end, various cheap convolutions (e.g., group convolution,
depth-wise convolution, and shift convolution) have recently been used for
memory and computation reduction, but they require specific architecture design.
Furthermore, directly replacing the standard convolution with these cheap ones
results in low discriminability of the compressed networks. In this
paper, we propose to use knowledge distillation to improve the performance of
the compact student networks with cheap convolutions. In our case, the teacher
is a network with the standard convolution, while the student is a simple
transformation of the teacher architecture without complicated redesigning. In
particular, we propose a novel online distillation method, which constructs
the teacher network online without pre-training and conducts mutual
learning between the teacher and student networks, to improve the performance of
the student model. Extensive experiments demonstrate that the proposed approach
achieves superior performance while simultaneously reducing the memory and computation
overhead of cutting-edge CNNs on different datasets, including CIFAR-10/100 and
ImageNet ILSVRC 2012, compared to state-of-the-art CNN compression and
acceleration methods. The codes are publicly available at
https://github.com/EthanZhangYC/OD-cheap-convolution
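A minimal sketch of mutual (online) distillation between a standard-convolution teacher and a cheap-convolution student, assuming a symmetric KL term between softened logits; the temperature and loss weighting are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def mutual_distillation_loss(student_logits, teacher_logits, labels, T=3.0, beta=0.5):
    """Both networks are trained from scratch and teach each other (sketch)."""
    ce_s = F.cross_entropy(student_logits, labels)
    ce_t = F.cross_entropy(teacher_logits, labels)
    # Soften both distributions and exchange knowledge in both directions.
    kl_s = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits.detach() / T, dim=1),
                    reduction="batchmean") * T * T
    kl_t = F.kl_div(F.log_softmax(teacher_logits / T, dim=1),
                    F.softmax(student_logits.detach() / T, dim=1),
                    reduction="batchmean") * T * T
    return (ce_s + beta * kl_s) + (ce_t + beta * kl_t)
```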
Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training
The success of DNN pruning has led to the development of energy-efficient
inference accelerators that support pruned models with sparse weight and
activation tensors. However, because the memory layouts and dataflows in these
architectures are optimized for the access patterns during inference,
they do not efficiently support the emerging sparse training techniques.
In this paper, we demonstrate (a) that accelerating sparse training requires
a co-design approach where algorithms are adapted to suit the constraints of
hardware, and (b) that hardware for sparse DNN training must tackle constraints
that do not arise in inference accelerators. As proof of concept, we adapt a
sparse training algorithm to be amenable to hardware acceleration; we then
develop dataflow, data layout, and load-balancing techniques to accelerate it.
The resulting system is a sparse DNN training accelerator that produces
pruned models with the same accuracy as dense models without first training,
then pruning, and finally retraining, a dense model. Compared to training the
equivalent unpruned models using a state-of-the-art DNN accelerator without
sparse training support, Procrustes consumes up to 3.26x less energy and
offers up to 4x speedup across a range of models, while pruning weights
by an order of magnitude and maintaining unpruned accuracy. Comment: Appears in the Proceedings of the 53rd IEEE/ACM
International Symposium on Microarchitecture (MICRO 2020)
UCP: Uniform Channel Pruning for Deep Convolutional Neural Networks Compression and Acceleration
To apply deep CNNs to mobile terminals and portable devices, many scholars
have recently worked on compressing and accelerating deep convolutional
neural networks. Based on this, we propose a novel uniform channel pruning
(UCP) method to prune deep CNNs, in which modified squeeze-and-excitation blocks
(MSEB) are used to measure the importance of the channels in the convolutional
layers. The unimportant channels, together with the convolutional kernels related to
them, are pruned directly, which greatly reduces the storage cost and the
number of calculations. There are two types of residual blocks in ResNet. For
ResNet with bottlenecks, we use the pruning method with traditional CNN to trim
the 3x3 convolutional layer in the middle of the blocks. For ResNet with basic
residual blocks, we propose an approach to consistently prune all residual
blocks in the same stage to ensure that the compact network structure is
dimensionally correct. Considering that the network loses considerable
information after pruning, and that the larger the pruning amplitude, the
more information is lost, we do not fine-tune but instead retrain the network
from scratch to restore its accuracy after pruning. Finally, we
verified our method on CIFAR-10, CIFAR-100 and ILSVRC-2012 for image
classification. The results indicate that when the pruning rate is small, the
performance of the compact network after retraining from scratch is
better than that of the original network. Even when the pruning amplitude is large, the
accuracy is maintained or decreases only slightly. On CIFAR-100, when
reducing the parameters and FLOPs by up to 82% and 62% respectively, the accuracy
of VGG-19 even improved by 0.54% after retraining. Comment: 21 pages, 7 figures and 5 tables
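A minimal sketch of SE-style channel scoring, assuming the average excitation of a squeeze-and-excitation block over a batch is used as channel importance; the abstract does not specify how the MSEB modifies the standard block, so this plain SE version is only an approximation.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Plain squeeze-and-excitation block used here as an importance probe."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (N, C, H, W)
        s = x.mean(dim=(2, 3))                 # squeeze: global average pool
        return self.fc(s)                      # excitation weights, (N, C)

def rank_channels(se_block, activations, prune_ratio=0.5):
    """Average excitation over a batch and mark the lowest channels for pruning."""
    with torch.no_grad():
        scores = se_block(activations).mean(dim=0)   # (C,)
    n_prune = int(len(scores) * prune_ratio)
    prune_idx = scores.argsort()[:n_prune]           # least important channels
    return prune_idx
```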
HSD-CNN: Hierarchically self decomposing CNN architecture using class specific filter sensitivity analysis
Conventional convolutional neural networks (CNNs) are trained on large domain
datasets and are hence typically over-represented and inefficient in limited-class
applications. An efficient way to convert such large many-class
pre-trained networks into small few-class networks is through a hierarchical
decomposition of their feature maps. To this end, we propose an
automated framework for such decomposition, the Hierarchically Self Decomposing
CNN (HSD-CNN), in four steps. HSD-CNN is derived automatically using a
class-specific filter sensitivity analysis that quantifies the impact of
specific features on a class prediction. The decomposed hierarchical network
can be utilized and deployed directly to obtain sub-networks for a subset of
classes, and it is shown to perform better without the requirement of
retraining these sub-networks. Experimental results show that HSD-CNN generally
does not degrade accuracy if the full set of classes is used. Interestingly,
when operating on known subsets of classes, HSD-CNN has an improvement in
accuracy with a much smaller model size, requiring much fewer operations.
HSD-CNN flow is verified on the CIFAR10, CIFAR100 and CALTECH101 data sets. We
report improved accuracies on scenarios with 13 (4) classes of CIFAR100,
using a pre-trained VGG-16 network on the full data set.
In this case, the proposed HSD-CNN requires fewer parameters and
offers savings in operations, in comparison to the baseline VGG-16
containing features for all 100 classes. Comment: Accepted in ICVGIP, 201
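A minimal sketch of class-specific filter sensitivity, assuming sensitivity is measured as the drop in a class's softmax score when one filter is ablated (zeroed); this ablation-style probe and all names are assumptions, since the abstract does not give the exact definition.

```python
import torch

@torch.no_grad()
def filter_sensitivity(model, layer, images, labels, num_classes):
    """Per-(filter, class) sensitivity via channel ablation (illustrative).

    Returns a (num_filters, num_classes) matrix: large values mean the class
    relies on that filter, which can guide a hierarchical decomposition.
    """
    base = model(images).softmax(dim=1)            # (N, num_classes)
    num_filters = layer.out_channels
    sens = torch.zeros(num_filters, num_classes)
    for f in range(num_filters):
        saved = layer.weight[f].clone()
        layer.weight[f] = 0.0                      # ablate one filter
        drop = base - model(images).softmax(dim=1) # per-class score decrease
        for c in range(num_classes):
            mask = labels == c
            if mask.any():
                sens[f, c] = drop[mask, c].mean()
        layer.weight[f] = saved                    # restore the filter
    return sens
```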