1,414 research outputs found
A Survey of Model Compression and Acceleration for Deep Neural Networks
Deep neural networks (DNNs) have recently achieved great success in many
visual recognition tasks. However, existing deep neural network models are
computationally expensive and memory intensive, hindering their deployment in
devices with low memory resources or in applications with strict latency
requirements. A natural response is therefore to compress and accelerate deep networks without significantly degrading model performance. During the past five years, tremendous progress has been made in
this area. In this paper, we review the recent techniques for compacting and
accelerating DNN models. In general, these techniques are divided into four
categories: parameter pruning and quantization, low-rank factorization,
transferred/compact convolutional filters, and knowledge distillation. Methods
of parameter pruning and quantization are described first, and the other techniques are introduced afterwards. For each category, we also provide an insightful analysis of performance, related applications, advantages, and drawbacks. We then go through some very recent successful methods, such as dynamic capacity networks and stochastic depth networks. After that, we survey the evaluation metrics, the main datasets used for evaluating model performance, and recent benchmark efforts. Finally, we conclude the paper and discuss the remaining challenges and possible directions for future work.
Comment: Published in IEEE Signal Processing Magazine; updated version including more recent work
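As a concrete illustration of the first category above (parameter pruning and quantization), the following NumPy sketch, which is not taken from the survey, combines magnitude-based weight pruning with uniform quantization on a single weight tensor; the sparsity level and bit-width are arbitrary example settings.

```python
import numpy as np

def prune_and_quantize(weights: np.ndarray, sparsity: float = 0.5, bits: int = 8):
    """Magnitude pruning followed by uniform quantization (illustrative sketch)."""
    # 1) Parameter pruning: zero out the smallest-magnitude weights.
    threshold = np.quantile(np.abs(weights), sparsity)   # keep the largest (1 - sparsity) fraction
    mask = np.abs(weights) > threshold
    pruned = weights * mask

    # 2) Parameter quantization: map the surviving weights onto 2**bits uniform levels.
    w_max = np.abs(pruned).max() + 1e-12
    scale = w_max / (2 ** (bits - 1) - 1)
    quantized = np.round(pruned / scale) * scale          # de-quantized view of the integer grid
    return quantized, mask

if __name__ == "__main__":
    w = np.random.randn(64, 128).astype(np.float32)
    q, m = prune_and_quantize(w, sparsity=0.7, bits=8)
    print("fraction of weights kept:", m.mean(), "max quantization error:", np.abs(w * m - q).max())
```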
Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
Deep neural networks have evolved remarkably over the past few years and they
are currently the fundamental tools of many intelligent systems. At the same
time, the computational complexity and resource consumption of these networks
also continue to increase. This poses a significant challenge to the deployment of such networks, especially in real-time applications or on resource-limited devices. Network acceleration has therefore become a hot topic within the deep learning community. On the hardware side, a number of FPGA/ASIC-based accelerators for deep neural networks have been proposed
in recent years. In this paper, we provide a comprehensive survey of recent
advances in network acceleration, compression and accelerator design from both
algorithm and hardware points of view. Specifically, we provide a thorough
analysis of each of the following topics: network pruning, low-rank
approximation, network quantization, teacher-student networks, compact network
design and hardware accelerators. Finally, we introduce and discuss a few possible future directions.
Comment: 14 pages, 3 figures
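One of the surveyed topics, teacher-student networks, can be illustrated with a minimal knowledge-distillation objective. The sketch below is a generic PyTorch formulation rather than anything from this particular paper; the temperature T and mixing weight alpha are assumed example values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Teacher-student (knowledge distillation) objective: soft teacher targets + hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),   # student distribution at temperature T
        F.softmax(teacher_logits / T, dim=1),       # teacher distribution at temperature T
        reduction="batchmean",
    ) * (T * T)                                     # rescale gradients to match the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

if __name__ == "__main__":
    s = torch.randn(8, 100, requires_grad=True)     # stand-in "student" logits
    t = torch.randn(8, 100)                         # stand-in frozen "teacher" logits
    y = torch.randint(0, 100, (8,))
    loss = distillation_loss(s, t, y)
    loss.backward()
    print(float(loss))
```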
Joint Multi-Dimension Pruning
We present joint multi-dimension pruning (named JointPruning), a new perspective on pruning a network along three crucial dimensions simultaneously: spatial, depth, and channel. The joint strategy makes it possible to search for a better configuration than previous studies that focus on a single dimension alone, as our method is optimized collaboratively across the three dimensions in a single end-to-end training. Moreover, each dimension we consider can be pushed toward better performance by cooperating with the other two. Our method is realized via an adapted stochastic gradient estimation. Extensive experiments on the large-scale ImageNet dataset across a variety of network architectures (MobileNet V1&V2 and ResNet) demonstrate the effectiveness of our proposed method. For instance, we achieve significant margins of 2.5% and 2.6% improvement over the state-of-the-art approach on the already compact MobileNet V1&V2 under an extremely large compression ratio.
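The abstract does not spell out the optimization details, so the sketch below only illustrates the joint (spatial, depth, channel) search space with a naive random search under a FLOPs budget; the ratio grids, the FLOPs proxy, and the budget are assumptions, and the paper's adapted stochastic gradient estimation is not reproduced here.

```python
import random

# Illustrative search space over the three pruning dimensions (values are assumptions).
SPATIAL = [128, 160, 192, 224]      # input resolution
DEPTH   = [0.5, 0.75, 1.0]          # fraction of blocks kept
CHANNEL = [0.25, 0.5, 0.75, 1.0]    # channel width multiplier

def flops_proxy(spatial, depth, channel, base_flops=4.1e9):
    # FLOPs scale roughly quadratically with resolution and width, linearly with depth.
    return base_flops * (spatial / 224) ** 2 * depth * channel ** 2

def random_search(budget_flops, trials=1000, seed=0):
    """Collect joint configurations that fit a FLOPs budget (stand-in for the learned search)."""
    rng = random.Random(seed)
    feasible = []
    for _ in range(trials):
        cfg = (rng.choice(SPATIAL), rng.choice(DEPTH), rng.choice(CHANNEL))
        if flops_proxy(*cfg) <= budget_flops:
            feasible.append(cfg)
    return feasible

if __name__ == "__main__":
    configs = random_search(budget_flops=0.5e9)
    print(len(configs), "joint (spatial, depth, channel) configs under budget, e.g.", configs[:3])
```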
Building Fast and Compact Convolutional Neural Networks for Offline Handwritten Chinese Character Recognition
Like other problems in computer vision, offline handwritten Chinese character
recognition (HCCR) has achieved impressive results using convolutional neural
network (CNN)-based methods. However, larger and deeper networks are needed to
deliver state-of-the-art results in this domain. Such networks intuitively
appear to incur high computational cost, and require the storage of a large
number of parameters, which renders them unfeasible for deployment in portable
devices. To address this problem, we propose a Global Supervised Low-rank Expansion (GSLRE) method and an Adaptive Drop-weight (ADW) technique to tackle the issues of speed and storage capacity. We design a nine-layer CNN for HCCR with 3,755 classes, and devise an algorithm that reduces the network's computational cost by nine times and compresses the network to 1/18 of the original size of the baseline model, with only a 0.21% drop in accuracy. In
tests, the proposed algorithm surpassed the best single-network performance
reported thus far in the literature while requiring only 2.3 MB for storage.
Furthermore, when integrated with our effective forward implementation, the
recognition of an offline character image took only 9.7 ms on a CPU. Compared
with the state-of-the-art CNN model for HCCR, our approach is approximately 30 times faster and 10 times more cost-efficient.
Comment: 15 pages, 7 figures, 5 tables
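As a rough illustration of the low-rank expansion idea (not the authors' GSLRE procedure or the ADW pruning step), the PyTorch sketch below factorizes a k x k convolution into a k x 1 and a 1 x k convolution through a low-rank bottleneck; the rank is an arbitrary example value.

```python
import torch
import torch.nn as nn

def low_rank_conv(c_in: int, c_out: int, k: int, rank: int) -> nn.Sequential:
    """Replace a k x k conv by a k x 1 conv into `rank` channels followed by a 1 x k conv.
    Parameter count drops from roughly c_in*c_out*k*k to rank*k*(c_in + c_out)."""
    return nn.Sequential(
        nn.Conv2d(c_in, rank, kernel_size=(k, 1), padding=(k // 2, 0), bias=False),
        nn.Conv2d(rank, c_out, kernel_size=(1, k), padding=(0, k // 2), bias=True),
    )

if __name__ == "__main__":
    dense = nn.Conv2d(96, 128, kernel_size=3, padding=1)
    slim = low_rank_conv(96, 128, k=3, rank=32)
    x = torch.randn(1, 96, 32, 32)
    assert dense(x).shape == slim(x).shape            # same output shape, far fewer parameters
    count = lambda m: sum(p.numel() for p in m.parameters())
    print("params:", count(dense), "->", count(slim))
```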
Towards Efficient Model Compression via Learned Global Ranking
Pruning convolutional filters has demonstrated its effectiveness in
compressing ConvNets. Prior art in filter pruning requires users to specify a
target model complexity (e.g., model size or FLOP count) for the resulting
architecture. However, determining a target model complexity can be difficult
for optimizing various embodied AI applications such as autonomous robots,
drones, and user-facing applications. First, both the accuracy and the speed of
ConvNets can affect the performance of the application. Second, the performance
of the application can be hard to assess without evaluating ConvNets during
inference. As a consequence, finding a sweet spot between accuracy and speed via filter pruning, which needs to be done in a trial-and-error fashion, can be time-consuming. This work takes a first step toward making this process
more efficient by altering the goal of model compression to producing a set of
ConvNets with various accuracy and latency trade-offs instead of producing one
ConvNet targeting some pre-defined latency constraint. To this end, we propose
to learn a global ranking of the filters across different layers of the
ConvNet, which is used to obtain a set of ConvNet architectures that have
different accuracy/latency trade-offs by pruning the bottom-ranked filters. Our
proposed algorithm, LeGR, is shown to be 2x to 3x faster than prior work while
having comparable or better performance when targeting seven pruned ResNet-56 models with different accuracy/FLOPs profiles on the CIFAR-100 dataset. Additionally,
we have evaluated LeGR on ImageNet and Bird-200 with ResNet-50 and MobileNetV2
to demonstrate its effectiveness. Code available at
https://github.com/cmu-enyac/LeGR
Comment: CVPR 2020 Oral
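A minimal sketch of the global-ranking idea follows, assuming per-layer affine transforms of filter L2 norms as the learned scores (such affine pairs are what LeGR learns; here they are simply supplied by hand), so that different keep ratios yield different accuracy/latency trade-offs from one ranking.

```python
import torch
import torch.nn as nn

def global_filter_ranking(model, alphas, kappas):
    """Score every filter of every Conv2d with a per-layer affine transform of its L2 norm,
    then rank all filters globally (ascending, so pruning starts from the front)."""
    scores = []
    convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
    for i, conv in enumerate(convs):
        norms = conv.weight.detach().flatten(1).norm(dim=1)        # one L2 norm per filter
        scores += [(alphas[i] * n.item() + kappas[i], i, j) for j, n in enumerate(norms)]
    return sorted(scores)

if __name__ == "__main__":
    net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
    ranking = global_filter_ranking(net, alphas=[1.0, 0.7], kappas=[0.0, 0.1])  # assumed values
    keep_ratio = 0.75                                               # one point on the trade-off curve
    to_prune = ranking[: int(len(ranking) * (1 - keep_ratio))]
    print("filters to prune (score, layer, filter index):", to_prune[:5])
```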
OICSR: Out-In-Channel Sparsity Regularization for Compact Deep Neural Networks
Channel pruning can significantly accelerate and compress deep neural
networks. Many channel pruning works utilize structured sparsity regularization
to zero out all the weights in some channels and automatically obtain a structure-sparse network during training. However, these methods apply structured sparsity regularization to each layer separately, omitting the correlations between consecutive layers. In this paper, we first combine an out-channel in the current layer and the corresponding in-channel in the next layer into a regularization group, termed an out-in-channel. Our proposed
Out-In-Channel Sparsity Regularization (OICSR) considers correlations between
successive layers to further retain predictive power of the compact network.
Training with OICSR thoroughly transfers discriminative features into a
fraction of out-in-channels. Correspondingly, OICSR measures channel importance based on statistics computed from two consecutive layers rather than an individual layer.
Finally, a global greedy pruning algorithm is designed to remove redundant
out-in-channels in an iterative way. Our method is comprehensively evaluated
with various CNN architectures including CifarNet, AlexNet, ResNet, DenseNet
and PreActSeNet on CIFAR-10, CIFAR-100 and ImageNet-1K datasets. Notably, on
ImageNet-1K, we reduce 37.2% FLOPs on ResNet-50 while outperforming the
original model by 0.22% top-1 accuracy.
Comment: Accepted to CVPR 2019; the pruned ResNet-50 model has been released at:
https://github.com/dsfour/OICS
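The out-in-channel grouping can be written directly as a group-lasso penalty. The PyTorch sketch below is an illustrative reading of the abstract (the regularization weight is an assumed value), coupling the c-th out-channel of one convolution with the c-th in-channel of the next so that they are driven toward zero, and can later be pruned, together.

```python
import torch
import torch.nn as nn

def oicsr_penalty(conv_l: nn.Conv2d, conv_next: nn.Conv2d) -> torch.Tensor:
    """Group-lasso over out-in-channel groups spanning two consecutive layers."""
    out_part = conv_l.weight.flatten(1)                      # (C, in*k*k): out-channel c of layer l
    in_part = conv_next.weight.transpose(0, 1).flatten(1)    # (C, out*k*k): in-channel c of layer l+1
    groups = torch.cat([out_part, in_part], dim=1)           # one row per out-in-channel group
    return groups.norm(dim=1).sum()                          # sum of per-group L2 norms

if __name__ == "__main__":
    l1, l2 = nn.Conv2d(32, 64, 3), nn.Conv2d(64, 128, 3)
    task_loss = torch.zeros(())                               # stand-in for the usual training loss
    loss = task_loss + 1e-4 * oicsr_penalty(l1, l2)           # assumed regularization weight
    loss.backward()
    print("penalty gradients reach both layers:", l1.weight.grad.abs().sum().item() > 0)
```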
Learning to Prune Filters in Convolutional Neural Networks
Many state-of-the-art computer vision algorithms use large scale
convolutional neural networks (CNNs) as basic building blocks. These CNNs are known for their huge number of parameters, high redundancy in weights, and tremendous computing resource consumption. This paper presents a learning
algorithm to simplify and speed up these CNNs. Specifically, we introduce a
"try-and-learn" algorithm to train pruning agents that remove unnecessary CNN
filters in a data-driven way. With the help of a novel reward function, our agents remove a significant number of filters in CNNs while maintaining performance at a desired level. Moreover, this method provides easy control of the trade-off between network performance and its scale. The performance of our algorithm is validated with comprehensive pruning experiments on several popular CNNs for visual recognition and semantic segmentation tasks.
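The abstract hinges on a reward that trades accuracy against the number of remaining filters. The function below is one plausible form of such a reward, not the paper's exact definition; the accuracy-drop bound and the log-shaped efficiency term are assumptions made for illustration.

```python
import math

def pruning_reward(acc_pruned: float, acc_base: float, kept: int, total: int,
                   bound: float = 0.02) -> float:
    """Reward for a pruning agent (illustrative form): stay within an accuracy-drop
    bound while removing as many filters as possible."""
    drop = acc_base - acc_pruned
    # Accuracy term: 1 inside the allowed bound, decreasing (eventually negative) beyond it.
    acc_term = min(1.0, 2.0 - drop / bound) if drop > 0 else 1.0
    # Efficiency term: larger when more filters are removed.
    eff_term = math.log(total / max(kept, 1))
    return acc_term * eff_term

if __name__ == "__main__":
    print(pruning_reward(0.915, 0.92, kept=300, total=512))   # small drop, decent compression
    print(pruning_reward(0.80, 0.92, kept=100, total=512))    # too large a drop -> penalized
```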
Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure
Redundancy is widely recognized in Convolutional Neural Networks (CNNs), which makes it possible to remove unimportant filters from convolutional layers so as to slim the network with an acceptable performance drop. Inspired by the linear and
combinational properties of convolution, we seek to make some filters
increasingly close and eventually identical for network slimming. To this end,
we propose Centripetal SGD (C-SGD), a novel optimization method, which can
train several filters to collapse into a single point in the parameter
hyperspace. When the training is completed, the removal of the identical
filters can trim the network with NO performance loss, thus no finetuning is
needed. By doing so, we have partly solved an open problem of constrained
filter pruning on CNNs with complicated structure, where some layers must be
pruned following others. Our experimental results on CIFAR-10 and ImageNet have
justified the effectiveness of C-SGD-based filter pruning. Moreover, we have provided empirical evidence for the assumption that redundancy in deep neural networks helps the convergence of training, by showing that a redundant CNN trained using C-SGD outperforms a normally trained counterpart of equivalent width.
Comment: CVPR 2019
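A minimal sketch of the centripetal update is given below, assuming a manual gradient edit before each step: filters in a cluster share the averaged task gradient and are additionally pulled toward their cluster mean, so the within-cluster gap shrinks and the duplicates can eventually be removed. The learning rate and centripetal strength are exaggerated for the demo and are not the paper's settings.

```python
import torch
import torch.nn as nn

def centripetal_sgd_step(conv: nn.Conv2d, clusters, lr=0.01, eps=0.003):
    """One C-SGD-style update (a sketch of the idea, not the authors' code)."""
    w, g = conv.weight.data, conv.weight.grad.data          # (C_out, C_in, k, k)
    for idx in clusters:                                     # idx: filter indices forming one cluster
        avg_grad = g[idx].mean(dim=0, keepdim=True)          # shared task gradient
        center = w[idx].mean(dim=0, keepdim=True)            # cluster mean in parameter space
        w[idx] -= lr * (avg_grad + eps * (w[idx] - center))  # SGD step + centripetal pull

if __name__ == "__main__":
    conv = nn.Conv2d(3, 8, 3)
    clusters = [[0, 1], [2, 3], [4, 5], [6, 7]]              # pair up filters to halve the layer later
    for _ in range(300):
        conv.weight.grad = None
        conv(torch.randn(2, 3, 16, 16)).sum().backward()     # stand-in for a real task loss
        centripetal_sgd_step(conv, clusters, lr=0.02, eps=1.0)  # eps exaggerated for the demo
    gap = (conv.weight.data[0] - conv.weight.data[1]).abs().max()
    print("max gap within the first cluster:", float(gap))   # shrinks toward 0
```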
Towards Optimal Structured CNN Pruning via Generative Adversarial Learning
Structured pruning of filters or neurons has received increased focus for
compressing convolutional neural networks. Most existing methods rely on
multi-stage optimizations in a layer-wise manner for iterative pruning and retraining, which may not be optimal and can be computation-intensive. Besides, these methods are designed for pruning a specific structure, such as filters or blocks, without jointly pruning heterogeneous structures. In this
paper, we propose an effective structured pruning approach that jointly prunes
filters as well as other structures in an end-to-end manner. To accomplish
this, we first introduce a soft mask to scale the output of these structures, defining a new objective function with sparsity regularization that aligns the output of the baseline network with that of the masked network. We then effectively solve the
optimization problem by generative adversarial learning (GAL), which learns a
sparse soft mask in a label-free and end-to-end manner. By forcing more scaling factors in the soft mask to zero, the fast iterative shrinkage-thresholding algorithm (FISTA) can be leveraged to quickly and reliably
remove the corresponding structures. Extensive experiments demonstrate the
effectiveness of GAL on different datasets, including MNIST, CIFAR-10 and
ImageNet ILSVRC 2012. For example, on ImageNet ILSVRC 2012, the pruned
ResNet-50 achieves 10.88% Top-5 error with a 3.7x speedup. This significantly outperforms state-of-the-art methods.
Comment: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
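The sparsity mechanics can be illustrated with the proximal soft-thresholding step at the heart of ISTA/FISTA. The sketch below only covers a soft mask aligned to baseline outputs under an L1 penalty; the adversarial, label-free training of GAL is not reproduced, and the data and hyperparameters are purely illustrative.

```python
import torch

def soft_threshold(x: torch.Tensor, thresh: float) -> torch.Tensor:
    """Proximal operator of the L1 norm, the core step of ISTA/FISTA."""
    return torch.sign(x) * torch.clamp(x.abs() - thresh, min=0.0)

mask = torch.rand(64, requires_grad=True)              # one scaling factor per prunable structure
features = torch.randn(8, 64)                           # outputs of the structures (stand-in data)
target = torch.randn(8, 64)                             # baseline outputs to align with (stand-in)

lr, lam = 0.1, 0.3                                       # illustrative step size and sparsity weight
for _ in range(100):
    loss = ((features * mask - target) ** 2).mean(dim=0).sum()   # alignment term of the objective
    loss.backward()
    with torch.no_grad():
        mask -= lr * mask.grad                           # gradient step on the alignment term
        mask.copy_(soft_threshold(mask, lr * lam))       # proximal step: exact zeros appear
    mask.grad.zero_()

print("structures whose scaling factor reached zero:", int((mask == 0).sum()))
```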
Structured Pruning for Efficient ConvNets via Incremental Regularization
Parameter pruning is a promising approach for CNN compression and acceleration that eliminates redundant model parameters with tolerable performance degradation. Despite its effectiveness, existing regularization-based
parameter pruning methods usually drive weights towards zero with large and
constant regularization factors, which neglects the fragility of the
expressiveness of CNNs, and thus calls for a more gentle regularization scheme
so that the networks can adapt during pruning. To achieve this, we propose a novel regularization-based pruning method, named IncReg, to
incrementally assign different regularization factors to different weights
based on their relative importance. Empirical analysis on CIFAR-10 dataset
verifies the merits of IncReg. Further extensive experiments with popular CNNs
on CIFAR-10 and ImageNet datasets show that IncReg achieves comparable or even better results compared with the state of the art. Our source code and trained models are available at: https://github.com/mingsun-tse/caffe_increg
Comment: IJCNN 2019
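A sketch of the incremental-regularization idea follows, assuming L1-norm filter importance and a hand-picked growth step: the least important filters see their L2 regularization factor increase gradually instead of every weight receiving one large constant factor. This is an illustrative reading of the abstract, not the released implementation.

```python
import torch
import torch.nn as nn

def incremental_reg_penalty(conv: nn.Conv2d, lambdas: torch.Tensor,
                            step: float = 1e-3, prune_ratio: float = 0.5) -> torch.Tensor:
    """Rank filters by importance (L1 norm here) and incrementally grow the per-filter
    L2 regularization factor of the least important ones (assumed schedule)."""
    with torch.no_grad():
        importance = conv.weight.abs().flatten(1).sum(dim=1)   # one importance score per filter
        ranks = importance.argsort()                            # ascending: least important first
        n_target = int(prune_ratio * len(ranks))
        lambdas[ranks[:n_target]] += step                       # grow their penalty a little each step
    per_filter_l2 = conv.weight.pow(2).flatten(1).sum(dim=1)
    return (lambdas * per_filter_l2).sum()

if __name__ == "__main__":
    conv = nn.Conv2d(16, 32, 3)
    lambdas = torch.zeros(32)                                   # persistent per-filter factors
    opt = torch.optim.SGD(conv.parameters(), lr=0.01)
    for _ in range(50):                                         # stand-in training loop
        loss = conv(torch.randn(4, 16, 8, 8)).pow(2).mean()     # placeholder task loss
        loss = loss + incremental_reg_penalty(conv, lambdas)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print("regularization factors now range from", float(lambdas.min()), "to", float(lambdas.max()))
```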