Structured Pruning for Efficient ConvNets via Incremental Regularization
Parameter pruning is a promising approach for CNN compression and acceleration: it eliminates redundant model parameters with tolerable performance degradation. Despite its effectiveness, existing regularization-based parameter pruning methods usually drive weights towards zero with large, constant regularization factors. This neglects the fragility of the expressiveness of CNNs and calls for a gentler regularization scheme so that the networks can adapt during pruning. To achieve this, we propose a novel regularization-based pruning method, named IncReg, which incrementally assigns different regularization factors to different weights based on their relative importance. Empirical analysis on the CIFAR-10 dataset verifies the merits of IncReg. Further extensive experiments with popular CNNs on the CIFAR-10 and ImageNet datasets show that IncReg achieves results comparable to, or even better than, the state of the art. Our source code and trained models are available at https://github.com/mingsun-tse/caffe_increg.
Comment: IJCNN 201
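As a minimal sketch of the incremental-regularization idea described above (not the authors' Caffe implementation), the snippet below grows a per-filter L2 penalty in proportion to a simple importance ranking; the L1-norm importance criterion, increment size, and cap used here are illustrative assumptions.

```python
import torch

def incremental_reg_step(conv_weight, reg_factors, delta=1e-4, reg_max=1.0):
    """Illustrative incremental regularization step (sketch, not the official IncReg code).

    conv_weight: tensor of shape (out_channels, in_channels, k, k)
    reg_factors: per-filter penalty factors, shape (out_channels,)
    Less important filters (smaller L1 norm here, an assumed criterion)
    receive a larger increment, so their penalty grows faster during training.
    """
    importance = conv_weight.detach().abs().sum(dim=(1, 2, 3))   # per-filter L1 norm
    ranks = importance.argsort().argsort().float()               # 0 = least important
    increment = delta * (1.0 - ranks / (len(ranks) - 1))         # bigger step for weaker filters
    reg_factors = (reg_factors + increment).clamp(max=reg_max)
    # extra gradient contributed by the per-filter L2 penalty
    penalty_grad = reg_factors.view(-1, 1, 1, 1) * conv_weight
    return reg_factors, penalty_grad
```

In a training loop one would add `penalty_grad` to `conv_weight.grad` before the optimizer step and, once a filter's factor saturates at `reg_max`, prune that filter.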
Structured Pruning for Efficient ConvNets via Incremental Regularization
Parameter pruning is a promising approach for CNN compression and acceleration: it eliminates redundant model parameters with tolerable performance loss. Despite its effectiveness, existing regularization-based parameter pruning methods usually drive weights towards zero with large, constant regularization factors, neglecting the fact that the expressiveness of CNNs is fragile and needs a gentler regularization scheme for the networks to adapt during pruning. To solve this problem, we propose a new regularization-based pruning method (named IncReg) that incrementally assigns different regularization factors to different weight groups based on their relative importance; its effectiveness is demonstrated on popular CNNs in comparison with state-of-the-art methods.
Comment: Accepted by the NIPS 2018 workshop on "Compact Deep Neural Network Representation with Industrial Applications"
Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure
Redundancy is widely recognized in Convolutional Neural Networks (CNNs), which makes it possible to remove unimportant filters from convolutional layers and slim the network with an acceptable performance drop. Inspired by the linear and combinational properties of convolution, we seek to make some filters increasingly close and eventually identical for network slimming. To this end, we propose Centripetal SGD (C-SGD), a novel optimization method that can train several filters to collapse into a single point in the parameter hyperspace. When training is completed, removing the identical filters trims the network with no performance loss, so no fine-tuning is needed. By doing so, we have partly solved an open problem of constrained filter pruning on CNNs with complicated structure, where some layers must be pruned following others. Our experimental results on CIFAR-10 and ImageNet justify the effectiveness of C-SGD-based filter pruning. Moreover, we provide empirical evidence for the assumption that redundancy in deep neural networks helps the convergence of training, by showing that a redundant CNN trained using C-SGD outperforms a normally trained counterpart of equivalent width.
Comment: CVPR 201
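A rough sketch of a centripetal update rule in the spirit of the abstract is shown below, under simplifying assumptions (a fixed filter clustering and a single hyperparameter for the centripetal strength); it is not the authors' implementation.

```python
import torch

def centripetal_update(filters, grads, clusters, lr=0.1, wd=1e-4, eps=3e-3):
    """One C-SGD-style step (illustrative sketch). Filters in the same cluster share the
    averaged objective-function gradient and are additionally pulled toward their
    cluster mean, so they gradually collapse to identical values.

    filters:  (out_channels, in_channels*k*k) flattened filter matrix
    grads:    gradient of the task loss w.r.t. `filters`, same shape
    clusters: list of LongTensors, each holding the filter indices of one cluster
    """
    new_filters = filters.clone()
    for idx in clusters:
        mean_grad = grads[idx].mean(dim=0, keepdim=True)       # shared descent direction
        mean_filter = filters[idx].mean(dim=0, keepdim=True)   # cluster centroid
        step = lr * (mean_grad + wd * filters[idx]) + eps * (filters[idx] - mean_filter)
        new_filters[idx] = filters[idx] - step
    return new_filters
```

Once the filters within each cluster become numerically identical, all but one per cluster can be removed and the following layer's input channels merged accordingly.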
A Survey of Model Compression and Acceleration for Deep Neural Networks
Deep neural networks (DNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment in devices with limited memory or in applications with strict latency requirements. A natural thought, therefore, is to perform model compression and acceleration without significantly decreasing model performance. Tremendous progress has been made in this area during the past five years. In this paper, we review recent techniques for compacting and accelerating DNN models. In general, these techniques fall into four categories: parameter pruning and quantization, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and quantization are described first, followed by the other techniques. For each category, we also provide insightful analysis of the performance, related applications, advantages, and drawbacks. We then discuss some very recent successful methods, for example, dynamic capacity networks and stochastic depth networks. After that, we survey the evaluation metrics, the main datasets used for evaluating model performance, and recent benchmark efforts. Finally, we conclude the paper and discuss the remaining challenges and possible directions for future work.
Comment: Published in IEEE Signal Processing Magazine; updated version including more recent work
SASL: Saliency-Adaptive Sparsity Learning for Neural Network Acceleration
Accelerating the inference speed of CNNs is critical to their deployment in real-world applications. Among all pruning approaches, those implementing a sparsity learning framework have been shown to be effective, as they learn and prune the models in an end-to-end, data-driven manner. However, these works impose the same sparsity regularization on all filters indiscriminately, which can hardly result in an optimal structure-sparse network. In this paper, we propose a Saliency-Adaptive Sparsity Learning (SASL) approach for further optimization. A novel and effective estimate for each filter, i.e., its saliency, is designed and measured from two aspects: importance for prediction performance and consumed computational resources. During sparsity learning, the regularization strength is adjusted according to the saliency, so the optimized network can better preserve prediction performance while zeroing out more computation-heavy filters. The saliency calculation introduces minimal overhead to the training process, which makes SASL very efficient. During the pruning phase, a hard sample mining strategy is utilized to optimize the proposed data-dependent criterion, which shows higher effectiveness and efficiency. Extensive experiments demonstrate the superior performance of our method. Notably, on the ILSVRC-2012 dataset, our approach reduces the FLOPs of ResNet-50 by 49.7% with negligible degradation of 0.39% in top-1 and 0.05% in top-5 accuracy.
Comment: Accepted to IEEE Transactions on Circuits and Systems for Video Technology
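The following sketch illustrates the saliency-adaptive idea from the abstract: each filter's sparsity penalty is scaled inversely to a saliency score that mixes an importance estimate with the filter's FLOP cost. The specific importance measure, normalization, and mixing rule here are assumptions for illustration, not the exact SASL formulation.

```python
import torch

def saliency_adaptive_penalty(importance, flops, base_lambda=1e-4, alpha=0.5):
    """Illustrative saliency-adaptive sparsity penalty (sketch, not the exact SASL rule).

    importance: per-filter contribution to prediction performance, shape (N,)
    flops:      per-filter computational cost, shape (N,)
    Filters that matter little for accuracy but cost many FLOPs receive a stronger
    penalty, so they are driven toward zero first during sparsity learning.
    """
    imp_n = importance / (importance.sum() + 1e-12)
    cost_n = flops / (flops.sum() + 1e-12)
    saliency = alpha * imp_n + (1 - alpha) * (1 - cost_n)   # high = keep, low = prune
    return base_lambda * (1.0 - saliency)                    # per-filter regularization strength
```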
C2S2: Cost-aware Channel Sparse Selection for Progressive Network Pruning
This paper describes a channel-selection approach for simplifying deep neural networks. Specifically, we propose a new type of generic network layer, called a pruning layer, to seamlessly augment a given pre-trained model for compression. Each pruning layer, comprising depth-wise kernels, is represented in a dual format: one real-valued and the other binary. The former enables a two-phase optimization process of network pruning to operate with an end-to-end differentiable network, and the latter yields the mask information for channel selection. Our method performs the pruning task progressively, layer by layer, and achieves channel selection according to a sparsity criterion that favors pruning more channels. We also develop a cost-aware mechanism to prevent the compression from sacrificing the expected network performance. Our results for compressing several benchmark deep networks on image classification and semantic segmentation are comparable to those of state-of-the-art methods.
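A minimal sketch of a dual-format pruning layer in the spirit described above: a real-valued per-channel score is trained end to end, while its binarized version provides the channel-selection mask. The thresholding and straight-through gradient used here are common assumptions for such layers, not necessarily the paper's exact scheme.

```python
import torch
import torch.nn as nn

class PruningLayer(nn.Module):
    """Illustrative channel-selection layer with a real-valued/binary dual format."""

    def __init__(self, num_channels, threshold=0.5):
        super().__init__()
        self.scores = nn.Parameter(torch.ones(num_channels))  # real-valued, trainable
        self.threshold = threshold

    def forward(self, x):
        # binary mask for channel selection; gradients flow through the real scores
        # via a straight-through estimator (an assumed choice for this sketch)
        hard = (self.scores > self.threshold).float()
        mask = hard + self.scores - self.scores.detach()
        return x * mask.view(1, -1, 1, 1)
```

Inserting such a layer after each convolution lets the real-valued scores be optimized with the task loss, while the binary mask read out afterwards determines which channels to remove.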
Pruning Filter in Filter
Pruning has become a very powerful and effective technique to compress and accelerate modern neural networks. Existing pruning methods can be grouped into two categories: filter pruning (FP) and weight pruning (WP). FP wins at hardware compatibility but loses at compression ratio compared with WP. To combine the strengths of both methods, we propose to prune the filter in the filter. Specifically, we treat a filter as a set of stripes (one smaller filter per spatial position); by pruning the stripes instead of the whole filter, we can achieve finer granularity than traditional FP while remaining hardware friendly. We term our method SWP (Stripe-Wise Pruning). SWP is implemented by introducing a novel learnable matrix called the Filter Skeleton, whose values reflect the shape of each filter. As some recent work has shown that the pruned architecture is more crucial than the inherited important weights, we argue that the architecture of a single filter, i.e., its shape, also matters. Through extensive experiments, we demonstrate that SWP is more effective than previous FP-based methods and achieves state-of-the-art pruning ratios on the CIFAR-10 and ImageNet datasets without an obvious accuracy drop. Code is available at https://github.com/fxmeng/Pruning-Filter-in-Filter
Comment: Accepted by NeurIPS202
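Below is a hedged sketch of the Filter Skeleton idea: a learnable per-stripe scaling tensor modulates each filter's spatial positions and is driven sparse with an L1 penalty, so stripes whose entries shrink toward zero can be removed. The penalty form and the exact shape of the skeleton are assumptions for illustration; see the official repository for the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkeletonConv(nn.Module):
    """Conv layer with a learnable Filter Skeleton over the spatial positions (sketch)."""

    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.skeleton = nn.Parameter(torch.ones(out_ch, 1, k, k))  # one value per stripe

    def forward(self, x):
        # each stripe (spatial position of a filter) is scaled by its skeleton entry;
        # stripes whose entries shrink toward zero can later be pruned away
        return F.conv2d(x, self.conv.weight * self.skeleton, padding=self.conv.padding)

    def skeleton_l1(self):
        return self.skeleton.abs().sum()   # sparsity penalty added to the task loss
```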
Neural Pruning via Growing Regularization
Regularization has long been utilized to learn sparsity in deep neural network pruning. However, its role has mainly been explored in the small-penalty-strength regime. In this work, we extend its application to a new scenario where the regularization grows gradually large to tackle two central problems of pruning: the pruning schedule and weight importance scoring. (1) The former topic is newly brought up in this work; we find it critical to pruning performance, yet it has received little research attention. Specifically, we propose an L2 regularization variant with rising penalty factors and show that it can bring significant accuracy gains compared with its one-shot counterpart, even when the same weights are removed. (2) The growing penalty scheme also gives us an approach to exploit Hessian information for more accurate pruning without knowing the specific Hessian values, and is thus not hampered by the common problems of Hessian approximation. Empirically, the proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning. Their effectiveness is demonstrated with modern deep neural networks on the CIFAR and ImageNet datasets, achieving competitive results compared to many state-of-the-art algorithms. Our code and trained models are publicly available at https://github.com/mingsuntse/regularization-pruning
Comment: Accepted by ICLR 202
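A minimal sketch of a growing-L2 schedule in the spirit of this abstract: the penalty factor on the weights scheduled for removal is raised by a small constant every few iterations until a ceiling is reached. The increment, interval, and ceiling values here are illustrative assumptions, not the paper's settings.

```python
import torch

class GrowingL2:
    """Illustrative growing L2 penalty on a set of to-be-pruned weights (sketch, not the official code)."""

    def __init__(self, pruned_mask, delta=1e-4, interval=10, ceiling=1.0):
        self.mask = pruned_mask        # bool tensor: True where the weight will be removed
        self.delta = delta             # penalty increment per update
        self.interval = interval       # grow every `interval` iterations
        self.ceiling = ceiling
        self.reg = 0.0
        self.step_count = 0

    def apply(self, weight):
        """Add the current penalty gradient to weight.grad; call once per iteration after backward()."""
        self.step_count += 1
        if self.step_count % self.interval == 0 and self.reg < self.ceiling:
            self.reg += self.delta
        with torch.no_grad():
            weight.grad += self.reg * weight * self.mask
```

Because the penalty ramps up slowly, the surviving weights have time to adapt before the masked ones are finally removed, which is the advantage the abstract claims over one-shot removal.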
Robust Sparse Regularization: Simultaneously Optimizing Neural Network Robustness and Compactness
Deep Neural Networks (DNNs) trained by gradient descent are known to be vulnerable to maliciously perturbed adversarial inputs, a.k.a. adversarial attacks. As a countermeasure, increasing the model capacity has been discussed and reported by many recent works as an effective approach for enhancing DNN robustness. In this work, we show that shrinking the model size through proper weight pruning can even help improve DNN robustness under adversarial attack. To obtain a simultaneously robust and compact DNN model, we propose a multi-objective training method called Robust Sparse Regularization (RSR), which fuses several regularization techniques, including channel-wise noise injection, a lasso weight penalty, and adversarial training. We conduct extensive experiments across the popular ResNet-20, ResNet-18, and VGG-16 architectures to demonstrate the effectiveness of RSR against popular white-box (i.e., PGD and FGSM) and black-box attacks. Thanks to RSR, 85% of the weight connections of ResNet-18 can be pruned while still achieving improvements of 0.68% and 8.72% in clean- and perturbed-data accuracy, respectively, on the CIFAR-10 dataset in comparison to its PGD adversarial training baseline.
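To make the fused objective concrete, here is a hedged sketch of an RSR-style training loss that combines adversarial training with a lasso penalty on the weights; the `attack_fn` interface is a placeholder for a PGD-style attack, the channel-wise noise injection is omitted, and the coefficient is an assumed value rather than the paper's.

```python
import torch
import torch.nn.functional as F

def rsr_style_loss(model, x, y, attack_fn, lasso_coeff=1e-5):
    """Illustrative objective: adversarial cross-entropy + L1 (lasso) weight penalty.

    attack_fn(model, x, y) is an assumed interface for an adversarial-example
    generator such as PGD; it is not part of the original paper's code.
    """
    x_adv = attack_fn(model, x, y)                           # adversarial training term
    task_loss = F.cross_entropy(model(x_adv), y)
    lasso = sum(p.abs().sum() for p in model.parameters())   # drives weights sparse for later pruning
    return task_loss + lasso_coeff * lasso
```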
Global Sparse Momentum SGD for Pruning Very Deep Neural Networks
Deep Neural Networks (DNNs) are powerful but computationally expensive and memory intensive, which impedes their practical use on resource-constrained front-end devices. DNN pruning is an approach to deep model compression that aims at eliminating some parameters with tolerable performance degradation. In this paper, we propose a novel momentum-SGD-based optimization method that reduces network complexity by on-the-fly pruning. Concretely, given a global compression ratio, we categorize all the parameters into two parts at each training iteration, which are updated using different rules. In this way, we gradually zero out the redundant parameters, as they are updated using only the ordinary weight decay and no gradients derived from the objective function. In contrast to prior methods, which require heavy manual work to tune the layer-wise sparsity ratios, prune by solving complicated non-differentiable problems, or fine-tune the model after pruning, our method is characterized by 1) global compression that automatically finds the appropriate per-layer sparsity ratios; 2) end-to-end training; 3) no need for a time-consuming re-training process after pruning; and 4) superior capability to find better winning tickets that have won the initialization lottery.
Comment: Accepted by NeurIPS 201
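The parameter-splitting rule described in this abstract can be sketched as follows: at each step the parameters are ranked globally by a first-order importance proxy, the top fraction receives the ordinary gradient update, and the rest get weight decay only, so they drift toward zero. The |w · grad| proxy and the omission of momentum are simplifying assumptions, not the paper's exact formulation.

```python
import torch

def gsm_style_step(params, lr=0.01, wd=1e-4, keep_ratio=0.5):
    """One illustrative global-sparse-momentum-style step (sketch; momentum omitted).

    params: list of tensors whose .grad has been populated by backward().
    Only the globally most 'important' fraction of weights receives the loss
    gradient; the rest get weight decay only and are gradually zeroed out.
    """
    # global first-order importance proxy: |w * dL/dw|
    importance = torch.cat([(p.detach() * p.grad).abs().flatten() for p in params])
    k = int(keep_ratio * importance.numel())
    threshold = importance.kthvalue(importance.numel() - k + 1).values if k > 0 else float('inf')
    with torch.no_grad():
        for p in params:
            mask = ((p * p.grad).abs() >= threshold).float()
            p -= lr * (mask * p.grad + wd * p)   # non-selected weights: weight decay only
    return threshold
```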