Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure
Redundancy is widely recognized in Convolutional Neural Networks (CNNs), which makes it possible to remove unimportant filters from convolutional layers and slim the network with an acceptable performance drop. Inspired by the linear and combinational properties of convolution, we seek to make some filters increasingly close and eventually identical for network slimming. To this end, we propose Centripetal SGD (C-SGD), a novel optimization method that can train several filters to collapse into a single point in the parameter hyperspace. When training is completed, removing the identical filters trims the network with NO performance loss, so no fine-tuning is needed. By doing so, we partly solve an open problem of constrained filter pruning on CNNs with complicated structure, where some layers must be pruned following others. Our experimental results on CIFAR-10 and ImageNet justify the effectiveness of C-SGD-based filter pruning. Moreover, we provide empirical evidence for the assumption that the redundancy in deep neural networks helps the convergence of training, by showing that a redundant CNN trained using C-SGD outperforms a normally trained counterpart of equivalent width.
Comment: CVPR 2019
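The core of the method is a modified SGD update in which filters assigned to the same cluster share an averaged objective gradient and are additionally pulled toward their cluster mean. Below is a minimal, hedged sketch of such a centripetal-style step in PyTorch; the cluster assignments, the hyperparameter names (eps for the centripetal strength), and the exact form of the update are illustrative assumptions, not the paper's reference implementation.

```python
import torch

def centripetal_step(conv_weight, clusters, lr=0.01, eps=0.003, wd=1e-4):
    """One illustrative update on a conv weight of shape (out, in, kh, kw).

    Filters in the same cluster receive the cluster-averaged objective gradient
    plus a pull toward the cluster mean, so they drift toward a common point
    and can later be merged without changing the network's function.
    """
    with torch.no_grad():
        grad = conv_weight.grad
        for member_ids in clusters:                       # e.g. clusters = [[0, 3], [1, 2]]
            idx = torch.as_tensor(member_ids, dtype=torch.long)
            g_mean = grad[idx].mean(dim=0, keepdim=True)  # shared objective gradient
            w_mean = conv_weight[idx].mean(dim=0, keepdim=True)
            pull = eps * (conv_weight[idx] - w_mean)      # centripetal term
            conv_weight[idx] -= lr * (g_mean + wd * conv_weight[idx] + pull)
```

In use, one would call this after loss.backward() for each clustered convolution instead of letting the optimizer update those filters, leaving unclustered parameters to the ordinary optimizer step.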
Channel Pruning via Optimal Thresholding
Structured pruning, especially channel pruning, is widely used for its reduced computational cost and its compatibility with off-the-shelf hardware. In existing works, weights are typically removed using a predefined global threshold or a threshold computed from a predefined metric. Designs based on a predefined global threshold ignore the variation in weight distributions across layers and therefore often yield sub-optimal performance caused by over-pruning or under-pruning. In this paper, we present a simple yet effective method, termed Optimal Thresholding (OT), which prunes channels with layer-dependent thresholds that optimally separate important from negligible channels. By using OT, most negligible or unimportant channels are pruned to achieve high sparsity while minimizing performance degradation. Since the most important weights are preserved, the pruned model can be fine-tuned and converges within very few iterations. Our method demonstrates superior performance, especially compared with state-of-the-art designs at high levels of sparsity. On CIFAR-100, a DenseNet-121 pruned and fine-tuned with OT achieves 75.99% accuracy with only 1.46e8 FLOPs and 0.71M parameters.
Comment: ICONIP 2020
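As an illustration of layer-dependent thresholding, the sketch below selects a per-layer cutoff over channel importance scores (e.g., BN scaling factors). The separation criterion used here, an Otsu-style between-class variance, is a stand-in assumption; the paper's actual optimal-thresholding objective may differ.

```python
import numpy as np

def layer_threshold(gammas: np.ndarray) -> float:
    """Return a layer-specific threshold separating negligible from important channels."""
    scores = np.sort(np.abs(gammas))
    best_t, best_sep = scores[0], -1.0
    for i in range(1, len(scores)):
        lo, hi = scores[:i], scores[i:]
        # between-class variance of the split: larger means a cleaner separation
        sep = len(lo) * len(hi) * (lo.mean() - hi.mean()) ** 2
        if sep > best_sep:
            best_sep, best_t = sep, scores[i]
    return best_t

def prune_mask(gammas: np.ndarray) -> np.ndarray:
    """Boolean mask of channels to keep in this layer."""
    return np.abs(gammas) >= layer_threshold(gammas)
```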
Joint Multi-Dimension Pruning
We present joint multi-dimension pruning (named JointPruning), a new perspective on pruning a network along three crucial dimensions simultaneously: spatial size, depth, and channels. The joint strategy can find a better configuration than previous studies that focused on a single dimension, because our method is optimized collaboratively across the three dimensions in a single end-to-end training. Moreover, each dimension we consider can reach better performance by cooperating with the other two. Our method is realized via an adapted stochastic gradient estimation. Extensive experiments on the large-scale ImageNet dataset across a variety of network architectures (MobileNet V1&V2 and ResNet) demonstrate the effectiveness of the proposed method. For instance, we achieve significant margins of 2.5% and 2.6% improvement over the state-of-the-art approach on the already compact MobileNet V1&V2 under an extremely large compression ratio.
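A hedged sketch of the kind of stochastic gradient estimation such a joint search could use: a zeroth-order (evolution-strategies-style) estimator over a three-element configuration of spatial, depth, and channel keep-ratios. The estimator form, the toy objective, and all hyperparameters are assumptions for illustration only.

```python
import numpy as np

def es_gradient(config, eval_loss, sigma=0.05, samples=8):
    """Zeroth-order estimate of d(loss)/d(config) over [spatial, depth, channel] keep-ratios."""
    config = np.asarray(config, dtype=float)
    grad = np.zeros_like(config)
    for _ in range(samples):
        noise = np.random.randn(*config.shape)
        cand = np.clip(config + sigma * noise, 0.1, 1.0)
        grad += eval_loss(cand) * noise          # REINFORCE / evolution-strategies estimator
    return grad / (samples * sigma)

def toy_eval_loss(cfg):
    """Placeholder objective; in practice this would train/evaluate the net pruned to cfg."""
    return float(((cfg - np.array([0.7, 0.8, 0.5])) ** 2).sum())

config = np.array([1.0, 1.0, 1.0])               # start from the unpruned configuration
for _ in range(200):
    config = np.clip(config - 0.05 * es_gradient(config, toy_eval_loss), 0.1, 1.0)
```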
Towards Efficient Model Compression via Learned Global Ranking
Pruning convolutional filters has demonstrated its effectiveness in compressing ConvNets. Prior art in filter pruning requires users to specify a target model complexity (e.g., model size or FLOP count) for the resulting architecture. However, determining a target model complexity can be difficult when optimizing various embodied AI applications such as autonomous robots, drones, and user-facing applications. First, both the accuracy and the speed of ConvNets can affect the performance of the application. Second, the performance of the application can be hard to assess without evaluating ConvNets during inference. As a consequence, finding a sweet spot between accuracy and speed via filter pruning, which has to be done in a trial-and-error fashion, can be time-consuming. This work takes a first step toward making this process more efficient by altering the goal of model compression: instead of producing one ConvNet targeting a pre-defined latency constraint, we produce a set of ConvNets with various accuracy and latency trade-offs. To this end, we propose to learn a global ranking of the filters across different layers of the ConvNet, which is used to obtain a set of ConvNet architectures with different accuracy/latency trade-offs by pruning the bottom-ranked filters. Our proposed algorithm, LeGR, is shown to be 2x to 3x faster than prior work with comparable or better performance when targeting seven pruned ResNet-56 architectures with different accuracy/FLOPs profiles on the CIFAR-100 dataset. Additionally, we evaluate LeGR on ImageNet and Bird-200 with ResNet-50 and MobileNetV2 to demonstrate its effectiveness. Code is available at https://github.com/cmu-enyac/LeGR.
Comment: CVPR 2020 Oral
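The global ranking can be pictured as a per-layer affine transform over filter norms whose outputs are comparable across layers; pruning the bottom of that ranking at different cutoffs yields a family of architectures. The sketch below assumes the affine coefficients (alpha, kappa) have already been learned (the paper uses an evolutionary search, omitted here); names and shapes are illustrative assumptions.

```python
import torch

def global_scores(filter_norms, alpha, kappa):
    """filter_norms: list of 1-D tensors (one per layer) of filter L2 norms.
    alpha, kappa: learned per-layer scale and shift making scores comparable across layers."""
    return torch.cat([alpha[l] * norms ** 2 + kappa[l]
                      for l, norms in enumerate(filter_norms)])

def masks_for_ratio(filter_norms, scores, keep_ratio):
    """Keep the globally top-ranked fraction of filters; return one boolean mask per layer."""
    n_keep = int(keep_ratio * scores.numel())
    threshold = torch.topk(scores, n_keep).values.min()
    keep = scores >= threshold
    masks, offset = [], 0
    for norms in filter_norms:
        masks.append(keep[offset: offset + norms.numel()])
        offset += norms.numel()
    return masks

# Example: two layers with hypothetical learned (alpha, kappa); keep 60% of all filters.
norms = [torch.rand(16), torch.rand(32)]
masks = masks_for_ratio(norms, global_scores(norms, alpha=[1.0, 0.8], kappa=[0.0, 0.1]), 0.6)
```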
Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification
Modern deep neural networks (DNNs) often require large amounts of memory and computation. To deploy DNN algorithms efficiently on edge or mobile devices, a series of DNN compression algorithms has been explored, including factorization methods. Factorization methods approximate the weight matrix of a DNN layer with the product of two or more low-rank matrices. However, it is hard to measure the ranks of DNN layers during training. Previous works mainly induce low rank through implicit approximations or via a costly singular value decomposition (SVD) at every training step. The former approach usually incurs a high accuracy loss, while the latter is inefficient. In this work, we propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD at every step. SVD training first decomposes each layer into the form of its full-rank SVD and then trains directly on the decomposed weights. We add orthogonality regularization to the singular vectors, which ensures a valid SVD form and avoids vanishing/exploding gradients. Low rank is encouraged by applying sparsity-inducing regularizers to the singular values of each layer. Singular value pruning is applied at the end to explicitly reach a low-rank model. We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a higher reduction in computational load at the same accuracy, compared not only with previous factorization methods but also with state-of-the-art filter pruning methods.
Comment: In proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). To be presented at the EDLCV 2020 workshop co-located with CVPR 2020
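The recipe in the abstract, decompose each layer into U, s, V, regularize U and V toward orthogonality, and apply a sparsity-inducing penalty to s, can be sketched as a drop-in PyTorch layer as below; the initialization and penalty weights are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    """A linear layer parameterized as W = U diag(s) V^T and trained in decomposed form."""
    def __init__(self, in_features, out_features):
        super().__init__()
        r = min(in_features, out_features)            # full rank at initialization
        self.U = nn.Parameter(torch.randn(out_features, r) / r ** 0.5)
        self.s = nn.Parameter(torch.ones(r))
        self.V = nn.Parameter(torch.randn(in_features, r) / r ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # y = x V diag(s) U^T + b
        return ((x @ self.V) * self.s) @ self.U.t() + self.bias

    def regularizer(self, ortho_w=1.0, sparse_w=1e-4):
        """Orthogonality penalty on the singular vectors plus L1 sparsity on singular values."""
        eye = torch.eye(self.s.numel(), device=self.s.device)
        ortho = ((self.U.t() @ self.U - eye) ** 2).sum() + ((self.V.t() @ self.V - eye) ** 2).sum()
        return ortho_w * ortho + sparse_w * self.s.abs().sum()
```

After training, singular values below a chosen threshold, along with the corresponding columns of U and V, would be dropped to obtain the final low-rank layer.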
Global Sparse Momentum SGD for Pruning Very Deep Neural Networks
Deep Neural Networks (DNNs) are powerful but computationally expensive and memory-intensive, which impedes their practical use on resource-constrained front-end devices. DNN pruning is an approach to deep model compression that aims to eliminate some parameters with tolerable performance degradation. In this paper, we propose a novel momentum-SGD-based optimization method that reduces network complexity by on-the-fly pruning. Concretely, given a global compression ratio, at each training iteration we categorize all parameters into two parts that are updated using different rules. In this way, we gradually zero out the redundant parameters, since they are updated using only ordinary weight decay and no gradients derived from the objective function. In contrast to prior methods that require heavy human effort to tune layer-wise sparsity ratios, prune by solving complicated non-differentiable problems, or fine-tune the model after pruning, our method is characterized by 1) global compression that automatically finds appropriate per-layer sparsity ratios; 2) end-to-end training; 3) no need for a time-consuming re-training process after pruning; and 4) a superior capability to find better winning tickets that have won the initialization lottery.
Comment: Accepted by NeurIPS 2019
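A minimal sketch of the two-part update described above: at each iteration, parameters whose saliency falls within a global top fraction receive the objective gradient, while the rest are updated by weight decay alone and thus shrink toward zero. The saliency metric (|w * grad|), the momentum handling, and the hyperparameters are assumptions for illustration.

```python
import torch

def global_sparse_momentum_step(params, keep_ratio=0.3, lr=0.01, wd=1e-4,
                                momentum=0.9, buffers=None):
    """One illustrative update; call after loss.backward(). Returns the momentum buffers."""
    buffers = buffers if buffers is not None else [torch.zeros_like(p) for p in params]
    with torch.no_grad():
        # global saliency over all parameters, here |w * grad| (an assumed metric)
        saliency = torch.cat([(p * p.grad).abs().flatten() for p in params])
        k = max(int(keep_ratio * saliency.numel()), 1)
        thresh = torch.topk(saliency, k).values.min()
        for p, buf in zip(params, buffers):
            mask = ((p * p.grad).abs() >= thresh).float()   # 1 = active, 0 = redundant
            update = mask * p.grad + wd * p                 # redundant params: weight decay only
            buf.mul_(momentum).add_(update)
            p -= lr * buf
    return buffers
```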
Lossless CNN Channel Pruning via Decoupling Remembering and Forgetting
We propose ResRep, a novel method for lossless channel pruning (a.k.a. filter pruning), which aims to slim down a convolutional neural network (CNN) by reducing the width (number of output channels) of its convolutional layers. Inspired by neurobiology research on the independence of remembering and forgetting, we propose to re-parameterize a CNN into remembering parts and forgetting parts, where the former learn to maintain performance and the latter learn for efficiency. By training the re-parameterized model with regular SGD on the former but a novel update rule with penalty gradients on the latter, we achieve structured sparsity, which lets us equivalently convert the re-parameterized model into the original architecture with narrower layers. Such a methodology distinguishes ResRep from the traditional learning-based pruning paradigm, which applies a penalty directly to the parameters to produce structured sparsity and may thereby suppress parameters essential for remembering. Our method slims a standard ResNet-50 with 76.15% accuracy on ImageNet down to a narrower network with only 45% of the FLOPs and no accuracy drop, which to the best of our knowledge is the first result to achieve lossless pruning at such a high compression ratio.
Comment: 11 pages, 9 figures
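The remembering/forgetting split can be pictured as a convolution followed by a 1x1 "compactor": the convolution keeps ordinary SGD updates, while the compactor's gradient receives an extra row-wise penalty so whole output channels can be driven to zero and later removed by merging the two convolutions. The sketch below is a hedged approximation; the penalty form and initialization are assumptions, not the paper's exact rule.

```python
import torch
import torch.nn as nn

class ConvWithCompactor(nn.Module):
    """A conv (remembering part) followed by a 1x1 compactor (forgetting part)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.compactor = nn.Conv2d(out_ch, out_ch, 1, bias=False)
        with torch.no_grad():                       # start the compactor as an identity mapping
            self.compactor.weight.copy_(torch.eye(out_ch).view(out_ch, out_ch, 1, 1))

    def forward(self, x):
        return self.compactor(self.conv(x))

    def add_penalty_grad(self, strength=1e-4):
        """Call after loss.backward(): add a row-wise lasso gradient to the compactor only,
        so entire output channels of the compactor can be pushed toward zero."""
        with torch.no_grad():
            w = self.compactor.weight               # (out, out, 1, 1); one row per output channel
            rows = w.flatten(1)
            norms = rows.norm(dim=1, keepdim=True).clamp_min(1e-12)
            w.grad += strength * (rows / norms).view_as(w)
```

Channels whose compactor rows reach (near) zero can be deleted, and the remaining compactor merged back into the preceding convolution, recovering the original architecture with fewer output channels.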
UCP: Uniform Channel Pruning for Deep Convolutional Neural Networks Compression and Acceleration
To apply deep CNNs to mobile terminals and portable devices, many researchers have recently worked on compressing and accelerating deep convolutional neural networks. To this end, we propose a novel uniform channel pruning (UCP) method for deep CNNs, in which modified squeeze-and-excitation blocks (MSEB) are used to measure the importance of the channels in the convolutional layers. Unimportant channels, together with their associated convolutional kernels, are pruned directly, which greatly reduces the storage cost and the number of computations. There are two types of residual blocks in ResNet. For ResNet with bottleneck blocks, we use the pruning method for traditional CNNs to trim the 3x3 convolutional layer in the middle of each block. For ResNet with basic residual blocks, we propose an approach that prunes all residual blocks in the same stage consistently, so that the compact network structure remains dimensionally correct. Since the network loses considerable information after pruning, and the larger the pruning amplitude, the more information is lost, we retrain the network from scratch rather than fine-tune it to restore accuracy. Finally, we verify our method on CIFAR-10, CIFAR-100, and ILSVRC-2012 for image classification. The results indicate that when the pruning rate is small, the compact network retrained from scratch performs better than the original network; even when the pruning amplitude is large, accuracy is maintained or decreases only slightly. On CIFAR-100, when the parameters and FLOPs are reduced by up to 82% and 62% respectively, the accuracy of VGG-19 even improves by 0.54% after retraining.
Comment: 21 pages, 7 figures and 5 tables
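As a rough illustration of SE-based channel importance, the sketch below reads the gate values of a plain squeeze-and-excitation module (a stand-in for the paper's modified SE blocks), averages them over a calibration set, and keeps the same channel indices across all blocks of a stage so residual additions stay dimensionally consistent.

```python
import torch
import torch.nn as nn

class SEGate(nn.Module):
    """A squeeze-and-excitation gate whose per-channel attention is read out as importance."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.fc1 = nn.Linear(channels, hidden)
        self.fc2 = nn.Linear(hidden, channels)

    def forward(self, x):                                   # x: (N, C, H, W)
        s = x.mean(dim=(2, 3))                              # squeeze
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))  # excitation in [0, 1]
        return x * s.view(*s.shape, 1, 1), s

@torch.no_grad()
def stage_keep_indices(gates, calib_feats, keep_ratio=0.7):
    """Average gate values over calibration features and keep the same channels in every
    block of a stage, so the pruned residual structure stays dimensionally consistent."""
    scores = torch.stack([g(f)[1].mean(dim=0) for g, f in zip(gates, calib_feats)]).mean(dim=0)
    k = max(int(keep_ratio * scores.numel()), 1)
    return torch.topk(scores, k).indices.sort().values
```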
SASL: Saliency-Adaptive Sparsity Learning for Neural Network Acceleration
Accelerating the inference of CNNs is critical to their deployment in real-world applications. Among pruning approaches, those implementing a sparsity learning framework have proven effective because they learn and prune the models in an end-to-end, data-driven manner. However, these works impose the same sparsity regularization on all filters indiscriminately, which can hardly result in an optimal structure-sparse network. In this paper, we propose a Saliency-Adaptive Sparsity Learning (SASL) approach for further optimization. A novel and effective estimate for each filter, called saliency, is designed; it is measured from two aspects: importance for prediction performance and consumed computational resources. During sparsity learning, the regularization strength is adjusted according to the saliency, so the optimized network better preserves prediction performance while zeroing out more computation-heavy filters. The saliency calculation introduces minimal overhead to the training process, so SASL is very efficient. During the pruning phase, a hard sample mining strategy is utilized to optimize the proposed data-dependent criterion, which shows higher effectiveness and efficiency. Extensive experiments demonstrate the superior performance of our method. Notably, on the ILSVRC-2012 dataset, our approach reduces the FLOPs of ResNet-50 by 49.7% with negligible degradation of 0.39% in top-1 and 0.05% in top-5 accuracy.
Comment: Accepted to IEEE Transactions on Circuits and Systems for Video Technology
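The saliency-adaptive idea can be sketched as a per-filter regularization strength applied to BN scaling factors: filters with low importance per unit of computation receive a stronger L1 penalty. The importance estimate, the normalization, and the base strength below are illustrative assumptions rather than the paper's exact formulation.

```python
import torch

def adaptive_sparsity_loss(bn_gammas, importance, flops_per_filter, base_lambda=1e-4):
    """bn_gammas, importance, flops_per_filter: 1-D tensors over all filters being regularized.

    Saliency combines prediction importance with computational cost; low-saliency
    (cheap-to-remove, expensive-to-keep) filters get a stronger L1 penalty on their
    BN scaling factors, pushing them toward zero first.
    """
    saliency = importance / (flops_per_filter + 1e-12)      # importance per unit of computation
    rng = saliency.max() - saliency.min() + 1e-12
    strength = base_lambda * (saliency.max() - saliency) / rng
    return (strength * bn_gammas.abs()).sum()
```

The returned term would be added to the task loss during sparsity learning, so each filter is regularized with its own strength.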
GhostSR: Learning Ghost Features for Efficient Image Super-Resolution
Modern single image super-resolution (SISR) systems based on convolutional neural networks (CNNs) achieve impressive performance but require huge computational costs. The problem of feature redundancy is well studied in visual recognition tasks but rarely discussed in SISR. Based on the observation that many features in SISR models are also similar to each other, we propose to use a shift operation to generate the redundant features (i.e., ghost features). Compared with depth-wise convolution, which is not friendly to GPUs or NPUs, the shift operation brings practical inference acceleration for CNNs on common hardware. We analyze the benefits of the shift operation for SISR and make the shift orientation learnable using the Gumbel-Softmax trick. For a given pre-trained model, we first cluster all filters in each convolutional layer to identify the intrinsic ones that generate intrinsic features. Ghost features are then derived by shifting these intrinsic features along a specific orientation. The complete output features are constructed by concatenating the intrinsic and ghost features. Extensive experiments on several benchmark models and datasets demonstrate that both non-compact and lightweight SISR models equipped with the proposed module achieve performance comparable to their baselines with a large reduction in parameters, FLOPs, and GPU latency. For instance, we reduce the parameters of the EDSR x2 network by 47%, its FLOPs by 46%, and its GPU latency by 41% without significant performance degradation.
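A hedged sketch of a ghost-style block: half of the output channels come from a regular convolution (intrinsic features) and the other half from shifted copies of them, with the shift orientation selected through a Gumbel-Softmax over a few candidate offsets. The candidate set, the channel split, and the module structure are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GhostShiftBlock(nn.Module):
    """Intrinsic features from a conv; ghost features from a learnable shift of them."""
    OFFSETS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # candidate shift orientations (dy, dx)

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        assert out_ch % 2 == 0, "this sketch splits output channels evenly"
        self.intrinsic = nn.Conv2d(in_ch, out_ch // 2, k, padding=k // 2)
        self.logits = nn.Parameter(torch.zeros(len(self.OFFSETS)))   # learnable orientation

    def forward(self, x):
        feats = self.intrinsic(x)
        probs = F.gumbel_softmax(self.logits, tau=1.0, hard=True)    # one-hot in the forward pass
        ghost = sum(p * torch.roll(feats, shifts=off, dims=(2, 3))
                    for p, off in zip(probs, self.OFFSETS))
        return torch.cat([feats, ghost], dim=1)
```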