Structured Probabilistic Pruning for Convolutional Neural Network Acceleration
In this paper, we propose a novel progressive parameter pruning method for
Convolutional Neural Network acceleration, named Structured Probabilistic
Pruning (SPP), which effectively prunes weights of convolutional layers in a
probabilistic manner. Unlike existing deterministic pruning approaches, where
unimportant weights are permanently eliminated, SPP introduces a pruning
probability for each weight, and pruning is guided by sampling from the pruning
probabilities. A mechanism is designed to increase and decrease pruning
probabilities based on importance criteria in the training process. Experiments
show that, with 4x speedup, SPP can accelerate AlexNet with only 0.3% loss of
top-5 accuracy and VGG-16 with 0.8% loss of top-5 accuracy in ImageNet
classification. Moreover, SPP can be directly applied to accelerate
multi-branch CNNs, such as ResNet, without specific adaptations. Our 2x
speedup ResNet-50 suffers only 0.8% loss of top-5 accuracy on ImageNet. We
further show the effectiveness of SPP on transfer learning tasks.
Comment: CNN model acceleration, 13 pages, 6 figures, accepted by Proceedings of the British Machine Vision Conference (BMVC), 2018, oral
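As a rough illustration of the probabilistic pruning idea described in the abstract, the sketch below samples a binary mask from per-group pruning probabilities and nudges those probabilities according to an importance criterion. The L1-norm criterion, the probability step size, and the column-wise grouping are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 27))        # e.g. 64 filters, 3x3x3 kernels flattened
prune_prob = np.zeros(W.shape[1])    # one pruning probability per weight column
target_ratio, delta = 0.5, 0.05      # desired pruning ratio, probability step (assumed)

for step in range(100):
    # 1) Rank columns by an importance criterion (here: L1 norm).
    importance = np.abs(W).sum(axis=0)
    order = np.argsort(importance)               # least important first
    n_prune = int(target_ratio * W.shape[1])

    # 2) Increase pruning probability of the least important columns,
    #    decrease it for the rest (probabilities stay in [0, 1]).
    prune_prob[order[:n_prune]] = np.clip(prune_prob[order[:n_prune]] + delta, 0, 1)
    prune_prob[order[n_prune:]] = np.clip(prune_prob[order[n_prune:]] - delta, 0, 1)

    # 3) Sample a binary mask from the pruning probabilities; masked columns
    #    are zeroed for this step but may return later (non-permanent pruning).
    mask = rng.random(W.shape[1]) >= prune_prob
    W_pruned = W * mask

    # ... a real implementation would run the forward/backward pass with
    # W_pruned here and update W with the resulting gradients.
```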
Structured Pruning for Efficient ConvNets via Incremental Regularization
Parameter pruning is a promising approach for CNN compression and
acceleration, eliminating redundant model parameters with tolerable
performance degradation. Despite their effectiveness, existing
regularization-based parameter pruning methods usually drive weights towards
zero with large and constant regularization factors, which neglects the
fragility of the expressiveness of CNNs and thus calls for a gentler
regularization scheme so that the networks can adapt during pruning. To
achieve this, we propose a novel regularization-based pruning method, named
IncReg, which incrementally assigns different regularization factors to
different weights based on their relative importance. Empirical analysis on
the CIFAR-10 dataset verifies the merits of IncReg. Further extensive
experiments with popular CNNs on the CIFAR-10 and ImageNet datasets show that
IncReg achieves comparable or even better results than the state of the art.
Our source code and trained models are available at:
https://github.com/mingsun-tse/caffe_increg
Comment: IJCNN 201
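A minimal sketch of incremental regularization as described above: per-filter penalty factors are raised for currently unimportant filters and lowered for important ones, rather than applying one large constant factor. The L1-norm importance measure, the step sizes, and the 50% pruning target are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 27))               # one layer: 64 filters x flattened kernel
reg = np.zeros(W.shape[0])                  # per-filter regularization factor
reg_step, reg_max, lr = 1e-4, 1.0, 1e-2     # assumed hyperparameters

for step in range(200):
    # Rank filters by importance (L1 norm is an assumed criterion).
    importance = np.abs(W).sum(axis=1)
    order = np.argsort(importance)
    n_target = W.shape[0] // 2              # assume we want to prune half

    # Incrementally raise the factor for currently unimportant filters and
    # gently lower it for important ones, instead of one large constant penalty.
    reg[order[:n_target]] = np.minimum(reg[order[:n_target]] + reg_step, reg_max)
    reg[order[n_target:]] = np.maximum(reg[order[n_target:]] - reg_step, 0.0)

    # Apply the weight decay implied by the per-filter factors
    # (the task-loss gradient would be added here in real training).
    W -= lr * reg[:, None] * W

# Filters whose regularization factor saturated at reg_max have been driven
# toward zero and can be removed as whole structures.
```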
Structured Bayesian Pruning via Log-Normal Multiplicative Noise
Dropout-based regularization methods can be regarded as injecting random
noise of pre-defined magnitude into different parts of the neural network
during training. It was recently shown that the Bayesian dropout procedure
not only improves generalization but also leads to extremely sparse neural
architectures by automatically setting the individual noise magnitude per
weight. However, this sparsity can hardly be used for acceleration since it
is unstructured. In this paper, we propose a new Bayesian model that takes
into account the computational structure of neural networks and provides
structured sparsity, e.g., removing neurons and/or convolutional channels in
CNNs. To do this, we inject noise into the neurons' outputs while keeping the
weights unregularized. We establish the probabilistic model with a proper
truncated log-uniform prior over the noise and a truncated log-normal
variational approximation that ensures that the KL term in the evidence lower
bound is computed in closed form. The model leads to structured sparsity by
removing elements with a low SNR from the computation graph and provides
significant acceleration on a number of deep neural architectures. The model
is easy to implement as it can be formulated as a separate dropout-like
layer.
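The dropout-like layer mentioned above might look roughly like the following PyTorch sketch, which multiplies neuron outputs by learned log-normal noise and drops low-SNR units at test time. The untruncated log-normal sampling, the variational parameters `mu`/`log_sigma`, and the SNR threshold are simplifications; the paper uses truncated log-uniform/log-normal distributions with a closed-form KL term.

```python
import torch
import torch.nn as nn

class LogNormalNoise(nn.Module):
    """Multiplicative log-normal noise per neuron (simplified sketch)."""

    def __init__(self, num_features, snr_threshold=1.0):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(num_features))
        self.log_sigma = nn.Parameter(torch.full((num_features,), -3.0))
        self.snr_threshold = snr_threshold

    def snr(self):
        # SNR of a log-normal variable: E[theta] / std[theta] = 1 / sqrt(exp(sigma^2) - 1).
        sigma2 = (2 * self.log_sigma).exp()
        return 1.0 / torch.sqrt(torch.expm1(sigma2))

    def forward(self, x):
        if self.training:
            # Reparameterized sample: theta = exp(mu + sigma * eps), eps ~ N(0, 1).
            eps = torch.randn_like(x)
            noise = torch.exp(self.mu + self.log_sigma.exp() * eps)
            return x * noise
        # At test time, keep only high-SNR neurons and scale by the noise mean.
        sigma2 = (2 * self.log_sigma).exp()
        keep = (self.snr() > self.snr_threshold).float()
        return x * keep * torch.exp(self.mu + 0.5 * sigma2)
```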
Three Dimensional Convolutional Neural Network Pruning with Regularization-Based Method
Despite enjoying extensive applications in video analysis, three-dimensional
convolutional neural networks (3D CNNs) are restricted by their massive
computation and storage consumption. To solve this problem, we propose a
three-dimensional regularization-based neural network pruning method that
assigns different regularization parameters to different weight groups based
on their importance to the network. Further, we analyze the redundancy and
computation cost of each layer to determine its pruning ratio. Experiments
show that pruning based on our method can lead to a 2x theoretical speedup
with only 0.41% accuracy loss for 3D-ResNet18 and 3.28% accuracy loss for
C3D. The proposed method performs favorably against other popular methods for
model compression and acceleration.
Comment: ICIP 201
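To illustrate how per-layer pruning ratios could be tied to layer cost in a 3D CNN, the toy sketch below computes 3D-convolution MACs for a few hypothetical layers and scales each layer's ratio by its relative cost. The MAC formula is standard; the proportional allocation rule and the layer shapes are assumptions, not the paper's analysis.

```python
def conv3d_macs(c_in, c_out, kt, kh, kw, t, h, w):
    """Multiply-accumulates of a 3D convolution producing a t x h x w output."""
    return c_in * c_out * kt * kh * kw * t * h * w

# (layer name, cost) for a few hypothetical 3D CNN layers.
layers = [
    ("conv1", conv3d_macs(3, 64, 3, 7, 7, 16, 56, 56)),
    ("conv2", conv3d_macs(64, 128, 3, 3, 3, 16, 28, 28)),
    ("conv3", conv3d_macs(128, 256, 3, 3, 3, 8, 14, 14)),
]

global_ratio = 0.5                      # overall aggressiveness (assumed)
max_cost = max(c for _, c in layers)
for name, macs in layers:
    # Cheaper layers are pruned less aggressively; costly layers more so.
    ratio = global_ratio * macs / max_cost
    print(f"{name}: {macs / 1e9:.2f} GMACs, pruning ratio {ratio:.2f}")
```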
Towards Efficient Model Compression via Learned Global Ranking
Pruning convolutional filters has demonstrated its effectiveness in
compressing ConvNets. Prior art in filter pruning requires users to specify a
target model complexity (e.g., model size or FLOP count) for the resulting
architecture. However, determining a target model complexity can be difficult
for optimizing various embodied AI applications such as autonomous robots,
drones, and user-facing applications. First, both the accuracy and the speed of
ConvNets can affect the performance of the application. Second, the performance
of the application can be hard to assess without evaluating ConvNets during
inference. As a consequence, finding a sweet spot between accuracy and speed
via filter pruning, which needs to be done in a trial-and-error fashion,
can be time-consuming. This work takes a first step toward making this process
more efficient by altering the goal of model compression to producing a set of
ConvNets with various accuracy and latency trade-offs instead of producing one
ConvNet targeting some pre-defined latency constraint. To this end, we propose
to learn a global ranking of the filters across different layers of the
ConvNet, which is used to obtain a set of ConvNet architectures that have
different accuracy/latency trade-offs by pruning the bottom-ranked filters. Our
proposed algorithm, LeGR, is shown to be 2x to 3x faster than prior work while
having comparable or better performance when targeting seven pruned ResNet-56
architectures with different accuracy/FLOPs profiles on the CIFAR-100 dataset.
Additionally, we have evaluated LeGR on ImageNet and Bird-200 with ResNet-50
and MobileNetV2 to demonstrate its effectiveness. Code is available at
https://github.com/cmu-enyac/LeGR
Comment: CVPR 2020 Oral
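A compact sketch of how one global filter ranking can generate a family of pruned architectures, loosely in the spirit of LeGR: per-layer filter norms are mapped through layer-wise affine coefficients into globally comparable scores, and sweeping a threshold over the bottom-ranked filters yields different accuracy/FLOPs trade-offs. The norms and coefficients below are made-up placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
# Per-layer filter norms (importance before the learned transformation).
norms = {"conv1": rng.random(16), "conv2": rng.random(32), "conv3": rng.random(64)}
# Per-layer affine coefficients that make scores comparable across layers
# (learned in LeGR; hard-coded here purely for illustration).
alpha = {"conv1": 1.0, "conv2": 0.7, "conv3": 0.4}
kappa = {"conv1": 0.0, "conv2": 0.1, "conv3": 0.3}

scores = np.concatenate([alpha[l] * norms[l] + kappa[l] for l in norms])
layer_of = np.concatenate([[l] * len(norms[l]) for l in norms])

# One ranking, many architectures: sweep the fraction of bottom-ranked
# filters to remove and report how many filters each layer keeps.
for frac in (0.3, 0.5, 0.7):
    threshold = np.quantile(scores, frac)
    keep = scores > threshold
    kept_per_layer = {l: int(keep[layer_of == l].sum()) for l in norms}
    print(f"prune {frac:.0%}: {kept_per_layer}")
```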
Dynamic Neural Network Channel Execution for Efficient Training
Existing methods for reducing the computational burden of neural networks at
run-time, such as parameter pruning or dynamic computational path selection,
focus solely on improving computational efficiency during inference. In this
work, by contrast, we propose a novel method that reduces the memory
footprint and number of computing operations required for training and
inference. Our framework efficiently integrates pruning as part of the training
procedure by exploring and tracking the relative importance of convolutional
channels. At each training step, we select only a subset of highly salient
channels to execute according to the combinatorial upper confidence bound
algorithm, and run a forward and backward pass only on these activated
channels, hence learning their parameters. Consequently, we enable the
efficient discovery of compact models. We validate our approach empirically on
state-of-the-art CNNs (VGGNet, ResNet, and DenseNet) and on several image
classification datasets. Results demonstrate that our framework for dynamic
channel execution reduces computational cost by up to 4x and parameter count
by up to 9x, thus reducing the memory and computational demands for
discovering and training compact neural network models.
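The channel-selection step could be sketched as a combinatorial upper-confidence-bound bandit over channels, as below. The placeholder reward (which in real training would come from the saliency measured after a forward/backward pass on the active channels) and the exploration constant are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, k = 32, 8                      # total channels, channels run per step
value = np.zeros(n_channels)               # running estimate of channel saliency
counts = np.ones(n_channels)               # times each channel has been executed

for step in range(1, 501):
    # Combinatorial UCB: pick the k channels with the highest optimistic score.
    ucb = value + np.sqrt(2 * np.log(step) / counts)
    active = np.argsort(-ucb)[:k]

    # Placeholder "reward": stands in for the saliency of the active channels
    # measured after running the forward/backward pass on them.
    reward = rng.random(k)

    counts[active] += 1
    value[active] += (reward - value[active]) / counts[active]

# Channels with consistently low estimated saliency can be dropped entirely.
```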
Exploiting Channel Similarity for Accelerating Deep Convolutional Neural Networks
To address the limitations of existing magnitude-based pruning algorithms in
cases where model weights or activations are of large and similar magnitude, we
propose a novel perspective to discover parameter redundancy among channels and
accelerate deep CNNs via channel pruning. Precisely, we argue that channels
revealing similar feature information have functional overlap and that most
channels within each such similarity group can be removed without compromising
the model's representational power. After deriving an effective metric for
evaluating channel similarity through probabilistic modeling, we introduce a
pruning algorithm via hierarchical clustering of channels. In particular, the
proposed algorithm does not rely on sparsity training techniques or complex
data-driven optimization and can be directly applied to pre-trained models.
Extensive experiments on benchmark datasets strongly demonstrate the superior
acceleration performance of our approach over prior art. On ImageNet, our
pruned ResNet-50 with 30% of FLOPs removed outperforms the baseline model.
Comment: 14 pages, 6 figures
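A short sketch of the clustering view described above: filters of a layer are grouped by hierarchical clustering and one representative per group is kept. Using SciPy's agglomerative clustering on raw filter weights is an assumption; the paper derives its own probabilistic similarity metric.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
filters = rng.normal(size=(64, 3 * 3 * 16))       # 64 filters, flattened weights

# Agglomerative clustering on pairwise filter distances.
Z = linkage(filters, method="average", metric="euclidean")
labels = fcluster(Z, t=32, criterion="maxclust")  # target 32 similarity groups

# Keep the filter closest to each cluster centroid; prune the rest.
keep = []
for c in np.unique(labels):
    members = np.where(labels == c)[0]
    centroid = filters[members].mean(axis=0)
    dists = np.linalg.norm(filters[members] - centroid, axis=1)
    keep.append(members[np.argmin(dists)])

print(f"kept {len(keep)} of {filters.shape[0]} channels")
```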
Balanced Sparsity for Efficient DNN Inference on GPU
In trained deep neural networks, unstructured pruning can reduce redundant
weights to lower storage cost. However, it requires customized hardware to
speed up practical inference. Another trend accelerates sparse model inference
on general-purpose hardware by adopting coarse-grained sparsity to prune or
regularize consecutive weights for efficient computation. But this method
often sacrifices model accuracy. In this paper, we propose a novel
fine-grained sparsity approach, balanced sparsity, to achieve high model
accuracy efficiently on commercial hardware. Our approach adapts to the high
parallelism of GPUs, showing strong potential for sparsity in the wide
deployment of deep learning services. Experimental results show that balanced
sparsity achieves up to a 3.1x practical speedup for model inference on GPUs,
while retaining the same high model accuracy as fine-grained sparsity.
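The balanced-sparsity pattern can be sketched as follows: each row of a weight matrix is split into equal-sized blocks and the same number of smallest-magnitude weights is zeroed in every block, so GPU threads processing different blocks get identical workloads. The block size and sparsity level are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 64))
block_size, sparsity = 16, 0.75                  # 75% of weights pruned per block
keep_per_block = int(block_size * (1 - sparsity))

blocks = W.reshape(W.shape[0], -1, block_size)   # (rows, blocks_per_row, block_size)
mask = np.zeros_like(blocks)
# In every block, keep only the largest-magnitude weights.
top = np.argsort(-np.abs(blocks), axis=2)[:, :, :keep_per_block]
np.put_along_axis(mask, top, 1.0, axis=2)
W_balanced = (blocks * mask).reshape(W.shape)

# Every block now holds exactly `keep_per_block` non-zeros, so the sparse
# workload is evenly balanced across blocks.
assert (mask.sum(axis=2) == keep_per_block).all()
```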
Hybrid Pruning: Thinner Sparse Networks for Fast Inference on Edge Devices
We introduce hybrid pruning, which combines coarse-grained channel pruning and
fine-grained weight pruning to reduce model size, computation, and power
demands with little to no loss in accuracy, enabling the deployment of modern
networks on resource-constrained devices such as always-on security cameras
and drones. Additionally, to perform channel pruning effectively, we propose a
fast sensitivity test that helps us quickly identify how sensitive the output
accuracy is to pruning within and across the layers of a network, given a
target multiply-accumulate (MAC) budget or accuracy tolerance. Our experiments
show significantly better results for ResNet50 on ImageNet compared to
existing work, even with the additional constraint that channel counts be
hardware-friendly numbers.
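A minimal sketch of the hybrid idea: coarse channel pruning first, then fine-grained weight pruning inside the surviving channels. The L1-based channel criterion, the rounding of kept channels to a multiple of 8 as a stand-in for "hardware-friendly numbers", and the sparsity levels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))                     # out_channels x flattened inputs

# Step 1: channel pruning. Keep the most important output channels, rounding
# the kept count down to a hardware-friendly multiple of 8.
channel_keep_ratio = 0.6
importance = np.abs(W).sum(axis=1)
n_keep = int(W.shape[0] * channel_keep_ratio) // 8 * 8
kept = np.sort(np.argsort(-importance)[:n_keep])
W_thin = W[kept]

# Step 2: fine-grained weight pruning inside the remaining channels.
weight_sparsity = 0.5
threshold = np.quantile(np.abs(W_thin), weight_sparsity)
W_hybrid = np.where(np.abs(W_thin) > threshold, W_thin, 0.0)

print(f"channels: {W.shape[0]} -> {W_thin.shape[0]}, "
      f"non-zeros: {np.count_nonzero(W_hybrid)} / {W_thin.size}")
```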
Joint Multi-Dimension Pruning
We present joint multi-dimension pruning (named JointPruning), a new
perspective on pruning a network along three crucial dimensions
simultaneously: spatial size, depth, and channel width. The joint strategy
enables searching for a better configuration than previous studies that focus
on a single dimension alone, as our method is optimized collaboratively
across the three dimensions in a single end-to-end training process.
Moreover, each dimension we consider can attain better performance by
cooperating with the other two. Our method is realized via an adapted
stochastic gradient estimation. Extensive experiments on the large-scale
ImageNet dataset across a variety of network architectures (MobileNet V1&V2
and ResNet) demonstrate the effectiveness of our proposed method. For
instance, we achieve significant margins of 2.5% and 2.6% improvement over
the state-of-the-art approach on the already compact MobileNet V1&V2 under an
extremely large compression ratio.
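To make the three dimensions concrete, the toy sketch below treats (input resolution, depth, per-layer width) as one joint configuration and evaluates its cost with a simple MAC model. The cost model and the candidate configurations are illustrative; the paper optimizes the configuration jointly with an adapted stochastic gradient estimator.

```python
def cost(resolution, depth, widths):
    """Approximate MACs of a stack of 3x3 convolutions at a given resolution."""
    macs, c_in, hw = 0, 3, resolution
    for d in range(depth):
        c_out = widths[d]
        macs += c_in * c_out * 3 * 3 * hw * hw
        c_in = c_out
    return macs

full = cost(resolution=224, depth=4, widths=[64, 128, 256, 512])
# A jointly pruned configuration: smaller input, one fewer layer, thinner layers.
pruned = cost(resolution=192, depth=3, widths=[48, 96, 192])
print(f"compression ratio: {full / pruned:.1f}x")
```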