Structured Probabilistic Pruning for Convolutional Neural Network Acceleration
In this paper, we propose a novel progressive parameter pruning method for
Convolutional Neural Network acceleration, named Structured Probabilistic
Pruning (SPP), which effectively prunes weights of convolutional layers in a
probabilistic manner. Unlike existing deterministic pruning approaches, where
unimportant weights are permanently eliminated, SPP introduces a pruning
probability for each weight, and pruning is guided by sampling from the pruning
probabilities. A mechanism is designed to increase and decrease pruning
probabilities based on importance criteria in the training process. Experiments
show that, with 4x speedup, SPP can accelerate AlexNet with only 0.3% loss of
top-5 accuracy and VGG-16 with 0.8% loss of top-5 accuracy in ImageNet
classification. Moreover, SPP can be directly applied to accelerate
multi-branch CNNs, such as ResNet, without specific adaptations. Our 2x
speedup ResNet-50 suffers only 0.8% loss of top-5 accuracy on ImageNet. We
further show the effectiveness of SPP on transfer learning tasks.
Comment: CNN model acceleration, 13 pages, 6 figures, accepted by Proceedings of the British Machine Vision Conference (BMVC), 2018, oral
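As a rough illustration of the probabilistic pruning idea described in the abstract, the sketch below samples a binary mask from per-group pruning probabilities and nudges those probabilities according to an importance criterion. The L1-norm criterion, the probability step size, and the column-wise grouping are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 27))        # e.g. 64 filters, 3x3x3 kernels flattened
prune_prob = np.zeros(W.shape[1])    # one pruning probability per weight column
target_ratio, delta = 0.5, 0.05      # desired pruning ratio, probability step (assumed)

for step in range(100):
    # 1) Rank columns by an importance criterion (here: L1 norm).
    importance = np.abs(W).sum(axis=0)
    order = np.argsort(importance)               # least important first
    n_prune = int(target_ratio * W.shape[1])

    # 2) Increase pruning probability of the least important columns,
    #    decrease it for the rest (probabilities stay in [0, 1]).
    prune_prob[order[:n_prune]] = np.clip(prune_prob[order[:n_prune]] + delta, 0, 1)
    prune_prob[order[n_prune:]] = np.clip(prune_prob[order[n_prune:]] - delta, 0, 1)

    # 3) Sample a binary mask from the pruning probabilities; masked columns
    #    are zeroed for this step but may return later (non-permanent pruning).
    mask = rng.random(W.shape[1]) >= prune_prob
    W_pruned = W * mask

    # ... a real implementation would run the forward/backward pass with
    # W_pruned here and update W with the resulting gradients.
```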
Structured Pruning for Efficient ConvNets via Incremental Regularization
Parameter pruning is a promising approach for CNN compression and
acceleration, eliminating redundant model parameters with tolerable
performance degradation. Despite their effectiveness, existing
regularization-based parameter pruning methods usually drive weights towards
zero with large and constant regularization factors, which neglects the
fragility of the expressiveness of CNNs and thus calls for a gentler
regularization scheme so that the networks can adapt during pruning. To
achieve this, we propose a novel regularization-based pruning method, named
IncReg, which incrementally assigns different regularization factors to
different weights based on their relative importance. Empirical analysis on
the CIFAR-10 dataset verifies the merits of IncReg. Further extensive
experiments with popular CNNs on the CIFAR-10 and ImageNet datasets show that
IncReg achieves comparable or even better results than the state of the art.
Our source code and trained models are available at:
https://github.com/mingsun-tse/caffe_increg
Comment: IJCNN 201
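A minimal sketch of incremental regularization as described above: per-filter penalty factors are raised for currently unimportant filters and lowered for important ones, rather than applying one large constant factor. The L1-norm importance measure, the step sizes, and the 50% pruning target are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 27))               # one layer: 64 filters x flattened kernel
reg = np.zeros(W.shape[0])                  # per-filter regularization factor
reg_step, reg_max, lr = 1e-4, 1.0, 1e-2     # assumed hyperparameters

for step in range(200):
    # Rank filters by importance (L1 norm is an assumed criterion).
    importance = np.abs(W).sum(axis=1)
    order = np.argsort(importance)
    n_target = W.shape[0] // 2              # assume we want to prune half

    # Incrementally raise the factor for currently unimportant filters and
    # gently lower it for important ones, instead of one large constant penalty.
    reg[order[:n_target]] = np.minimum(reg[order[:n_target]] + reg_step, reg_max)
    reg[order[n_target:]] = np.maximum(reg[order[n_target:]] - reg_step, 0.0)

    # Apply the weight decay implied by the per-filter factors
    # (the task-loss gradient would be added here in real training).
    W -= lr * reg[:, None] * W

# Filters whose regularization factor saturated at reg_max have been driven
# toward zero and can be removed as whole structures.
```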
Structured Bayesian Pruning via Log-Normal Multiplicative Noise
Dropout-based regularization methods can be regarded as injecting random
noise of pre-defined magnitude into different parts of the neural network
during training. It was recently shown that the Bayesian dropout procedure
not only improves generalization but also leads to extremely sparse neural
architectures by automatically setting the individual noise magnitude per
weight. However, this sparsity can hardly be used for acceleration since it
is unstructured. In this paper, we propose a new Bayesian model that takes
into account the computational structure of neural networks and provides
structured sparsity, e.g., removing neurons and/or convolutional channels in
CNNs. To do this, we inject noise into the neurons' outputs while keeping the
weights unregularized. We establish the probabilistic model with a proper
truncated log-uniform prior over the noise and a truncated log-normal
variational approximation that ensures that the KL term in the evidence lower
bound is computed in closed form. The model leads to structured sparsity by
removing elements with a low SNR from the computation graph and provides
significant acceleration on a number of deep neural architectures. The model
is easy to implement as it can be formulated as a separate dropout-like
layer.
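The dropout-like layer mentioned above might look roughly like the following PyTorch sketch, which multiplies neuron outputs by learned log-normal noise and drops low-SNR units at test time. The untruncated log-normal sampling, the variational parameters `mu`/`log_sigma`, and the SNR threshold are simplifications; the paper uses truncated log-uniform/log-normal distributions with a closed-form KL term.

```python
import torch
import torch.nn as nn

class LogNormalNoise(nn.Module):
    """Multiplicative log-normal noise per neuron (simplified sketch)."""

    def __init__(self, num_features, snr_threshold=1.0):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(num_features))
        self.log_sigma = nn.Parameter(torch.full((num_features,), -3.0))
        self.snr_threshold = snr_threshold

    def snr(self):
        # SNR of a log-normal variable: E[theta] / std[theta] = 1 / sqrt(exp(sigma^2) - 1).
        sigma2 = (2 * self.log_sigma).exp()
        return 1.0 / torch.sqrt(torch.expm1(sigma2))

    def forward(self, x):
        if self.training:
            # Reparameterized sample: theta = exp(mu + sigma * eps), eps ~ N(0, 1).
            eps = torch.randn_like(x)
            noise = torch.exp(self.mu + self.log_sigma.exp() * eps)
            return x * noise
        # At test time, keep only high-SNR neurons and scale by the noise mean.
        sigma2 = (2 * self.log_sigma).exp()
        keep = (self.snr() > self.snr_threshold).float()
        return x * keep * torch.exp(self.mu + 0.5 * sigma2)
```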
Three Dimensional Convolutional Neural Network Pruning with Regularization-Based Method
Despite enjoying extensive applications in video analysis, three-dimensional
convolutional neural networks (3D CNNs) are restricted by their massive
computation and storage consumption. To solve this problem, we propose a
three-dimensional regularization-based neural network pruning method that
assigns different regularization parameters to different weight groups based
on their importance to the network. Further, we analyze the redundancy and
computation cost of each layer to determine its pruning ratio. Experiments
show that pruning based on our method can lead to a 2x theoretical speedup
with only 0.41% accuracy loss for 3D-ResNet18 and 3.28% accuracy loss for
C3D. The proposed method performs favorably against other popular methods for
model compression and acceleration.
Comment: ICIP 201
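To illustrate how per-layer pruning ratios could be tied to layer cost in a 3D CNN, the toy sketch below computes 3D-convolution MACs for a few hypothetical layers and scales each layer's ratio by its relative cost. The MAC formula is standard; the proportional allocation rule and the layer shapes are assumptions, not the paper's analysis.

```python
def conv3d_macs(c_in, c_out, kt, kh, kw, t, h, w):
    """Multiply-accumulates of a 3D convolution producing a t x h x w output."""
    return c_in * c_out * kt * kh * kw * t * h * w

# (layer name, cost) for a few hypothetical 3D CNN layers.
layers = [
    ("conv1", conv3d_macs(3, 64, 3, 7, 7, 16, 56, 56)),
    ("conv2", conv3d_macs(64, 128, 3, 3, 3, 16, 28, 28)),
    ("conv3", conv3d_macs(128, 256, 3, 3, 3, 8, 14, 14)),
]

global_ratio = 0.5                      # overall aggressiveness (assumed)
max_cost = max(c for _, c in layers)
for name, macs in layers:
    # Cheaper layers are pruned less aggressively; costly layers more so.
    ratio = global_ratio * macs / max_cost
    print(f"{name}: {macs / 1e9:.2f} GMACs, pruning ratio {ratio:.2f}")
```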
Towards Efficient Model Compression via Learned Global Ranking
Pruning convolutional filters has demonstrated its effectiveness in
compressing ConvNets. Prior art in filter pruning requires users to specify a
target model complexity (e.g., model size or FLOP count) for the resulting
architecture. However, determining a target model complexity can be difficult
for optimizing various embodied AI applications such as autonomous robots,
drones, and user-facing applications. First, both the accuracy and the speed of
ConvNets can affect the performance of the application. Second, the performance
of the application can be hard to assess without evaluating ConvNets during
inference. As a consequence, finding a sweet spot between accuracy and speed
via filter pruning, which needs to be done in a trial-and-error fashion,
can be time-consuming. This work takes a first step toward making this process
more efficient by altering the goal of model compression to producing a set of
ConvNets with various accuracy and latency trade-offs instead of producing one
ConvNet targeting some pre-defined latency constraint. To this end, we propose
to learn a global ranking of the filters across different layers of the
ConvNet, which is used to obtain a set of ConvNet architectures that have
different accuracy/latency trade-offs by pruning the bottom-ranked filters. Our
proposed algorithm, LeGR, is shown to be 2x to 3x faster than prior work while
having comparable or better performance when targeting seven pruned ResNet-56
architectures with different accuracy/FLOPs profiles on the CIFAR-100 dataset.
Additionally, we have evaluated LeGR on ImageNet and Bird-200 with ResNet-50
and MobileNetV2 to demonstrate its effectiveness. Code is available at
https://github.com/cmu-enyac/LeGR
Comment: CVPR 2020 Oral
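A compact sketch of how one global filter ranking can generate a family of pruned architectures, loosely in the spirit of LeGR: per-layer filter norms are mapped through layer-wise affine coefficients into globally comparable scores, and sweeping a threshold over the bottom-ranked filters yields different accuracy/FLOPs trade-offs. The norms and coefficients below are made-up placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
# Per-layer filter norms (importance before the learned transformation).
norms = {"conv1": rng.random(16), "conv2": rng.random(32), "conv3": rng.random(64)}
# Per-layer affine coefficients that make scores comparable across layers
# (learned in LeGR; hard-coded here purely for illustration).
alpha = {"conv1": 1.0, "conv2": 0.7, "conv3": 0.4}
kappa = {"conv1": 0.0, "conv2": 0.1, "conv3": 0.3}

scores = np.concatenate([alpha[l] * norms[l] + kappa[l] for l in norms])
layer_of = np.concatenate([[l] * len(norms[l]) for l in norms])

# One ranking, many architectures: sweep the fraction of bottom-ranked
# filters to remove and report how many filters each layer keeps.
for frac in (0.3, 0.5, 0.7):
    threshold = np.quantile(scores, frac)
    keep = scores > threshold
    kept_per_layer = {l: int(keep[layer_of == l].sum()) for l in norms}
    print(f"prune {frac:.0%}: {kept_per_layer}")
```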
Dynamic Neural Network Channel Execution for Efficient Training
Existing methods for reducing the computational burden of neural networks at
run-time, such as parameter pruning or dynamic computational path selection,
focus solely on improving computational efficiency during inference. In this
work, by contrast, we propose a novel method that reduces the memory
footprint and number of computing operations required for training and
inference. Our framework efficiently integrates pruning as part of the training
procedure by exploring and tracking the relative importance of convolutional
channels. At each training step, we select only a subset of highly salient
channels to execute according to the combinatorial upper confidence bound
algorithm, and run a forward and backward pass only on these activated
channels, hence learning their parameters. Consequently, we enable the
efficient discovery of compact models. We validate our approach empirically on
state-of-the-art CNNs (VGGNet, ResNet, and DenseNet) and on several image
classification datasets. Results demonstrate that our framework for dynamic
channel execution reduces computational cost by up to 4x and parameter count
by up to 9x, thus reducing the memory and computational demands for
discovering and training compact neural network models.
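The channel-selection step could be sketched as a combinatorial upper-confidence-bound bandit over channels, as below. The placeholder reward (which in real training would come from the saliency measured after a forward/backward pass on the active channels) and the exploration constant are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, k = 32, 8                      # total channels, channels run per step
value = np.zeros(n_channels)               # running estimate of channel saliency
counts = np.ones(n_channels)               # times each channel has been executed

for step in range(1, 501):
    # Combinatorial UCB: pick the k channels with the highest optimistic score.
    ucb = value + np.sqrt(2 * np.log(step) / counts)
    active = np.argsort(-ucb)[:k]

    # Placeholder "reward": stands in for the saliency of the active channels
    # measured after running the forward/backward pass on them.
    reward = rng.random(k)

    counts[active] += 1
    value[active] += (reward - value[active]) / counts[active]

# Channels with consistently low estimated saliency can be dropped entirely.
```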
Exploiting Channel Similarity for Accelerating Deep Convolutional Neural Networks
To address the limitations of existing magnitude-based pruning algorithms in
cases where model weights or activations are of large and similar magnitude, we
propose a novel perspective to discover parameter redundancy among channels and
accelerate deep CNNs via channel pruning. Precisely, we argue that channels
revealing similar feature information have functional overlap and that most
channels within each such similarity group can be removed without compromising
the model's representational power. After deriving an effective metric for
evaluating channel similarity through probabilistic modeling, we introduce a
pruning algorithm via hierarchical clustering of channels. In particular, the
proposed algorithm does not rely on sparsity training techniques or complex
data-driven optimization and can be directly applied to pre-trained models.
Extensive experiments on benchmark datasets strongly demonstrate the superior
acceleration performance of our approach over prior art. On ImageNet, our
pruned ResNet-50 with 30% of FLOPs removed outperforms the baseline model.
Comment: 14 pages, 6 figures
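A short sketch of the clustering view described above: filters of a layer are grouped by hierarchical clustering and one representative per group is kept. Using SciPy's agglomerative clustering on raw filter weights is an assumption; the paper derives its own probabilistic similarity metric.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
filters = rng.normal(size=(64, 3 * 3 * 16))       # 64 filters, flattened weights

# Agglomerative clustering on pairwise filter distances.
Z = linkage(filters, method="average", metric="euclidean")
labels = fcluster(Z, t=32, criterion="maxclust")  # target 32 similarity groups

# Keep the filter closest to each cluster centroid; prune the rest.
keep = []
for c in np.unique(labels):
    members = np.where(labels == c)[0]
    centroid = filters[members].mean(axis=0)
    dists = np.linalg.norm(filters[members] - centroid, axis=1)
    keep.append(members[np.argmin(dists)])

print(f"kept {len(keep)} of {filters.shape[0]} channels")
```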
Balanced Sparsity for Efficient DNN Inference on GPU
In trained deep neural networks, unstructured pruning can reduce redundant
weights to lower storage cost. However, it requires customized hardware to
speed up practical inference. Another trend accelerates sparse model inference
on general-purpose hardware by adopting coarse-grained sparsity to prune or
regularize consecutive weights for efficient computation. But this method
often sacrifices model accuracy. In this paper, we propose a novel
fine-grained sparsity approach, balanced sparsity, to achieve high model
accuracy efficiently on commercial hardware. Our approach adapts to the high
parallelism of GPUs, showing strong potential for sparsity in the wide
deployment of deep learning services. Experimental results show that balanced
sparsity achieves up to a 3.1x practical speedup for model inference on GPUs,
while retaining the same high model accuracy as fine-grained sparsity.
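The balanced-sparsity pattern can be sketched as follows: each row of a weight matrix is split into equal-sized blocks and the same number of smallest-magnitude weights is zeroed in every block, so GPU threads processing different blocks get identical workloads. The block size and sparsity level are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 64))
block_size, sparsity = 16, 0.75                  # 75% of weights pruned per block
keep_per_block = int(block_size * (1 - sparsity))

blocks = W.reshape(W.shape[0], -1, block_size)   # (rows, blocks_per_row, block_size)
mask = np.zeros_like(blocks)
# In every block, keep only the largest-magnitude weights.
top = np.argsort(-np.abs(blocks), axis=2)[:, :, :keep_per_block]
np.put_along_axis(mask, top, 1.0, axis=2)
W_balanced = (blocks * mask).reshape(W.shape)

# Every block now holds exactly `keep_per_block` non-zeros, so the sparse
# workload is evenly balanced across blocks.
assert (mask.sum(axis=2) == keep_per_block).all()
```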
Hybrid Pruning: Thinner Sparse Networks for Fast Inference on Edge Devices
We introduce hybrid pruning, which combines coarse-grained channel pruning and
fine-grained weight pruning to reduce model size, computation, and power
demands with little to no loss in accuracy, enabling the deployment of modern
networks on resource-constrained devices such as always-on security cameras
and drones. Additionally, to perform channel pruning effectively, we propose a
fast sensitivity test that helps us quickly identify how sensitive the output
accuracy is to pruning within and across the layers of a network, given a
target multiply-accumulate (MAC) budget or accuracy tolerance. Our experiments
show significantly better results for ResNet50 on ImageNet compared to
existing work, even with the additional constraint that channel counts be
hardware-friendly numbers.
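A minimal sketch of the hybrid idea: coarse channel pruning first, then fine-grained weight pruning inside the surviving channels. The L1-based channel criterion, the rounding of kept channels to a multiple of 8 as a stand-in for "hardware-friendly numbers", and the sparsity levels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))                     # out_channels x flattened inputs

# Step 1: channel pruning. Keep the most important output channels, rounding
# the kept count down to a hardware-friendly multiple of 8.
channel_keep_ratio = 0.6
importance = np.abs(W).sum(axis=1)
n_keep = int(W.shape[0] * channel_keep_ratio) // 8 * 8
kept = np.sort(np.argsort(-importance)[:n_keep])
W_thin = W[kept]

# Step 2: fine-grained weight pruning inside the remaining channels.
weight_sparsity = 0.5
threshold = np.quantile(np.abs(W_thin), weight_sparsity)
W_hybrid = np.where(np.abs(W_thin) > threshold, W_thin, 0.0)

print(f"channels: {W.shape[0]} -> {W_thin.shape[0]}, "
      f"non-zeros: {np.count_nonzero(W_hybrid)} / {W_thin.size}")
```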
Joint Multi-Dimension Pruning
We present joint multi-dimension pruning (named JointPruning), a new
perspective on pruning a network along three crucial dimensions
simultaneously: spatial size, depth, and channel width. The joint strategy
enables searching for a better configuration than previous studies that focus
on a single dimension alone, as our method is optimized collaboratively
across the three dimensions in a single end-to-end training process.
Moreover, each dimension we consider can attain better performance by
cooperating with the other two. Our method is realized via an adapted
stochastic gradient estimation. Extensive experiments on the large-scale
ImageNet dataset across a variety of network architectures (MobileNet V1&V2
and ResNet) demonstrate the effectiveness of our proposed method. For
instance, we achieve significant margins of 2.5% and 2.6% improvement over
the state-of-the-art approach on the already compact MobileNet V1&V2 under an
extremely large compression ratio.
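To make the three dimensions concrete, the toy sketch below treats (input resolution, depth, per-layer width) as one joint configuration and evaluates its cost with a simple MAC model. The cost model and the candidate configurations are illustrative; the paper optimizes the configuration jointly with an adapted stochastic gradient estimator.

```python
def cost(resolution, depth, widths):
    """Approximate MACs of a stack of 3x3 convolutions at a given resolution."""
    macs, c_in, hw = 0, 3, resolution
    for d in range(depth):
        c_out = widths[d]
        macs += c_in * c_out * 3 * 3 * hw * hw
        c_in = c_out
    return macs

full = cost(resolution=224, depth=4, widths=[64, 128, 256, 512])
# A jointly pruned configuration: smaller input, one fewer layer, thinner layers.
pruned = cost(resolution=192, depth=3, widths=[48, 96, 192])
print(f"compression ratio: {full / pruned:.1f}x")
```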