
    Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers

    Model pruning has become a useful technique that improves the computational efficiency of deep learning, making it possible to deploy solutions in resource-limited scenarios. A widely used practice in related work assumes that a smaller-norm parameter or feature plays a less informative role at inference time. In this paper, we propose a channel pruning technique for accelerating the computations of deep convolutional neural networks (CNNs) that does not critically rely on this assumption. Instead, it focuses on direct simplification of the channel-to-channel computation graph of a CNN, without the need to perform the computationally difficult and not-always-useful task of making high-dimensional CNN tensors structured-sparse. Our approach has two stages: first, an end-to-end stochastic training method that eventually forces the outputs of some channels to be constant; then, pruning those constant channels from the original neural network by adjusting the biases of the layers they feed into, so that the resulting compact model can be quickly fine-tuned. Our approach is mathematically appealing from an optimization perspective and easy to reproduce. We evaluate our approach on several image-learning benchmarks and demonstrate its interesting aspects and competitive performance. Comment: accepted to ICLR 2018, 11 pages.
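
    The second stage described above can be illustrated with a small sketch. Assuming a BatchNorm layer whose near-zero scales mark the constant channels, a ReLU between that BatchNorm and the next convolution, groups=1, and border effects from zero padding ignored, the constant output can be folded into the next layer's bias before the channel is dropped. This is only an illustration of the idea, not the authors' released procedure; the eps threshold and the layer pairing are assumptions.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        @torch.no_grad()
        def absorb_constant_channels(bn: nn.BatchNorm2d, next_conv: nn.Conv2d, eps=1e-3):
            """Fold channels with near-zero BN scale into next_conv's bias, then drop them."""
            gamma, beta = bn.weight, bn.bias
            constant = gamma.abs() < eps                  # channel output ~= constant beta
            keep = (~constant).nonzero(as_tuple=True)[0]
            if next_conv.bias is None:
                next_conv.bias = nn.Parameter(torch.zeros(next_conv.out_channels))
            for c in constant.nonzero(as_tuple=True)[0]:
                const_val = F.relu(beta[c])               # assumes BN -> ReLU -> conv
                # Each constant input channel adds a fixed amount to every output channel.
                next_conv.bias += next_conv.weight[:, c].sum(dim=(1, 2)) * const_val
            next_conv.weight = nn.Parameter(next_conv.weight[:, keep].clone())
            next_conv.in_channels = keep.numel()
            return keep                                   # caller slices the producing conv/BN too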

    Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration

    Previous works utilized the "smaller-norm-less-important" criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with "relatively less" importance. When applied to two image classification benchmarks, our method validates its usefulness and strengths. Notably, on CIFAR-10, FPGM reduces more than 52% of FLOPs on ResNet-110 with even a 2.69% relative accuracy improvement. Moreover, on ILSVRC-2012, FPGM reduces more than 42% of FLOPs on ResNet-101 without a top-5 accuracy drop, advancing the state of the art. Code is publicly available on GitHub: https://github.com/he-y/filter-pruning-geometric-median Comment: Accepted to CVPR 2019 (Oral).
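
    As described in the abstract, the selection rule amounts to ranking a layer's filters by how close they sit to the common "center" of all filters and pruning the closest ones, since those are the most replaceable. A minimal sketch of that rule follows; the distance measure, the approximation of the geometric median, and the pruning ratio are assumptions rather than the released FPGM code.

        import torch

        def fpgm_prune_indices(conv_weight: torch.Tensor, prune_ratio: float = 0.3):
            """conv_weight: [out_channels, in_channels, kh, kw]; returns (prune, keep) indices."""
            n = conv_weight.shape[0]
            flat = conv_weight.reshape(n, -1)
            dist = torch.cdist(flat, flat, p=2)       # pairwise L2 distances between filters
            total_dist = dist.sum(dim=1)              # distance of each filter to all others
            n_prune = int(n * prune_ratio)
            order = torch.argsort(total_dist)         # smallest first: nearest the geometric median
            return order[:n_prune], order[n_prune:]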

    A Closer Look at Structured Pruning for Neural Network Compression

    Structured pruning is a popular method for compressing a neural network: given a large trained network, one alternates between removing channel connections and fine-tuning, reducing the overall width of the network. However, the efficacy of structured pruning has largely evaded scrutiny. In this paper, we examine ResNets and DenseNets obtained through structured pruning-and-tuning and make two interesting observations: (i) reduced networks (smaller versions of the original network trained from scratch) consistently outperform pruned networks; (ii) if one takes the architecture of a pruned network and trains it from scratch, it is significantly more competitive. Furthermore, these architectures are easy to approximate: we can prune once and obtain a family of new, scalable network architectures that can simply be trained from scratch. Finally, we compare the inference speed of reduced and pruned networks on hardware and show that reduced networks are significantly faster. Code is available at https://github.com/BayesWatch/pytorch-prunes. Comment: Preprint. First two authors contributed equally. Paper title has changed.
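
    For concreteness, the prune-and-tune pipeline under scrutiny can be written as a loop that alternates channel removal with a short fine-tune. Everything here is a placeholder (the saliency metric, the channel-removal routine, the schedule); the sketch only makes explicit the baseline against which the reduced and scratch-trained networks are compared.

        def prune_and_tune(model, saliency, remove_channels, fine_tune,
                           channels_per_step=32, steps=10):
            """Alternate channel removal and brief fine-tuning (the pipeline under study)."""
            for _ in range(steps):
                scores = saliency(model)                          # dict: channel id -> importance
                victims = sorted(scores, key=scores.get)[:channels_per_step]
                model = remove_channels(model, victims)           # shrink the network width
                fine_tune(model)                                  # short recovery training
            return model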

    Distilling with Performance Enhanced Students

    The task of accelerating large neural networks on general-purpose hardware has, in recent years, prompted the use of channel pruning to reduce network size. However, the efficacy of pruning-based approaches has since been called into question. In this paper, we turn to distillation for model compression, specifically attention transfer, and develop a simple method for discovering performance-enhanced student networks. We combine channel saliency metrics with empirical observations of runtime performance to design more accurate networks for a given latency budget. We apply our methodology to residual and densely connected networks, and show that we are able to find resource-efficient student networks on different hardware platforms while maintaining very high accuracy. These performance-enhanced student networks achieve up to 10% boosts in top-1 ImageNet accuracy over their channel-pruned counterparts for the same inference time. Comment: Preprint. Paper title has changed.
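
    Attention transfer, as referenced in the abstract, distills by matching spatial attention maps between teacher and student at paired layers. A minimal sketch in the spirit of Zagoruyko & Komodakis follows; the layer pairing and loss weighting used in the paper are not specified here, and matched feature maps are assumed to share spatial size.

        import torch
        import torch.nn.functional as F

        def attention_map(feat: torch.Tensor) -> torch.Tensor:
            """feat: [B, C, H, W] -> L2-normalized spatial attention map [B, H*W]."""
            a = feat.pow(2).mean(dim=1)               # average of squared channel activations
            return F.normalize(a.flatten(1), dim=1)

        def at_loss(student_feats, teacher_feats):
            """Sum of squared differences between attention maps at matched layers."""
            return sum(
                (attention_map(s) - attention_map(t)).pow(2).mean()
                for s, t in zip(student_feats, teacher_feats)
            )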

    Really should we pruning after model be totally trained? Pruning based on a small amount of training

    Pre-training of models plays an important role in the decision-making of pruning algorithms. We find that excessive pre-training is not necessary for pruning. Based on this idea, we propose a pruning algorithm, Incremental Pruning based on Less Training (IPLT). Compared with traditional pruning algorithms that rely on extensive pre-training, IPLT achieves a competitive compression effect under the same simple pruning strategy. While preserving accuracy, IPLT achieves 8x-9x compression for VGG-19 on CIFAR-10 and needs to pre-train for only a few epochs. For VGG-19 on CIFAR-10, we achieve not only about 10x test-time acceleration but also about 10x training acceleration. Current research mainly focuses on compression and acceleration at the inference stage of a model, while compression and acceleration during training have received little attention. We propose a pruning algorithm that can compress and accelerate during the training stage. Considering the amount of pre-training required by a pruning algorithm is novel. Our results imply that extensive pre-training may not be necessary for pruning algorithms.
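
    The core claim, that pruning decisions can be made after only a few epochs, can be sketched with standard filter-magnitude pruning as a stand-in. This is a one-shot simplification, not the incremental IPLT schedule; `train_one_epoch`, the warm-up length, and the pruning amount are all assumptions.

        import torch.nn as nn
        import torch.nn.utils.prune as prune

        def prune_after_brief_training(model, train_one_epoch, warmup_epochs=3, amount=0.5):
            """Train briefly, prune whole filters by L1 norm, then keep training the small model."""
            for _ in range(warmup_epochs):            # far fewer epochs than full pre-training
                train_one_epoch(model)
            for m in model.modules():
                if isinstance(m, nn.Conv2d):
                    # Zero out `amount` of the output filters with the smallest L1 norm.
                    prune.ln_structured(m, name="weight", amount=amount, n=1, dim=0)
            return model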

    AutoSlim: Towards One-Shot Architecture Search for Channel Numbers

    We study how to set channel numbers in a neural network to achieve better accuracy under constrained resources (e.g., FLOPs, latency, memory footprint or model size). A simple, one-shot solution, named AutoSlim, is presented. Instead of training many network samples and searching with reinforcement learning, we train a single slimmable network to approximate the accuracy of different channel configurations. We then iteratively evaluate the trained slimmable model and greedily slim the layer with the minimal accuracy drop. In this single pass, we obtain optimized channel configurations under different resource constraints. We present experiments with MobileNet v1, MobileNet v2, ResNet-50 and RL-searched MNasNet on ImageNet classification, and show significant improvements over their default channel configurations. We also achieve better accuracy than recent channel pruning methods and neural architecture search methods. Notably, by setting optimized channel numbers, our AutoSlim-MobileNet-v2 at 305M FLOPs achieves 74.2% top-1 accuracy, 2.4% better than the default MobileNet-v2 (301M FLOPs), and even 0.2% better than RL-searched MNasNet (317M FLOPs). Our AutoSlim-ResNet-50 at 570M FLOPs, without depthwise convolutions, achieves 1.3% better accuracy than MobileNet-v1 (569M FLOPs). Code and models will be available at: https://github.com/JiahuiYu/slimmable_networks Comment: tech report.
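
    The greedy slimming step can be summarized as follows: because a trained slimmable network can evaluate any per-layer width setting without retraining, one repeatedly shrinks whichever layer costs the least accuracy until the resource budget is met. The `evaluate` and `flops` callables, the step size, and the minimum width below are assumptions, not the released AutoSlim code.

        def autoslim_greedy(config, evaluate, flops, budget, step=0.125, min_ratio=0.25):
            """config: dict layer_name -> channel ratio (1.0 = full width)."""
            while flops(config) > budget:
                best_layer, best_acc = None, -1.0
                for name, ratio in config.items():
                    if ratio - step < min_ratio:
                        continue
                    trial = dict(config, **{name: ratio - step})
                    acc = evaluate(trial)             # slimmable net: no retraining needed
                    if acc > best_acc:
                        best_layer, best_acc = name, acc
                if best_layer is None:
                    break                             # nothing left to shrink
                config[best_layer] -= step
            return config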

    Dynamic Channel Pruning: Feature Boosting and Suppression

    Making deep convolutional neural networks more accurate typically comes at the cost of increased computational and memory resources. In this paper, we reduce this cost by exploiting the fact that the importance of features computed by convolutional layers is highly input-dependent, and propose feature boosting and suppression (FBS), a new method to predictively amplify salient convolutional channels and skip unimportant ones at run-time. FBS introduces small auxiliary connections to existing convolutional layers. In contrast to channel pruning methods which permanently remove channels, it preserves the full network structure and accelerates convolution by dynamically skipping unimportant input and output channels. FBS-augmented networks are trained with conventional stochastic gradient descent, making the method readily applicable to many state-of-the-art CNNs. We compare FBS to a range of existing channel pruning and dynamic execution schemes and demonstrate large improvements on ImageNet classification. Experiments show that FBS can provide 5x and 2x savings in compute on VGG-16 and ResNet-18 respectively, both with less than 0.6% top-5 accuracy loss. Comment: 14 pages, 5 figures, 4 tables, published as a conference paper at ICLR 2019.
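
    A minimal sketch of the boosting-and-suppression idea: a tiny auxiliary predictor scores the output channels from a global summary of the layer input, and only the top-k channels are kept (scaled by their predicted saliency) while the rest are zeroed for that input. The predictor design, normalization, and training losses here are simplifications of the paper's method.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class FBSConv(nn.Module):
            def __init__(self, in_ch, out_ch, k=3, keep_ratio=0.5):
                super().__init__()
                self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
                self.bn = nn.BatchNorm2d(out_ch)
                self.saliency = nn.Linear(in_ch, out_ch)      # small auxiliary connection
                self.keep = max(1, int(out_ch * keep_ratio))

            def forward(self, x):
                s = F.relu(self.saliency(x.mean(dim=(2, 3)))) # per-channel saliency, [B, out_ch]
                topk, idx = s.topk(self.keep, dim=1)
                mask = torch.zeros_like(s).scatter(1, idx, topk)    # winner-take-all gate
                y = self.bn(self.conv(x))
                return y * mask.unsqueeze(-1).unsqueeze(-1)         # boost kept channels, suppress the rest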

    Channel Pruning via Optimal Thresholding

    Structured pruning, especially channel pruning, is widely used for its reduced computational cost and compatibility with off-the-shelf hardware devices. In existing works, weights are typically removed using a predefined global threshold, or a threshold computed from a predefined metric. Designs based on a predefined global threshold ignore the variation in weight distributions across layers and may therefore result in sub-optimal performance caused by over-pruning or under-pruning. In this paper, we present a simple yet effective method, termed Optimal Thresholding (OT), to prune channels with layer-dependent thresholds that optimally separate important from negligible channels. By using OT, most negligible or unimportant channels are pruned to achieve high sparsity while minimizing performance degradation. Since the most important weights are preserved, the pruned model can be further fine-tuned and quickly converges within very few iterations. Our method demonstrates superior performance, especially when compared to state-of-the-art designs at high levels of sparsity. On CIFAR-100, a DenseNet-121 pruned and fine-tuned with OT achieves 75.99% accuracy with only 1.46e8 FLOPs and 0.71M parameters. Comment: ICONIP 2020.
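
    One simple way to realize a layer-dependent threshold, purely for illustration, is to pick the widest gap in each layer's sorted channel scaling factors, so that layers with very different scale distributions are not forced through one global cutoff. The paper's actual optimality criterion may differ from this heuristic.

        import torch

        def per_layer_threshold(bn_scales: torch.Tensor) -> float:
            """bn_scales: 1-D tensor of |gamma| values for one layer."""
            s, _ = bn_scales.abs().sort()
            if s.numel() < 2:
                return 0.0
            gaps = s[1:] - s[:-1]
            i = int(torch.argmax(gaps))               # widest gap separates negligible from important
            return float((s[i] + s[i + 1]) / 2)

        def channels_to_keep(bn_scales: torch.Tensor) -> torch.Tensor:
            t = per_layer_threshold(bn_scales)
            return (bn_scales.abs() > t).nonzero(as_tuple=True)[0]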

    Functionality-Oriented Convolutional Filter Pruning

    The sophisticated structure of a Convolutional Neural Network (CNN) allows for outstanding performance, but at the cost of intensive computation. As significant redundancies are inevitably present in such a structure, many works have proposed pruning convolutional filters to reduce computation cost. Although extremely effective, most works are based only on quantitative characteristics of the convolutional filters and largely overlook the qualitative interpretation of each filter's specific functionality. In this work, we interpret the functionality and redundancy of the convolutional filters from different perspectives, and propose a functionality-oriented filter pruning method. With extensive experimental results, we demonstrate the filters' qualitative significance regardless of magnitude, show significant neural network redundancy due to repetitive filter functions, and analyze filter functionality defects under an inappropriate retraining process. Such an interpretable pruning approach not only offers outstanding computation cost optimization over previous filter pruning methods, but also makes the filter pruning process interpretable.
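
    One way to operationalize "repetitive filter functions", purely as an illustration of the redundancy argument and not the authors' interpretation pipeline, is to compare filter responses on a probe batch and drop all but one filter from each group of highly correlated responses. The correlation threshold and the probe batch are assumptions.

        import torch

        @torch.no_grad()
        def functionally_redundant(feat: torch.Tensor, corr_thresh: float = 0.95):
            """feat: [B, C, H, W] activations of one layer on a probe batch; returns filters to drop."""
            resp = feat.transpose(0, 1).reshape(feat.shape[1], -1)    # [C, B*H*W]
            resp = resp - resp.mean(dim=1, keepdim=True)
            resp = resp / (resp.norm(dim=1, keepdim=True) + 1e-8)
            corr = resp @ resp.t()                                    # pairwise response correlation
            drop = set()
            for i in range(corr.shape[0]):
                if i in drop:
                    continue
                for j in range(i + 1, corr.shape[0]):
                    if corr[i, j] > corr_thresh:
                        drop.add(j)                                   # j repeats i's function
            return sorted(drop)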

    Rethinking the Value of Network Pruning

    Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, redundant weights are removed according to a certain criterion and important weights are kept to best preserve accuracy. In this work, we make several surprising observations which contradict common beliefs. For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model gives performance only comparable to, or worse than, training that model with randomly initialized weights. For pruning algorithms which assume a predefined target network architecture, one can get rid of the full pipeline and directly train the target network from scratch. Our observations are consistent across multiple network architectures, datasets, and tasks, which implies that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model; 2) the learned "important" weights of the large model are typically not useful for the small pruned model; 3) the pruned architecture itself, rather than a set of inherited "important" weights, is what matters most for the efficiency of the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm. Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods. We also compare with the "Lottery Ticket Hypothesis" (Frankle & Carbin 2019), and find that with the optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization. Comment: ICLR 2019. Significant revisions from the previous version.
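
    The scratch-training baseline advocated here is easy to state in code: keep only the architecture (per-layer channel counts) of a channel-pruned model, discard its weights, and train a re-initialized copy with a standard schedule. `build_model` and `train_from_scratch` are assumed helpers, not part of the paper's code release.

        import torch.nn as nn

        def pruned_widths(pruned_model: nn.Module):
            """Record the surviving output-channel count of every conv layer."""
            return [m.out_channels for m in pruned_model.modules()
                    if isinstance(m, nn.Conv2d)]

        def scratch_baseline(pruned_model, build_model, train_from_scratch):
            widths = pruned_widths(pruned_model)
            fresh = build_model(widths)           # same architecture, random initialization
            return train_from_scratch(fresh)      # compare against fine-tuning the pruned weights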