Dynamic Channel Pruning: Feature Boosting and Suppression
Making deep convolutional neural networks more accurate typically comes at
the cost of increased computational and memory resources. In this paper, we
reduce this cost by exploiting the fact that the importance of features
computed by convolutional layers is highly input-dependent, and propose feature
boosting and suppression (FBS), a new method to predictively amplify salient
convolutional channels and skip unimportant ones at run-time. FBS introduces
small auxiliary connections to existing convolutional layers. In contrast to
channel pruning methods which permanently remove channels, it preserves the
full network structures and accelerates convolution by dynamically skipping
unimportant input and output channels. FBS-augmented networks are trained with
conventional stochastic gradient descent, making FBS readily applicable to many
state-of-the-art CNNs. We compare FBS to a range of existing channel pruning
and dynamic execution schemes and demonstrate large improvements on ImageNet
classification. Experiments show that FBS can provide 5x and 2x savings in
compute on VGG-16 and ResNet-18 respectively, both with less than 0.6% top-5
accuracy loss.
Comment: 14 pages, 5 figures, 4 tables, published as a conference paper at ICLR 2019
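To make the mechanism concrete, here is a minimal PyTorch sketch of the gating idea: a cheap auxiliary branch predicts per-channel saliencies from a pooled summary of the input, boosts the top-k output channels, and suppresses the rest. Module names, the ReLU saliency head, and the keep ratio are illustrative assumptions, not the authors' code.

    import torch
    import torch.nn as nn

    class FBSConv2d(nn.Module):
        """Sketch: an auxiliary connection predicts per-channel saliencies;
        only the top-k output channels of the convolution are kept at run-time."""

        def __init__(self, in_ch, out_ch, k=3, keep_ratio=0.5):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
            self.saliency = nn.Linear(in_ch, out_ch)  # small auxiliary connection
            self.keep = max(1, int(out_ch * keep_ratio))

        def forward(self, x):
            # Predict channel importance from a cheap global summary of the input.
            summary = x.abs().mean(dim=(2, 3))            # (N, in_ch)
            scores = torch.relu(self.saliency(summary))   # (N, out_ch)
            # Keep the k most salient channels per sample, zero the others;
            # in an optimized kernel, skipped channels need not be computed at all.
            kth = scores.topk(self.keep, dim=1).values[:, -1:]
            gate = scores * (scores >= kth).float()
            return self.conv(x) * gate.unsqueeze(-1).unsqueeze(-1)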
Target Aware Network Adaptation for Efficient Representation Learning
This paper presents an automatic network adaptation method that finds a
ConvNet structure well-suited to a given target task, e.g., image
classification, for efficiency as well as accuracy in transfer learning. We
call the concept target-aware transfer learning. Given only small-scale labeled
data, and starting from an ImageNet pre-trained network, we exploit a scheme of
removing its potential redundancy for the target task through iterative
operations of filter-wise pruning and network optimization. The basic
motivation is twofold: compact networks are more efficient, and, being less
complex, they are also more tolerant of the risk of overfitting, which would
otherwise hinder the generalization of learned representations in the context
of transfer learning. Further, unlike existing methods involving network
simplification, we also let the scheme identify redundant portions across the
entire network, which automatically results in a network structure adapted to
the task at hand. We achieve this with a few novel ideas: (i) cumulative sum of
activation statistics for each layer, and (ii) a priority evaluation of pruning
across multiple layers. Experimental results on five datasets
(Flower102, CUB200-2011, Dog120, MIT67, and Stanford40) show favorable
accuracies compared with related state-of-the-art techniques while enhancing the
computational and storage efficiency of the transferred model.
Comment: Accepted by the ECCV'18 Workshops (2nd International Workshop on
Compact and Efficient Feature Representation and Learning in Computer Vision)
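A rough NumPy sketch of how the two ideas could interact, under the assumption that a layer whose cumulative activation mass saturates with few filters is the most redundant; the energy threshold and the ranking rule are assumptions, not the paper's exact procedure.

    import numpy as np

    def pruning_priority(layer_stats, energy=0.95):
        """layer_stats: dict mapping layer name -> per-filter mean activation
        magnitudes measured on the small target dataset. Returns layer names
        ordered from most to least redundant: a layer whose cumulative
        activation saturates with few filters is a good pruning candidate."""
        needed_fraction = {}
        for name, acts in layer_stats.items():
            acts = np.sort(np.asarray(acts, dtype=float))[::-1]
            csum = np.cumsum(acts) / acts.sum()
            needed = int(np.searchsorted(csum, energy)) + 1
            needed_fraction[name] = needed / len(acts)
        return sorted(needed_fraction, key=needed_fraction.get)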
PruneNet: Channel Pruning via Global Importance
Channel pruning is one of the predominant approaches for accelerating deep
neural networks. Most existing pruning methods either train from scratch with a
sparsity inducing term such as group lasso, or prune redundant channels in a
pretrained network and then fine-tune it. Both strategies have limitations:
group lasso is computationally expensive, difficult to optimize, and often
behaves poorly due to regularization bias. The methods that start with a
pretrained network either
prune channels uniformly across the layers or prune channels based on the basic
statistics of the network parameters. These approaches either ignore the fact
that some CNN layers are more redundant than others or fail to adequately
identify the level of redundancy in different layers. In this work, we
investigate a simple yet effective method for pruning channels based on a
computationally lightweight, data-driven optimization step that
discovers the necessary width per layer. Experiments conducted on ILSVRC-2012
confirm the effectiveness of our approach. With non-uniform pruning across the
layers on ResNet-50, we are able to match the FLOP reduction of
state-of-the-art channel pruning results while achieving a higher
accuracy. Further, we show that our pruned ResNet-50 network outperforms
ResNet-34 and ResNet-18 networks, and that our pruned ResNet-101
outperforms ResNet-50.
Comment: 12 pages, 3 figures, published in the ICLR 2020 NAS Workshop
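One way to picture non-uniform width discovery, as a hedged sketch: score every channel in the network (the scoring step below is a placeholder for PruneNet's data-driven optimization, whose details the abstract does not give) and let a single global threshold determine each layer's surviving width.

    import torch

    def nonuniform_widths(channel_scores, keep_fraction=0.5):
        """channel_scores: dict mapping layer name -> 1-D tensor of channel
        importance scores. A single global threshold is applied, so redundant
        layers naturally end up narrower than important ones."""
        pooled = torch.cat([s.flatten() for s in channel_scores.values()])
        k = max(1, int(pooled.numel() * keep_fraction))
        threshold = pooled.sort(descending=True).values[k - 1]
        return {name: int((s >= threshold).sum().item())
                for name, s in channel_scores.items()}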
Multi-loss-aware Channel Pruning of Deep Networks
Channel pruning, which seeks to reduce the model size by removing redundant
channels, is a popular solution for deep networks compression. Existing channel
pruning methods usually conduct layer-wise channel selection by directly
minimizing the reconstruction error of feature maps between the baseline model
and the pruned one. However, they ignore the feature and semantic distributions
within feature maps and the real contribution of channels to the overall
performance. In this paper, we propose a new channel pruning method by
explicitly using both intermediate outputs of the baseline model and the
classification loss of the pruned model to supervise layer-wise channel
selection. Particularly, we introduce an additional loss to encode the
differences in the feature and semantic distributions within feature maps
between the baseline model and the pruned one. By considering the
reconstruction error, the additional loss and the classification loss at the
same time, our approach can significantly improve the performance of the pruned
model. Comprehensive experiments on benchmark datasets demonstrate the
effectiveness of the proposed method.
Comment: 4 pages, 2 figures
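A hedged PyTorch sketch of a combined objective in this spirit; the distribution term below, which matches per-channel feature statistics, is one plausible stand-in for the paper's additional loss, and the weights alpha and beta are assumptions.

    import torch.nn.functional as F

    def channel_selection_loss(feat_pruned, feat_base, logits_pruned, labels,
                               alpha=1.0, beta=1.0):
        """Combine (1) reconstruction error against the baseline feature maps,
        (2) a distribution-matching term, and (3) the classification loss of
        the pruned model, so channel selection sees all three signals."""
        recon = F.mse_loss(feat_pruned, feat_base)
        # Crude proxy for feature/semantic distribution differences: match
        # the per-channel mean and variance of the feature maps.
        dist = (F.mse_loss(feat_pruned.mean(dim=(2, 3)), feat_base.mean(dim=(2, 3)))
                + F.mse_loss(feat_pruned.var(dim=(2, 3)), feat_base.var(dim=(2, 3))))
        cls = F.cross_entropy(logits_pruned, labels)
        return recon + alpha * dist + beta * cls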
Localization-aware Channel Pruning for Object Detection
Channel pruning is one of the important methods for deep model compression.
Most existing pruning methods focus on classification, and few conduct
systematic research on object detection. However, object detection differs
from classification in that it requires not only semantic information but
also localization information. In this paper, building on discrimination-aware
channel pruning (DCP), a state-of-the-art pruning method for
classification, we propose a localization-aware auxiliary network to identify
the channels carrying key information for classification and regression, so
that channel pruning can be conducted directly for object detection, saving
considerable time and computing resources. In order to capture the localization information,
we first design the auxiliary network with a contextual ROIAlign layer which
can obtain precise localization information of the default boxes by pixel
alignment and enlarges the receptive fields of the default boxes when pruning
shallow layers. Then, we construct a loss function for object detection task
which tends to keep the channels that contain the key information for
classification and regression. Extensive experiments demonstrate the
effectiveness of our method. On MS COCO, we prune 70% of the parameters of an
SSD based on ResNet-50 with only a modest accuracy drop, outperforming
the state-of-the-art method.
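As a hedged illustration of "keeping channels that matter for both tasks", one could score channels by how strongly the joint classification-plus-localization loss responds to them; the gradient-times-activation proxy below is an assumption, not the paper's exact criterion.

    import torch

    def detection_channel_importance(feature, cls_loss, loc_loss, lam=1.0):
        """feature: an intermediate feature map (N, C, H, W) that requires
        grad; cls_loss / loc_loss: classification and localization losses
        computed through it. Channels with large scores carry information
        the detector actually uses and should survive pruning."""
        joint = cls_loss + lam * loc_loss
        grads, = torch.autograd.grad(joint, feature, retain_graph=True)
        return (grads * feature).abs().mean(dim=(0, 2, 3))  # one score per channel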
Spectral Pruning: Compressing Deep Neural Networks via Spectral Analysis and its Generalization Error
Compression techniques for deep neural network models are becoming very
important for the efficient execution of high-performance deep learning systems
on edge-computing devices. The concept of model compression is also important
for analyzing the generalization error of deep learning, known as the
compression-based error bound. However, there is still a huge gap between
practically effective compression methods and their rigorous grounding in
statistical learning theory. To resolve this issue, we develop a new
theoretical framework for model compression and propose a new pruning method
called ``spectral pruning'' based on this framework. We define the ``degrees
of freedom'' to quantify the intrinsic dimensionality of a model by using the
eigenvalue distribution of the covariance matrix across the internal nodes and
show that the compression ability is essentially controlled by this quantity.
Moreover, we present a sharp generalization error bound of the compressed model
and characterize the bias--variance tradeoff induced by the compression
procedure. We apply our method to several datasets to justify our theoretical
analyses and show the superiority of the proposed method.
Comment: 17 pages, 4 figures. Accepted at IJCAI-PRICAI 2020. Proceedings of
the Twenty-Ninth International Joint Conference on Artificial Intelligence,
pages 2839--284
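The degrees-of-freedom quantity can be sketched directly from the abstract's description: take the covariance of a layer's node activations and measure its effective rank. A minimal NumPy version, with the regularization scale lam as an assumed hyperparameter:

    import numpy as np

    def degrees_of_freedom(activations, lam=1e-3):
        """activations: (num_samples, num_nodes) matrix of a layer's internal
        node outputs. N(lam) = sum_i mu_i / (mu_i + lam), with mu_i the
        eigenvalues of the activation covariance, counts how many directions
        carry non-negligible variance; a layer with few degrees of freedom
        can be compressed aggressively."""
        centered = activations - activations.mean(axis=0)
        cov = centered.T @ centered / len(centered)
        mu = np.linalg.eigvalsh(cov)
        return float(np.sum(mu / (mu + lam)))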
A flexible, extensible software framework for model compression based on the LC algorithm
We propose a software framework based on the ideas of the
Learning-Compression (LC) algorithm that allows a user to compress a neural
network or other machine learning model using different compression schemes
with minimal effort. Currently, the supported compressions include pruning,
quantization, low-rank methods (including automatically learning the layer
ranks), and combinations of those, and the user can choose different
compression types for different parts of a neural network.
The LC algorithm alternates two types of steps until convergence: a learning
(L) step, which trains a model on a dataset (using an algorithm such as SGD);
and a compression (C) step, which compresses the model parameters (using a
compression scheme such as low-rank or quantization). This decoupling of the
"machine learning" aspect from the "signal compression" aspect means that
changing the model or the compression type amounts to calling the corresponding
subroutine in the L or C step, respectively. The library fully supports this by
design, which makes it flexible and extensible. This does not come at the
expense of performance: the runtime needed to compress a model is comparable to
that of training the model in the first place; and the compressed model is
competitive in terms of prediction accuracy and compression ratio with other
algorithms (which are often specialized for specific models or compression
schemes). The library is written in Python and PyTorch and is available on GitHub.
Comment: 15 pages, 4 figures, 2 tables
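A minimal, self-contained PyTorch sketch of the L/C alternation, with magnitude pruning standing in for the C step; the quadratic coupling and all hyperparameters are assumptions, and the actual library's API differs.

    import torch

    def lc_prune(model, loss_fn, data_loader, sparsity=0.9, mu=1e-3,
                 outer_steps=10, inner_steps=100, lr=1e-2):
        """Alternate an L step (SGD on the task loss plus a quadratic penalty
        pulling weights toward their compressed version theta) with a C step
        (here: magnitude pruning, the closest sparse tensor in the L2 norm)."""
        params = [p for p in model.parameters() if p.requires_grad]

        def c_step():
            thetas = []
            for p in params:
                flat = p.detach().abs().flatten()
                k = max(1, int(flat.numel() * (1 - sparsity)))
                thresh = flat.topk(k).values[-1]
                thetas.append(torch.where(p.detach().abs() >= thresh,
                                          p.detach(), torch.zeros_like(p)))
            return thetas

        thetas = c_step()
        opt = torch.optim.SGD(params, lr=lr)
        for _ in range(outer_steps):
            for step, (x, y) in enumerate(data_loader):      # L step
                if step >= inner_steps:
                    break
                loss = loss_fn(model(x), y) + sum(
                    0.5 * mu * ((p - t) ** 2).sum() for p, t in zip(params, thetas))
                opt.zero_grad()
                loss.backward()
                opt.step()
            thetas = c_step()                                # C step
        return thetas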
Meta Filter Pruning to Accelerate Deep Convolutional Neural Networks
Existing methods usually utilize pre-defined criteria, such as the p-norm, to
prune unimportant filters. There are two major limitations in these methods.
First, the relations of the filters are largely ignored. The filters usually
work jointly to make an accurate prediction in a collaborative way. Similar
filters will have equivalent effects on the network prediction, and the
redundant filters can be further pruned. Second, the pruning criterion remains
unchanged during training. As the network is updated at each iteration, the filter
distribution also changes continuously, so the pruning criterion should be
switched adaptively. In this paper, we propose Meta Filter Pruning (MFP) to
solve the above problems. First, as a complement to the existing p-norm
criterion, we introduce a new pruning criterion considering the filter relation
via filter distance. Additionally, we build a meta pruning framework for filter
pruning, so that our method could adaptively select the most appropriate
pruning criterion as the filter distribution changes. Experiments validate our
approach on two image classification benchmarks. Notably, on ILSVRC-2012, our
MFP reduces more than 50% FLOPs on ResNet-50 with only 0.44% top-5 accuracy
loss.
Comment: 10 pages
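A small PyTorch sketch of the filter-distance criterion; the medoid-style score is an assumption consistent with the abstract, and the meta step that switches between this and a p-norm criterion per pruning round is omitted.

    import torch

    def filter_distance_scores(weight):
        """weight: conv weight tensor (out_ch, in_ch, kH, kW). Filters whose
        summed distance to all other filters in the layer is small sit in a
        crowded region of filter space: similar filters have near-equivalent
        effects on the prediction, so the crowded ones are redundant
        pruning candidates."""
        filters = weight.flatten(1)                   # one row per filter
        pairwise = torch.cdist(filters, filters)      # pairwise L2 distances
        return pairwise.sum(dim=1)                    # small score => redundant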
Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration
Previous works utilized the ''smaller-norm-less-important'' criterion to prune
filters with smaller norm values in a convolutional neural network. In this
paper, we analyze this norm-based criterion and point out that its
effectiveness depends on two requirements that are not always met: (1) the norm
deviation of the filters should be large; (2) the minimum norm of the filters
should be small. To solve this problem, we propose a novel filter pruning
method, namely Filter Pruning via Geometric Median (FPGM), to compress the
model regardless of those two requirements. Unlike previous methods, FPGM
compresses CNN models by pruning filters with redundancy, rather than those
with ''relatively less'' importance. When applied to two image classification
benchmarks, our method validates its usefulness and strengths. Notably, on
CIFAR-10, FPGM reduces more than 52% FLOPs on ResNet-110 with even a 2.69%
relative accuracy improvement. Moreover, on ILSVRC-2012, FPGM reduces more than
42% FLOPs on ResNet-101 without top-5 accuracy drop, which has advanced the
state-of-the-art. Code is publicly available on GitHub:
https://github.com/he-y/filter-pruning-geometric-median
Comment: Accepted to CVPR 2019 (Oral)
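The FPGM criterion admits a compact sketch: distance to the geometric median can, as a relaxation, be approximated by each filter's summed distance to all other filters in the layer, and the filters with the smallest sums are pruned first. A hedged PyTorch version:

    import torch

    def fpgm_prune_indices(weight, num_prune):
        """weight: conv weight tensor (out_ch, in_ch, kH, kW). Filters near
        the geometric median of the layer are the most replaceable, since
        the remaining filters can represent them."""
        filters = weight.flatten(1)
        total_dist = torch.cdist(filters, filters).sum(dim=1)
        return total_dist.argsort()[:num_prune]       # nearest-to-median first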
Parameterized Structured Pruning for Deep Neural Networks
As the size of Deep Neural Networks (DNNs) grows, so does the gap between
model requirements and hardware capabilities in terms of memory and compute. To effectively
compress DNNs, quantization and connection pruning are usually considered.
However, unconstrained pruning usually leads to unstructured parallelism, which
maps poorly to massively parallel processors, and substantially reduces the
efficiency of general-purpose processors. The same applies to quantization,
which often requires dedicated hardware. We propose Parameterized Structured
Pruning (PSP), a novel method to dynamically learn the shape of DNNs through
structured sparsity. PSP parameterizes structures (e.g. channel- or layer-wise)
in a weight tensor and leverages weight decay to learn a clear distinction
between important and unimportant structures. As a result, PSP maintains
prediction performance, creates a substantial amount of sparsity that is
structured and, thus, easy and efficient to map to a variety of massively
parallel processors, which are mandatory for utmost compute power and energy
efficiency. PSP is experimentally validated on the popular CIFAR10/100 and
ILSVRC2012 datasets using ResNet and DenseNet architectures, respectively.
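A hedged sketch of the parameterization: attach a learnable scale to each structure (here, an output channel) and let plain weight decay separate important from unimportant structures. The module name and the threshold rule are assumptions for illustration.

    import torch
    import torch.nn as nn

    class ChannelScaledConv(nn.Module):
        """Each output channel is multiplied by a learnable structure
        parameter alpha. Weight decay on alpha drives unimportant channels
        toward zero; whole channels can then be removed, keeping the
        resulting sparsity structured and hardware-friendly."""

        def __init__(self, in_ch, out_ch, k=3):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
            self.alpha = nn.Parameter(torch.ones(out_ch))

        def forward(self, x):
            return self.conv(x) * self.alpha.view(1, -1, 1, 1)

        def prunable_channels(self, tau=1e-2):
            # Channels whose learned scale decayed below tau can be dropped.
            return (self.alpha.detach().abs() < tau).nonzero().flatten()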