Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration
Previous works utilized the "smaller-norm-less-important" criterion to prune
filters with smaller norm values in a convolutional neural network. In this
paper, we analyze this norm-based criterion and point out that its
effectiveness depends on two requirements that are not always met: (1) the norm
deviation of the filters should be large; (2) the minimum norm of the filters
should be small. To solve this problem, we propose a novel filter pruning
method, namely Filter Pruning via Geometric Median (FPGM), to compress the
model regardless of those two requirements. Unlike previous methods, FPGM
compresses CNN models by pruning filters with redundancy, rather than those
with "relatively less" importance. When applied to two image classification
benchmarks, our method validates its usefulness and strengths. Notably, on
CIFAR-10, FPGM reduces FLOPs on ResNet-110 by more than 52%, with even a 2.69% relative accuracy improvement. Moreover, on ILSVRC-2012, FPGM reduces FLOPs on ResNet-101 by more than 42% without a top-5 accuracy drop, advancing the state of the art. Code is publicly available on GitHub: https://github.com/he-y/filter-pruning-geometric-median
Comment: Accepted to CVPR 2019 (Oral)
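To make the criterion concrete, below is a minimal sketch of geometric-median-based filter selection, assuming a 4-D PyTorch convolution weight tensor; the function name and usage are illustrative, not the authors' released code.

```python
# A minimal sketch of FPGM-style filter selection (illustrative, not the
# authors' implementation): filters with the smallest total distance to
# all other filters approximate the geometric median and are treated as
# redundant.
import torch

def fpgm_prune_indices(weight: torch.Tensor, n_prune: int) -> list:
    """Return indices of the filters nearest the geometric median.

    weight: conv weight of shape (out_channels, in_channels, k, k).
    """
    filters = weight.view(weight.size(0), -1)      # flatten each filter
    dists = torch.cdist(filters, filters, p=2)     # pairwise L2 distances
    total = dists.sum(dim=1)                       # distance to all others
    return total.argsort()[:n_prune].tolist()      # most "central" filters

# Hypothetical usage on one layer of a pre-trained model:
# conv = model.layer1[0].conv1
# drop = fpgm_prune_indices(conv.weight.data, n_prune=8)
```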
Representation Based Complexity Measures for Predicting Generalization in Deep Learning
Deep Neural Networks can generalize despite being significantly
overparametrized. Recent research has tried to examine this phenomenon from various viewpoints and to provide bounds on the generalization error, or measures predictive of the generalization gap, based on these viewpoints, such as norm-based, PAC-Bayes-based, and margin-based analyses. In this work, we
provide an interpretation of generalization from the perspective of quality of
internal representations of deep neural networks, based on neuroscientific
theories of how the human visual system creates invariant and untangled object
representations. Instead of providing theoretical bounds, we demonstrate
practical complexity measures which can be computed ad-hoc to uncover
generalization behaviour in deep models. We also provide a detailed description
of our solution that won the NeurIPS competition on Predicting Generalization
in Deep Learning held at NeurIPS 2020. An implementation of our solution is
available at https://github.com/parthnatekar/pgdl.
Comment: Winning Solution of the NeurIPS 2020 Competition on Predicting Generalization in Deep Learning
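As one example of a practical, ad-hoc complexity measure on internal representations, the sketch below scores how well a layer's activations cluster by class label using the Davies-Bouldin index; the measure and names are illustrative stand-ins in the spirit of the paper, not the exact winning solution.

```python
# A hedged sketch of one representation-quality measure: clustering
# quality of intermediate activations, scored with the Davies-Bouldin
# index (lower = better-separated, more "untangled" classes).
import numpy as np
from sklearn.metrics import davies_bouldin_score

def representation_db_index(activations: np.ndarray,
                            labels: np.ndarray) -> float:
    """activations: (n_samples, ...) hidden features from some layer.
    labels: (n_samples,) integer class labels."""
    flat = activations.reshape(len(activations), -1)
    return davies_bouldin_score(flat, labels)
```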
Deeply Shared Filter Bases for Parameter-Efficient Convolutional Neural Networks
Modern convolutional neural networks (CNNs) contain many identical convolution blocks, and recursive sharing of parameters across these blocks has therefore been proposed to reduce the number of parameters. However, naive
sharing of parameters poses many challenges such as limited representational
power and the vanishing/exploding gradients problem of recursively shared
parameters. In this paper, we present a recursive convolution block design and
training method, in which a recursively shareable part, or a filter basis, is
separated and learned while effectively avoiding the vanishing/exploding
gradients problem during training. We show that the unwieldy vanishing/exploding gradients problem can be controlled by constraining the elements of the filter basis to be orthonormal, and we empirically demonstrate that the
proposed orthogonality regularization improves the flow of gradients during
training. Experimental results on image classification and object detection
show that our approach, unlike previous parameter-sharing approaches, does not
trade performance to save parameters and consistently outperforms
overparameterized counterpart networks. This superior performance demonstrates
that the proposed recursive convolution block design and the orthogonality
regularization not only prevent performance degradation, but also consistently improve the representation capability even while a significant fraction of the parameters is recursively shared.
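A minimal sketch of such an orthogonality regularizer, assuming the shared filter basis is stored as a single 4-D parameter; names are illustrative rather than the authors' API.

```python
# A minimal sketch of an orthonormality penalty on a shared filter basis
# (illustrative): ||B B^T - I||_F^2 is zero iff the flattened basis
# filters form an orthonormal set.
import torch

def orthonormality_penalty(basis: torch.Tensor) -> torch.Tensor:
    """basis: (num_basis_filters, in_channels, k, k), shared across blocks."""
    B = basis.view(basis.size(0), -1)
    gram = B @ B.t()
    eye = torch.eye(B.size(0), device=B.device)
    return ((gram - eye) ** 2).sum()

# Typically added to the task loss with a small weight, e.g.:
# loss = task_loss + 1e-3 * orthonormality_penalty(shared_basis)
```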
REPrune: Filter Pruning via Representative Election
Even though norm-based filter pruning methods are widely accepted, it is
questionable whether the "smaller-norm-less-important" criterion is optimal in
determining filters to prune. Especially when we can keep only a small fraction
of the original filters, it is more crucial to choose the filters that best represent the full set of filters regardless of norm values. Our novel pruning method
entitled "REPrune" addresses this problem by selecting representative filters
via clustering. By selecting one filter from each cluster of similar filters and avoiding the selection of adjacent large filters, REPrune can achieve a better
compression rate with similar accuracy. Our method also recovers the accuracy
more rapidly and requires a smaller shift of filters during fine-tuning.
Empirically, REPrune reduces FLOPs by more than 49% with a 0.53% accuracy gain on ResNet-110 for CIFAR-10. REPrune also reduces FLOPs by more than 41.8% with a 1.67% top-1 validation accuracy loss on ResNet-18 for ImageNet.
Comment: Under Review at ECCV 202
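A hedged sketch of representative-filter election via clustering, using scikit-learn's k-means as a stand-in for the paper's clustering procedure; names and usage are illustrative.

```python
# A hedged sketch of representative election: cluster flattened filters,
# then keep the filter nearest each cluster center.
import numpy as np
from sklearn.cluster import KMeans

def select_representatives(filters: np.ndarray, n_keep: int) -> np.ndarray:
    """filters: (n_filters, n_features) flattened conv filters.
    Returns indices of n_keep representative filters."""
    km = KMeans(n_clusters=n_keep, n_init=10, random_state=0).fit(filters)
    keep = []
    for c in range(n_keep):
        members = np.where(km.labels_ == c)[0]
        d = np.linalg.norm(filters[members] - km.cluster_centers_[c], axis=1)
        keep.append(members[d.argmin()])   # representative of this cluster
    return np.array(sorted(keep))
```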
Exploiting Channel Similarity for Accelerating Deep Convolutional Neural Networks
To address the limitations of existing magnitude-based pruning algorithms in
cases where model weights or activations are of large and similar magnitude, we
propose a novel perspective to discover parameter redundancy among channels and
accelerate deep CNNs via channel pruning. Precisely, we argue that channels
revealing similar feature information have functional overlap and that most
channels within each such similarity group can be removed without compromising the model's representational power. After deriving an effective metric for
evaluating channel similarity through probabilistic modeling, we introduce a
pruning algorithm via hierarchical clustering of channels. In particular, the
proposed algorithm does not rely on sparsity training techniques or complex
data-driven optimization and can be directly applied to pre-trained models.
Extensive experiments on benchmark datasets strongly demonstrate the superior
acceleration performance of our approach over prior art. On ImageNet, our pruned ResNet-50 with 30% of its FLOPs removed outperforms the baseline model.
Comment: 14 pages, 6 figures
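The sketch below illustrates grouping channels by hierarchical clustering of a similarity matrix, with plain activation correlation standing in for the paper's probabilistically derived similarity metric; it is an assumption-laden illustration, not the proposed algorithm.

```python
# A minimal sketch of similarity-grouped channel pruning: correlation of
# feature maps is used here as a stand-in similarity metric, and each
# resulting group can be collapsed to a single surviving channel.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def channel_groups(feat: np.ndarray, threshold: float) -> np.ndarray:
    """feat: (n_samples, n_channels, h, w) activations of one layer.
    Returns a cluster id per channel."""
    flat = feat.transpose(1, 0, 2, 3).reshape(feat.shape[1], -1)
    corr = np.corrcoef(flat)                 # channel-by-channel similarity
    dist = 1.0 - corr                        # convert to a distance
    iu = np.triu_indices_from(dist, k=1)     # condensed form for scipy
    Z = linkage(dist[iu], method="average")
    return fcluster(Z, t=threshold, criterion="distance")
```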
DHP: Differentiable Meta Pruning via HyperNetworks
Network pruning has been the driving force for the acceleration of neural
networks and the alleviation of model storage/transmission burden. With the
advent of AutoML and neural architecture search (NAS), pruning has become topical, with automatic mechanisms and search-based architecture optimization. Yet current automatic designs rely on either reinforcement learning or evolutionary algorithms. Because those algorithms are not differentiable, the pruning process requires a long search stage before reaching convergence.
To circumvent this problem, this paper introduces a differentiable pruning
method via hypernetworks for automatic network pruning. The specifically
designed hypernetworks take latent vectors as input and generate the weight
parameters of the backbone network. The latent vectors control the output
channels of the convolutional layers in the backbone network and act as a
handle for the pruning of the layers. By applying sparsity regularization to the latent vectors and using a proximal gradient solver, sparse latent vectors can be obtained. Passing the sparsified latent vectors
through the hypernetworks, the corresponding slices of the generated weight
parameters can be removed, achieving the effect of network pruning. The latent
vectors of all the layers are pruned together, resulting in an automatic layer
configuration. Extensive experiments are conducted on various networks for image classification, single-image super-resolution, and denoising, and the experimental results validate the proposed method.
Comment: ECCV camera-ready. Code is available at https://github.com/ofsoundof/dh
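A hedged sketch of the latent-vector mechanism the abstract describes: a tiny hypernetwork generates a layer's weights from a per-channel latent vector, and a proximal (soft-thresholding) step sparsifies the latent vector after each gradient update. Shapes and names are illustrative, not the released implementation.

```python
# A hedged sketch of differentiable pruning via a hypernetwork: each
# output channel's filter is generated from one latent entry, so a
# zeroed latent entry marks a removable (pruned) weight slice.
import torch
import torch.nn as nn

class ConvHyper(nn.Module):
    def __init__(self, out_ch, in_ch, k):
        super().__init__()
        self.latent = nn.Parameter(torch.randn(out_ch))  # one entry per channel
        self.proj = nn.Linear(1, in_ch * k * k)          # latent -> filter weights
        self.shape = (out_ch, in_ch, k, k)

    def forward(self):
        w = self.proj(self.latent.unsqueeze(1))          # (out_ch, in_ch*k*k)
        return w.view(self.shape)

def proximal_l1_step(latent: torch.Tensor, lam: float, lr: float):
    """In-place soft-thresholding: the proximal operator of lr*lam*||z||_1."""
    with torch.no_grad():
        latent.copy_(latent.sign() * (latent.abs() - lr * lam).clamp(min=0))

# After each gradient step on the task loss:
# proximal_l1_step(hyper.latent.data, lam=1e-4, lr=0.1)
```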
Transform Quantization for CNN (Convolutional Neural Network) Compression
In this paper, we compress convolutional neural network (CNN) weights
post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or they consider the joint statistics only during training and do not facilitate efficient compression of already-trained CNN models. We optimally transform (decorrelate)
and quantize the weights post-training using a rate-distortion framework to
improve compression at any given quantization bit-rate. Transform quantization
unifies quantization and dimensionality reduction (decorrelation) techniques in
a single framework to facilitate low bit-rate compression of CNNs and efficient
inference in the transform domain. We first introduce a theory of rate and
distortion for CNN quantization, and pose optimum quantization as a
rate-distortion optimization problem. We then show that this problem can be
solved using optimal bit-depth allocation following decorrelation by the
optimal End-to-end Learned Transform (ELT) we derive in this paper. Experiments
demonstrate that transform quantization advances the state of the art in CNN
compression in both retrained and non-retrained quantization scenarios. In
particular, we find that transform quantization with retraining is able to
compress CNN models such as AlexNet, ResNet and DenseNet to very low bit-rates
(1-2 bits).
Comment: To appear in IEEE Trans Pattern Anal Mach Intell
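A minimal sketch of the decorrelate-then-quantize idea, using a plain PCA/KLT transform and a single uniform quantizer as stand-ins for the paper's learned transform (ELT) and optimal per-dimension bit-depth allocation.

```python
# A hedged sketch of transform quantization: decorrelate flattened layer
# weights with a KLT (via SVD), quantize the transform coefficients
# uniformly, then invert the transform to inspect the reconstruction.
import numpy as np

def transform_quantize(weights: np.ndarray, bits: int) -> np.ndarray:
    """weights: (n_filters, n_features) flattened layer weights."""
    mean = weights.mean(axis=0)
    centered = weights - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coeff = centered @ vt.T                       # decorrelated coefficients
    step = (coeff.max() - coeff.min()) / (2 ** bits - 1)
    q = np.round(coeff / step)                    # uniform quantization
    return (q * step) @ vt + mean                 # inverse transform

# e.g. w_hat = transform_quantize(conv_w.reshape(conv_w.shape[0], -1), bits=2)
```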