OICSR: Out-In-Channel Sparsity Regularization for Compact Deep Neural Networks
Channel pruning can significantly accelerate and compress deep neural
networks. Many channel pruning works utilize structured sparsity regularization
to zero out all the weights in some channels and automatically obtain a
structure-sparse network during training. However, these methods apply
structured sparsity regularization to each layer separately, ignoring the
correlations between consecutive layers. In this paper, we first
combine each out-channel in the current layer with the corresponding in-channel
in the next layer as a regularization group, namely an out-in-channel. Our proposed
Out-In-Channel Sparsity Regularization (OICSR) considers correlations between
successive layers to further retain predictive power of the compact network.
Training with OICSR thoroughly transfers discriminative features into a
fraction of out-in-channels. Correspondingly, OICSR measures channel importance
based on statistics computed from two consecutive layers rather than an individual layer.
Finally, a global greedy pruning algorithm is designed to remove redundant
out-in-channels in an iterative way. Our method is comprehensively evaluated
with various CNN architectures including CifarNet, AlexNet, ResNet, DenseNet
and PreActSeNet on CIFAR-10, CIFAR-100 and ImageNet-1K datasets. Notably, on
ImageNet-1K, we reduce 37.2% FLOPs on ResNet-50 while outperforming the
original model by 0.22% top-1 accuracy. Comment: Accepted to CVPR 2019; the pruned ResNet-50 model has been released at:
https://github.com/dsfour/OICS
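The out-in-channel grouping described above can be sketched in a few lines of NumPy (an illustrative sketch, not the released code; the array shapes and the penalty weight `lam` are assumptions):

```python
import numpy as np

def oicsr_group_norms(w_cur, w_next):
    """Per-channel norms over out-in-channel groups.

    w_cur:  (C_out, C_in, k, k)   weights of the current layer
    w_next: (C_next, C_out, k, k) weights of the next layer
    Group c = out-channel c of w_cur plus in-channel c of w_next.
    """
    out_part = (w_cur ** 2).sum(axis=(1, 2, 3))   # (C_out,)
    in_part = (w_next ** 2).sum(axis=(0, 2, 3))   # (C_out,)
    return np.sqrt(out_part + in_part)

def oicsr_penalty(w_cur, w_next, lam=1e-4):
    # Group-lasso style term: lam * sum_c ||group_c||_2
    return lam * oicsr_group_norms(w_cur, w_next).sum()
```

During training this penalty would be added to the task loss; channels whose group norm is driven to zero can then be pruned from both layers at once.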
Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
Model pruning has become a useful technique that improves the computational
efficiency of deep learning, making it possible to deploy solutions in
resource-limited scenarios. A widely-used practice in relevant work assumes
that a smaller-norm parameter or feature plays a less informative role at the
inference time. In this paper, we propose a channel pruning technique for
accelerating the computations of deep convolutional neural networks (CNNs) that
does not critically rely on this assumption. Instead, it focuses on direct
simplification of the channel-to-channel computation graph of a CNN without the
need of performing a computationally difficult and not-always-useful task of
making high-dimensional tensors of CNN structured sparse. Our approach takes
two stages: first to adopt an end-to-end stochastic training method that
eventually forces the outputs of some channels to be constant, and then to
prune those constant channels from the original neural network by adjusting the
biases of their impacting layers such that the resulting compact model can be
quickly fine-tuned. Our approach is mathematically appealing from an
optimization perspective and easy to reproduce. We evaluated our approach on
several image learning benchmarks and demonstrate its interesting
aspects and competitive performance. Comment: accepted to ICLR 2018, 11 pages
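The second stage sketched in the abstract, removing channels whose outputs have become constant by absorbing them into the next layer's biases, can be illustrated as follows (a minimal NumPy sketch under assumed shapes, not the authors' implementation; boundary effects from zero padding are ignored):

```python
import numpy as np

def fold_constant_channels(w_next, b_next, const_vals, const_mask):
    """Prune input channels whose activations are constant, folding
    their contribution into the next layer's bias.

    w_next: (C_out, C_in, k, k), b_next: (C_out,)
    const_vals: (C_in,) constant output value of each input channel
    const_mask: (C_in,) bool, True where the channel is constant
    (zero-padding boundary effects are ignored in this sketch)
    """
    # Each constant channel c adds const_vals[c] * sum(w_next[:, c]) to the bias.
    delta = (w_next[:, const_mask].sum(axis=(2, 3)) *
             const_vals[const_mask]).sum(axis=1)
    b_new = b_next + delta
    w_new = w_next[:, ~const_mask]  # keep only non-constant input channels
    return w_new, b_new
```

The resulting compact layer produces the same pre-activations (away from borders), which is why the pruned model can be fine-tuned quickly.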
Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure
Redundancy is widely recognized in Convolutional Neural Networks (CNNs),
making it possible to remove unimportant filters from convolutional layers and
slim the network with an acceptable performance drop. Inspired by the linear and
combinational properties of convolution, we seek to make some filters
increasingly close and eventually identical for network slimming. To this end,
we propose Centripetal SGD (C-SGD), a novel optimization method, which can
train several filters to collapse into a single point in the parameter
hyperspace. When the training is completed, the removal of the identical
filters can trim the network with NO performance loss, thus no finetuning is
needed. By doing so, we have partly solved an open problem of constrained
filter pruning on CNNs with complicated structure, where some layers must be
pruned following others. Our experimental results on CIFAR-10 and ImageNet have
justified the effectiveness of C-SGD-based filter pruning. Moreover, we have
provided empirical evidence for the assumption that the redundancy in deep
neural networks helps the convergence of training by showing that a redundant
CNN trained using C-SGD outperforms a normally trained counterpart with the
equivalent width. Comment: CVPR 2019
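The centripetal update, several filters sharing one averaged gradient while being pulled toward their cluster mean, can be sketched as below (a simplified version of the C-SGD rule; the learning rate `lr` and centripetal strength `eps` are illustrative values):

```python
import numpy as np

def csgd_step(filters, grads, clusters, lr=0.1, eps=0.03):
    """One centripetal SGD step: filters in the same cluster share the
    averaged gradient and are additionally pulled toward the cluster
    mean, so they converge to a single point and duplicates can be
    pruned with no performance loss.

    filters, grads: (C, D) flattened filter weights / gradients
    clusters: list of index arrays, one per cluster
    """
    new = filters.copy()
    for idx in clusters:
        g_mean = grads[idx].mean(axis=0)
        f_mean = filters[idx].mean(axis=0)
        # shared gradient + centripetal pull toward the cluster centre
        new[idx] = filters[idx] - lr * g_mean - eps * (filters[idx] - f_mean)
    return new
```

Because every filter in a cluster receives the same gradient, the within-cluster spread decays geometrically and the filters eventually coincide.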
Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
Deep neural networks have evolved remarkably over the past few years and they
are currently the fundamental tools of many intelligent systems. At the same
time, the computational complexity and resource consumption of these networks
also continue to increase. This will pose a significant challenge to the
deployment of such networks, especially in real-time applications or on
resource-limited devices. Thus, network acceleration has become a hot topic
within the deep learning community. As for hardware implementation of deep
neural networks, a batch of accelerators based on FPGA/ASIC have been proposed
in recent years. In this paper, we provide a comprehensive survey of recent
advances in network acceleration, compression and accelerator design from both
algorithm and hardware points of view. Specifically, we provide a thorough
analysis of each of the following topics: network pruning, low-rank
approximation, network quantization, teacher-student networks, compact network
design and hardware accelerators. Finally, we will introduce and discuss a few
possible future directions. Comment: 14 pages, 3 figures
PruneNet: Channel Pruning via Global Importance
Channel pruning is one of the predominant approaches for accelerating deep
neural networks. Most existing pruning methods either train from scratch with a
sparsity inducing term such as group lasso, or prune redundant channels in a
pretrained network and then fine tune the network. Both strategies suffer from
some limitations: training with group lasso is computationally expensive,
difficult to converge, and often behaves poorly due to regularization bias.
The methods that start with a pretrained network either
prune channels uniformly across the layers or prune channels based on the basic
statistics of the network parameters. These approaches either ignore the fact
that some CNN layers are more redundant than others or fail to adequately
identify the level of redundancy in different layers. In this work, we
investigate a simple yet effective method for pruning channels based on a
computationally lightweight, data-driven optimization step that
discovers the necessary width per layer. Experiments conducted on ILSVRC-
confirm effectiveness of our approach. With non-uniform pruning across the
layers on ResNet-, we are able to match the FLOP reduction of
state-of-the-art channel pruning results while achieving a higher
accuracy. Further, we show that our pruned ResNet- network outperforms
ResNet- and ResNet- networks, and that our pruned ResNet-
outperforms ResNet-. Comment: 12 pages, 3 figures, Published in ICLR 2020 NAS Workshop
Discrimination-aware Channel Pruning for Deep Neural Networks
Channel pruning is one of the predominant approaches for deep model
compression. Existing pruning methods either train from scratch with sparsity
constraints on channels, or minimize the reconstruction error between the
pre-trained feature maps and the compressed ones. Both strategies suffer from
some limitations: the former kind is computationally expensive and difficult to
converge, whilst the latter kind optimizes the reconstruction error but ignores
the discriminative power of channels. To overcome these drawbacks, we
investigate a simple-yet-effective method, called discrimination-aware channel
pruning, to choose those channels that really contribute to discriminative
power. To this end, we introduce additional losses into the network to increase
the discriminative power of intermediate layers and then select the most
discriminative channels for each layer by considering the additional loss and
the reconstruction error. Last, we propose a greedy algorithm to conduct
channel selection and parameter optimization in an iterative way. Extensive
experiments demonstrate the effectiveness of our method. For example, on
ILSVRC-12, our pruned ResNet-50 with 30% reduction of channels even outperforms
the original model by 0.39% in top-1 accuracy. Comment: NeurIPS 2018
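The greedy channel-selection step can be illustrated with a least-squares variant (a simplified stand-in that scores channels by reconstruction error alone, omitting the paper's additional discriminative losses; names and shapes are assumptions):

```python
import numpy as np

def select_channels_greedy(feat, target, k):
    """Greedy channel selection: repeatedly add the channel that most
    reduces the squared error of reconstructing `target` from the
    selected channels' features, refitting by least squares each step.

    feat: (N, C) per-channel feature responses
    target: (N,) signal the compressed layer should preserve
    """
    selected = []
    remaining = list(range(feat.shape[1]))
    for _ in range(k):
        best, best_err = None, np.inf
        for c in remaining:
            cols = feat[:, selected + [c]]
            # least-squares refit with the candidate channel included
            coef, *_ = np.linalg.lstsq(cols, target, rcond=None)
            err = ((cols @ coef - target) ** 2).sum()
            if err < best_err:
                best, best_err = c, err
        selected.append(best)
        remaining.remove(best)
    return selected
```

In the paper, the score would additionally include the gradient of the auxiliary discrimination-aware losses, so that channels useful for classification, not just reconstruction, are kept.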
Rethinking the Value of Network Pruning
Network pruning is widely used for reducing the heavy inference cost of deep
models in low-resource settings. A typical pruning algorithm is a three-stage
pipeline, i.e., training (a large model), pruning and fine-tuning. During
pruning, according to a certain criterion, redundant weights are pruned and
important weights are kept to best preserve the accuracy. In this work, we make
several surprising observations which contradict common beliefs. For all
state-of-the-art structured pruning algorithms we examined, fine-tuning a
pruned model only gives comparable or worse performance than training that
model with randomly initialized weights. For pruning algorithms which assume a
predefined target network architecture, one can get rid of the full pipeline
and directly train the target network from scratch. Our observations are
consistent for multiple network architectures, datasets, and tasks, which imply
that: 1) training a large, over-parameterized model is often not necessary to
obtain an efficient final model, 2) learned "important" weights of the large
model are typically not useful for the small pruned model, 3) the pruned
architecture itself, rather than a set of inherited "important" weights, is
more crucial to the efficiency in the final model, which suggests that in some
cases pruning can be useful as an architecture search paradigm. Our results
suggest the need for more careful baseline evaluations in future research on
structured pruning methods. We also compare with the "Lottery Ticket
Hypothesis" (Frankle & Carbin 2019), and find that with optimal learning rate,
the "winning ticket" initialization as used in Frankle & Carbin (2019) does not
bring improvement over random initialization. Comment: ICLR 2019. Significant revisions from the previous version
DAC: Data-free Automatic Acceleration of Convolutional Networks
Deploying a deep learning model on mobile/IoT devices is a challenging task.
The difficulty lies in the trade-off between computation speed and accuracy. A
complex deep learning model with high accuracy runs slowly on resource-limited
devices, while a light-weight model that runs much faster loses accuracy. In
this paper, we propose a novel decomposition method, namely DAC, that is
capable of factorizing an ordinary convolutional layer into two layers with
much fewer parameters. DAC computes the corresponding weights for the newly
generated layers directly from the weights of the original convolutional layer.
Thus, no training (or fine-tuning) or any data is needed. The experimental
results show that DAC reduces a large number of floating-point operations
(FLOPs) while maintaining the high accuracy of a pre-trained model. If a 2% accuracy
drop is acceptable, DAC saves 53% of the FLOPs of the VGG16 image classification model on
the ImageNet dataset, 29% of the FLOPs of the SSD300 object detection model on the PASCAL VOC2007
dataset, and 46% of the FLOPs of a multi-person pose estimation model on the Microsoft
COCO dataset. Compared to other existing decomposition methods, DAC achieves
better performance. Comment: Accepted by IEEE Winter Conference on Applications of Computer Vision
(WACV 2019)
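A generic data-free low-rank factorization in the same spirit can be sketched with an SVD (this is not DAC's exact decomposition scheme, only an illustration of splitting one convolutional layer into two smaller ones using nothing but its weights):

```python
import numpy as np

def lowrank_factorize(w, rank):
    """Data-free factorization of a conv kernel via truncated SVD.

    w: (C_out, C_in, k, k) -> two factors computed from w alone:
      first:  (rank, C_in, k, k)   k x k conv producing `rank` channels
      second: (C_out, rank, 1, 1)  1 x 1 conv restoring C_out channels
    Stacking `first` then `second` approximates the original layer;
    no training data or fine-tuning is required.
    """
    c_out, c_in, k, _ = w.shape
    mat = w.reshape(c_out, -1)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    first = (np.sqrt(s[:rank])[:, None] * vt[:rank]).reshape(rank, c_in, k, k)
    second = (u[:, :rank] * np.sqrt(s[:rank])).reshape(c_out, rank, 1, 1)
    return first, second
```

At full rank the two factors reconstruct the original kernel exactly; FLOP savings come from choosing a rank well below `C_out`.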
Deep Sparse Band Selection for Hyperspectral Face Recognition
Hyperspectral imaging systems collect and process information from specific
wavelengths across the electromagnetic spectrum. The fusion of multi-spectral
bands in the visible spectrum has been exploited to improve face recognition
performance over conventional broadband face images. In this book
chapter, we propose a new Convolutional Neural Network (CNN) framework which
adopts a structural sparsity learning technique to select the optimal spectral
bands to obtain the best face recognition performance over all of the spectral
bands. Specifically, in this method, images from all bands are fed to a CNN,
and the convolutional filters in the first layer of the CNN are then
regularized by employing a group Lasso algorithm to zero out the redundant
bands during the training of the network. Contrary to other methods which
usually select the useful bands manually or in a greedy fashion, our method
selects the optimal spectral bands automatically to achieve the best face
recognition performance over all spectral bands. Moreover, experimental results
demonstrate that our method outperforms state of the art band selection methods
for face recognition on several publicly-available hyperspectral face image
datasets.
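The group Lasso step that zeroes out redundant bands in the first layer can be sketched as a proximal update (an illustrative sketch; the shapes and the regularization strength `lam` are assumptions):

```python
import numpy as np

def group_lasso_prox(w, lam):
    """Proximal step of a group Lasso over the input (band) channels of
    the first conv layer: shrink each band's filter group toward zero
    and zero it out entirely when its norm falls below lam.

    w: (C_out, Bands, k, k); group b is the slice w[:, b].
    """
    norms = np.sqrt((w ** 2).sum(axis=(0, 2, 3)))        # (Bands,)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return w * scale[None, :, None, None]

def selected_bands(w, tol=1e-8):
    # Bands whose filter group survived the shrinkage
    return np.flatnonzero(np.sqrt((w ** 2).sum(axis=(0, 2, 3))) > tol)
```

Applying this step during training drives the filter groups of uninformative bands exactly to zero, so the surviving bands are selected automatically rather than by hand.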
A One-step Pruning-recovery Framework for Acceleration of Convolutional Neural Networks
Acceleration of convolutional neural network has received increasing
attention during the past several years. Among various acceleration techniques,
filter pruning has its inherent merit by effectively reducing the number of
convolution filters. However, most filter pruning methods resort to a tedious and
time-consuming layer-by-layer pruning-recovery strategy to avoid a significant
drop of accuracy. In this paper, we present an efficient filter pruning
framework to solve this problem. Our method accelerates the network in a one-step
pruning-recovery manner with a novel optimization objective function, which
achieves higher accuracy with much less cost compared with existing pruning
methods. Furthermore, our method allows network compression with global filter
pruning. Given a global pruning rate, it can adaptively determine the pruning
rate for each single convolutional layer, while these rates are often set as
hyper-parameters in previous approaches. Evaluated on VGG-16 and ResNet-50
using ImageNet, our approach outperforms several state-of-the-art methods with
less accuracy drop under the same or even much fewer floating-point operations
(FLOPs).
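One simple way to turn a global pruning rate into per-layer rates is to rank all filters of the network together (an illustrative scheme; the paper's objective-driven rule is more involved, and the use of filter norms as importance scores is an assumption):

```python
import numpy as np

def perlayer_rates_from_global(layer_scores, global_rate):
    """Derive per-layer pruning rates from a single global rate by
    pooling every filter's importance score and thresholding globally,
    so redundant layers are pruned harder than informative ones.

    layer_scores: list of 1-D arrays of per-filter importance scores
    """
    all_scores = np.concatenate(layer_scores)
    n_prune = int(round(global_rate * all_scores.size))
    if n_prune == 0:
        return [0.0] * len(layer_scores)
    thresh = np.sort(all_scores)[n_prune - 1]
    # Fraction of each layer's filters falling at or below the threshold
    return [float((s <= thresh).mean()) for s in layer_scores]
```

Layers with many low-importance filters automatically receive a higher pruning rate, which is the adaptive behavior the abstract describes.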