23 research outputs found
Layer-compensated Pruning for Resource-constrained Convolutional Neural Networks
Resource-efficient convolutional neural networks enable not only intelligence
on edge devices but also opportunities in system-level optimization such as
scheduling. In this work, we aim to improve the
performance of resource-constrained filter pruning by merging two commonly
considered sub-problems, i.e., (i) how many filters to prune for each layer and
(ii) which filters to prune given a per-layer pruning budget, into a single
global filter ranking problem. Our framework entails a novel algorithm, dubbed
layer-compensated pruning, in which meta-learning is used to find better
solutions. We show empirically that the proposed algorithm is superior to prior
art in both effectiveness and efficiency. Specifically, we reduce the accuracy
gap between the pruned and original networks from 0.9% to 0.7% with 8x
reduction in time needed for meta-learning, i.e., from 1 hour down to 7
minutes. Finally, we demonstrate the effectiveness of our algorithm on ResNet
and MobileNetV2 under the CIFAR-10, ImageNet, and Bird-200 datasets.
Comment: 11 pages, 8 figures, work in progress
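The merged global ranking can be illustrated with a minimal numpy sketch. The fixed per-layer offsets below stand in for the compensations that the paper learns via meta-learning; the L1 importance measure and all names are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network": per-layer 4-D filter banks (out_ch, in_ch, k, k).
layers = [rng.standard_normal((8, 3, 3, 3)),
          rng.standard_normal((16, 8, 3, 3))]

# Hypothetical layer-wise compensation offsets; in the paper these are
# found by meta-learning, here they are fixed for illustration.
offsets = [0.0, 0.5]

# Build one global ranking: (compensated importance, layer idx, filter idx).
candidates = []
for li, (bank, delta) in enumerate(zip(layers, offsets)):
    norms = np.abs(bank).reshape(bank.shape[0], -1).sum(axis=1)  # L1 per filter
    for fi, n in enumerate(norms):
        candidates.append((n + delta, li, fi))

candidates.sort()                 # ascending: least important first
n_prune = 6                       # one global budget, not per-layer budgets
pruned = candidates[:n_prune]     # answers both "how many" and "which"
per_layer = [sum(1 for _, li, _ in pruned if li == l) for l in range(len(layers))]
print(per_layer)                  # filters removed from each layer
```

Sorting one global list answers both sub-problems at once: the per-layer pruning counts fall out of the ranking instead of being fixed in advance.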
Hybrid Pruning: Thinner Sparse Networks for Fast Inference on Edge Devices
We introduce hybrid pruning which combines both coarse-grained channel and
fine-grained weight pruning to reduce model size, computation, and power
demands with little to no loss in accuracy, enabling the deployment of modern
networks on resource-constrained devices such as always-on security cameras and
drones.
Additionally, to effectively perform channel pruning, we propose a fast
sensitivity test that quickly identifies the sensitivity of layers, within and
across the network, to the output accuracy for a target budget of
multiply-accumulate operations (MACs) or a target accuracy tolerance. Our
experiments show significantly better results for ResNet50 on ImageNet than
existing work, even under the additional constraint that channel counts be
hardware-friendly numbers.
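The two granularities can be combined in a toy numpy sketch. The pruning ratios, the "multiple of 4" notion of hardware-friendliness, and the L1/magnitude criteria are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8, 3, 3))   # conv weights: (out_ch, in_ch, k, k)

# Coarse-grained: drop whole output channels with the smallest L1 norms,
# keeping a hardware-friendly channel count (multiple of 4, an assumption).
l1 = np.abs(W).reshape(W.shape[0], -1).sum(axis=1)
keep = 12                                 # 16 -> 12, still a multiple of 4
kept_idx = np.sort(np.argsort(l1)[-keep:])
W = W[kept_idx]

# Fine-grained: zero individual weights below a magnitude threshold
# inside the surviving channels.
thresh = np.quantile(np.abs(W), 0.5)      # prune the smallest 50%
W[np.abs(W) < thresh] = 0.0

print(W.shape, float((W == 0).mean()))    # thinner AND sparse
```

The result is a network that is both thinner (fewer channels) and sparse within the remaining channels, matching the "thinner sparse" goal in the title.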
Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours
Can we automatically design a Convolutional Network (ConvNet) with the
highest image classification accuracy under the runtime constraint of a mobile
device? Neural architecture search (NAS) has revolutionized the design of
hardware-efficient ConvNets by automating this process. However, the NAS
problem remains challenging due to the combinatorially large design space,
causing significant search time (at least 200 GPU-hours). To alleviate
this complexity, we propose Single-Path NAS, a novel differentiable NAS method
for designing hardware-efficient ConvNets in less than 4 hours. Our
contributions are as follows: 1. Single-path search space: Compared to previous
differentiable NAS methods, Single-Path NAS uses one single-path
over-parameterized ConvNet to encode all architectural decisions with shared
convolutional kernel parameters, hence drastically decreasing the number of
trainable parameters and the search cost down to a few epochs. 2.
Hardware-efficient ImageNet classification: Single-Path NAS achieves 74.96%
top-1 accuracy on ImageNet with 79ms latency on a Pixel 1 phone, which is
state-of-the-art accuracy compared to NAS methods with similar constraints
(<80ms). 3. NAS efficiency: Single-Path NAS search cost is only 8 epochs (30
TPU-hours), which is up to 5,000x faster compared to prior work. 4.
Reproducibility: Unlike all recent mobile-efficient NAS methods which only
release pretrained models, we open-source our entire codebase at:
https://github.com/dstamoulis/single-path-nas
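The single-path encoding can be sketched in a few lines of numpy: a 3x3 kernel is represented as the centre slice of a shared 5x5 superkernel, so an architectural decision reduces to a threshold on the extra weights. The threshold value here is a fixed stand-in for the learned one:

```python
import numpy as np

rng = np.random.default_rng(0)

# One over-parameterized 5x5 "superkernel"; a 3x3 kernel is encoded as
# its centre slice, so both candidate ops share the same parameters.
w5 = rng.standard_normal((5, 5))
core = w5[1:4, 1:4]                       # the embedded 3x3 kernel
shell = w5.copy(); shell[1:4, 1:4] = 0.0  # the extra 5x5-only weights

# Architectural decision as a threshold on the shell's magnitude:
# if the outer ring carries little weight, collapse to the 3x3 op.
# (t is a hypothetical learned threshold, fixed here for illustration.)
t = 2.0
use_5x5 = np.abs(shell).sum() > t
kernel = w5 if use_5x5 else core
print("5x5" if use_5x5 else "3x3", kernel.shape)
```

Because both candidate ops read the same tensor, the search adds no duplicate kernel parameters, which is what keeps the number of trainable parameters and the search cost low.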
Data-Driven Neuron Allocation for Scale Aggregation Networks
Successful visual recognition networks benefit from aggregating information
spanning from a wide range of scales. Previous research has investigated
information fusion of connected layers or multiple branches in a block, seeking
to strengthen the power of multi-scale representations. Despite their great
successes, existing practices often allocate the neurons for each scale
manually, and keep the same ratio in all aggregation blocks of an entire
network, which yields suboptimal performance. In this paper, we propose to learn
the neuron allocation for aggregating multi-scale information in different
building blocks of a deep network. The most informative output neurons in each
block are preserved while others are discarded, and thus neurons for multiple
scales are competitively and adaptively allocated. Our scale aggregation
network (ScaleNet) is constructed by repeating a scale aggregation (SA) block
that concatenates feature maps at a wide range of scales. Feature maps for each
scale are generated by a stack of downsampling, convolution and upsampling
operations. The data-driven neuron allocation and SA block achieve strong
representational power at considerably low computational
complexity. The proposed ScaleNet, by replacing all 3x3 convolutions in ResNet
with our SA blocks, achieves better performance than ResNet and its outstanding
variants such as ResNeXt and SE-ResNet at the same computational complexity. On
ImageNet classification, ScaleNets reduce the absolute top-1 error rate of
ResNets by 1.12 points (101 layers) and 1.82 points (50 layers). On COCO object
detection with Faster RCNN, ScaleNets improve the absolute mmAP of ResNet
backbones by 3.6 points (101 layers) and 4.6 points (50 layers). Code and
models are released at https://github.com/Eli-YiLi/ScaleNet
Comment: 11 pages
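A single SA branch can be sketched with numpy, using average pooling, a 1x1 convolution written as a channel-mixing matrix, and nearest-neighbour upsampling; the 5/3 channel split below is a hand-picked stand-in for the learned, data-driven allocation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))      # (channels, H, W) feature map

def avg_pool2(a):
    c, h, w = a.shape
    return a.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2(a):
    return a.repeat(2, axis=1).repeat(2, axis=2)

# One scale branch: downsample -> 1x1 conv (channel mixing) -> upsample.
# Neuron allocation = how many output channels each scale receives; the
# split (5 full-scale, 3 half-scale) is an illustrative stand-in.
mix_full = rng.standard_normal((5, 8))    # 1x1 conv as a channel matrix
mix_half = rng.standard_normal((3, 8))

branch_full = np.einsum('oc,chw->ohw', mix_full, x)
branch_half = upsample2(np.einsum('oc,chw->ohw', mix_half, avg_pool2(x)))

y = np.concatenate([branch_full, branch_half], axis=0)  # SA block output
print(y.shape)
```

Discarding the least informative output neurons of each branch (not shown) is what lets the allocation between scales adapt per block.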
Meta Filter Pruning to Accelerate Deep Convolutional Neural Networks
Existing methods usually rely on pre-defined criteria, such as the p-norm, to
prune unimportant filters. These methods have two major limitations.
First, the relations among filters are largely ignored. Filters usually work
jointly to make an accurate prediction, so similar filters have equivalent
effects on the network's output and the redundant ones can be further pruned.
Second, the pruning criterion remains unchanged during training. As the network
is updated at each iteration, the filter distribution changes continuously, so
the pruning criterion should be switched adaptively. In this paper, we propose
Meta Filter Pruning (MFP) to
solve the above problems. First, as a complement to the existing p-norm
criterion, we introduce a new pruning criterion considering the filter relation
via filter distance. Additionally, we build a meta pruning framework for filter
pruning, so that our method could adaptively select the most appropriate
pruning criterion as the filter distribution changes. Experiments validate our
approach on two image classification benchmarks. Notably, on ILSVRC-2012, our
MFP reduces more than 50% FLOPs on ResNet-50 with only 0.44% top-5 accuracy
loss.
Comment: 10 pages
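The two criteria can be contrasted in a small numpy sketch; the filter shapes, the L2 norm, and the summed pairwise distance are illustrative choices, and the meta-controller that switches between criteria is only hinted at in a comment:

```python
import numpy as np

rng = np.random.default_rng(0)
filters = rng.standard_normal((8, 3 * 3 * 3))   # 8 filters, flattened

# p-norm criterion: importance = L2 norm of each filter.
norm_score = np.linalg.norm(filters, axis=1)

# Distance criterion: a filter close to all the others is redundant,
# since similar filters contribute near-equivalent information.
dists = np.linalg.norm(filters[:, None, :] - filters[None, :, :], axis=2)
dist_score = dists.sum(axis=1)                  # small = redundant

# A meta-controller would pick which criterion to apply at this
# iteration; here we just report the lowest-scored filter under each.
print(int(np.argmin(norm_score)), int(np.argmin(dist_score)))
```

Note the two criteria can disagree: a filter with a large norm may still be redundant if another filter points in nearly the same direction.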
OICSR: Out-In-Channel Sparsity Regularization for Compact Deep Neural Networks
Channel pruning can significantly accelerate and compress deep neural
networks. Many channel pruning works utilize structured sparsity regularization
to zero out all the weights in some channels and automatically obtain a
structure-sparse network during training. However, these methods apply
structured sparsity regularization to each layer separately, omitting the
correlations between consecutive layers. In this paper, we first combine an
out-channel of the current layer and the corresponding in-channel of the next
layer into a regularization group, termed an out-in-channel. Our proposed
Out-In-Channel Sparsity Regularization (OICSR) considers correlations between
successive layers to further retain predictive power of the compact network.
Training with OICSR thoroughly transfers discriminative features into a
fraction of out-in-channels. Correspondingly, OICSR measures channel importance
based on statistics computed from two consecutive layers rather than an
individual layer.
Finally, a global greedy pruning algorithm is designed to remove redundant
out-in-channels in an iterative way. Our method is comprehensively evaluated
with various CNN architectures including CifarNet, AlexNet, ResNet, DenseNet
and PreActSeNet on CIFAR-10, CIFAR-100 and ImageNet-1K datasets. Notably, on
ImageNet-1K, we reduce 37.2% FLOPs on ResNet-50 while outperforming the
original model by 0.22% top-1 accuracy.
Comment: Accepted to CVPR 2019; the pruned ResNet-50 model has been released
at: https://github.com/dsfour/OICS
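A toy numpy sketch of the out-in-channel grouping, with illustrative shapes and regularization coefficient; the group-lasso form follows the abstract's description rather than the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 4, 3, 3))    # layer l:   (out_ch, in_ch, k, k)
W2 = rng.standard_normal((6, 8, 3, 3))    # layer l+1: its in_ch = l's out_ch

# Out-in-channel group c = out-channel c of W1 + in-channel c of W2.
groups = [np.concatenate([W1[c].ravel(), W2[:, c].ravel()]) for c in range(8)]

# Group-lasso style regularizer over out-in-channels (added to the task
# loss during training; lam is an illustrative coefficient).
lam = 1e-4
oicsr_penalty = lam * sum(np.linalg.norm(g) for g in groups)

# Channel importance from BOTH layers' statistics, not one layer alone.
importance = np.array([np.linalg.norm(g) for g in groups])
prune = int(np.argmin(importance))        # weakest out-in-channel
print(prune, round(float(oicsr_penalty), 6))
```

Pruning group c removes out-channel c of layer l and in-channel c of layer l+1 together, so the two layers stay shape-consistent after pruning.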
Parameterized Structured Pruning for Deep Neural Networks
As Deep Neural Networks (DNNs) grow in size, the gap to hardware capabilities
in terms of memory and compute widens. To effectively
compress DNNs, quantization and connection pruning are usually considered.
However, unconstrained pruning usually leads to unstructured parallelism, which
maps poorly to massively parallel processors, and substantially reduces the
efficiency of general-purpose processors. The same applies to quantization,
which often requires dedicated hardware. We propose Parameterized Structured
Pruning (PSP), a novel method to dynamically learn the shape of DNNs through
structured sparsity. PSP parameterizes structures (e.g. channel- or layer-wise)
in a weight tensor and leverages weight decay to learn a clear distinction
between important and unimportant structures. As a result, PSP maintains
prediction performance, creates a substantial amount of sparsity that is
structured and, thus, easy and efficient to map to a variety of massively
parallel processors, which are mandatory for utmost compute power and energy
efficiency. PSP is experimentally validated on the popular CIFAR-10/100 and
ILSVRC-2012 datasets using ResNet and DenseNet architectures.
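The idea of learning structure parameters under weight decay can be sketched with numpy; the random stand-in gradients, learning rate, decay coefficient, and threshold are all illustrative assumptions, not PSP's actual training recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-channel structure parameters alpha scale the weight tensor; weight
# decay on alpha pushes unimportant channels toward zero during training.
alpha = rng.standard_normal(16)
grads = rng.standard_normal((200, 16)) * 0.01   # stand-in task gradients

lr, decay = 0.1, 0.05
for g in grads:                         # SGD with weight decay on alpha
    alpha -= lr * (g + decay * alpha)

# Channels whose learned |alpha| fell below a threshold are removed as
# whole structures, so the surviving tensor stays dense and maps well
# to massively parallel hardware.
eps = 0.05
kept = np.flatnonzero(np.abs(alpha) >= eps)
print(len(kept), "of 16 channels kept")
```

Because pruning acts on whole parameterized structures (channels here), no unstructured sparsity pattern has to be handled at inference time.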
Filter Pruning using Hierarchical Group Sparse Regularization for Deep Convolutional Neural Networks
Since convolutional neural networks are often trained with redundant
parameters, redundant kernels or filters can be removed to obtain a compact
network without degrading the classification accuracy. In this paper, we
propose a filter pruning method using the hierarchical group sparse
regularization. It is shown in our previous work that the hierarchical group
sparse regularization is effective for obtaining sparse networks in which
filters connected to unnecessary channels are automatically driven close to
zero.
After training the convolutional neural network with the hierarchical group
sparse regularization, unnecessary filters are selected based on the increase
in classification loss on randomly selected training samples, yielding a
compact network. The proposed method reduces more than 50% of the parameters
of ResNet for CIFAR-10 with only a 0.3% drop in test accuracy. Moreover, 34% of
the parameters of ResNet are reduced for TinyImageNet-200 with higher accuracy
than the baseline network.
Comment: Accepted to ICPR 2020
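The selection step can be illustrated with a toy numpy classifier: each filter is scored by how much the loss on a few random samples increases when that filter is zeroed. The linear model and cross-entropy loss are stand-ins for the regularized CNN:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "network" on random samples; each row of W is one filter.
W = rng.standard_normal((6, 10))
x = rng.standard_normal((32, 10))             # randomly selected samples
y = rng.integers(0, 6, size=32)

def loss(weights):
    logits = x @ weights.T
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(32), y] + 1e-12).mean()

base = loss(W)
# Score each filter by the loss increase when it is zeroed out; filters
# driven near zero by the sparse regularizer barely change the loss.
scores = []
for f in range(6):
    Wf = W.copy(); Wf[f] = 0.0
    scores.append(loss(Wf) - base)

prune = int(np.argmin(scores))                # least harmful to remove
print(prune)
```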
New Directions in Distributed Deep Learning: Bringing the Network at Forefront of IoT Design
In this paper, we first highlight three major challenges to large-scale
adoption of deep learning at the edge: (i) Hardware-constrained IoT devices,
(ii) Data security and privacy in the IoT era, and (iii) Lack of network-aware
deep learning algorithms for distributed inference across multiple IoT devices.
We then provide a unified view targeting three research directions that
naturally emerge from the above challenges: (1) Federated learning for training
deep networks, (2) Data-independent deployment of learning algorithms, and (3)
Communication-aware distributed inference. We believe that the above research
directions require a network-centric approach to enable edge intelligence and,
therefore, fully exploit the true potential of IoT.
Comment: This preprint is for personal use only. The official article will
appear in the proceedings of the Design Automation Conference (DAC), 2020. This
work was presented at the DAC 2020 special session on Edge-to-Cloud Neural
Networks for Machine Learning Applications in Future IoT Systems
Dynamic Sparse Graph for Efficient Deep Learning
We propose to execute deep neural networks (DNNs) with a dynamic and sparse
graph (DSG) structure to compress memory and accelerate execution during both
training and inference. The great success of DNNs motivates the pursuit of
lightweight models for deployment on embedded devices. However, most previous
studies optimize for inference while neglecting training or even complicating
it. Training is far more intractable, since (i) the neurons, rather than the
weights as in inference, dominate the memory cost; (ii) the dynamic activations
invalidate previous sparse acceleration based on one-off optimization of fixed
weights; and (iii) batch normalization (BN) is critical for maintaining
accuracy while its activation reorganization damages the sparsity. To address
these issues, DSG activates only a small number of neurons with high
selectivity at each iteration via a dimension-reduction search (DRS) and
obtains the BN compatibility via a double-mask selection (DMS). Experiments
show significant memory saving (1.7-4.5x) and operation reduction (2.3-4.4x)
with little accuracy loss on various benchmarks.
Comment: ICLR 2019
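The selective activation and double-mask trick can be sketched with numpy. The magnitude estimate over a random sub-batch is a cheap stand-in for the paper's dimension-reduction search, and k, the shapes, and epsilon are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 256))            # activations: (batch, neurons)

k = 32                                        # activate only k of 256 neurons
# DRS stand-in: estimate neuron importance cheaply, here by magnitude on a
# random subset of the batch (the paper uses a dimension-reduction search).
est = np.abs(a[rng.choice(64, 8, replace=False)]).mean(axis=0)
mask1 = np.zeros(256, dtype=bool)
mask1[np.argsort(est)[-k:]] = True

# Double-mask selection: batch-norm over the selected neurons, then the
# same mask is applied again so BN's reorganization cannot undo sparsity.
mu = a[:, mask1].mean(axis=0)
sd = a[:, mask1].std(axis=0) + 1e-5
out = np.zeros_like(a)
out[:, mask1] = (a[:, mask1] - mu) / sd
sparse = out * mask1                          # second mask keeps sparsity
print(float((sparse != 0).mean()))            # fraction of active neurons
```

Only k columns ever hold nonzero values, which is where the memory saving during training comes from.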