A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers
Weight pruning methods for deep neural networks (DNNs) have been investigated
recently, but prior work in this area is mainly heuristic and iterative, and
therefore lacks guarantees on the weight reduction ratio and the convergence time.
To mitigate these limitations, we present a systematic weight pruning framework
of DNNs using the alternating direction method of multipliers (ADMM). We first
formulate the weight pruning problem of DNNs as a nonconvex optimization
problem with combinatorial constraints specifying the sparsity requirements,
and then adopt the ADMM framework for systematic weight pruning. By using ADMM,
the original nonconvex optimization problem is decomposed into two subproblems
that are solved iteratively. One of these subproblems can be solved using
stochastic gradient descent, while the other can be solved analytically. In
addition, our method achieves a fast convergence rate.
The weight pruning results are very promising and consistently outperform the
prior work. On the LeNet-5 model for the MNIST data set, we achieve 71.2 times
weight reduction without accuracy loss. On the AlexNet model for the ImageNet
data set, we achieve 21 times weight reduction without accuracy loss. When we
focus on pruning the convolutional layers for computation reduction, we can
reduce the total computation by five times compared with the prior work
(achieving a total of 13.4 times weight reduction in the convolutional layers). Our
models and codes are released at https://github.com/KaiqiZhang/admm-prunin
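As a hedged illustration of the two-subproblem decomposition described above, the following PyTorch sketch alternates an SGD-trained W-subproblem (the task loss plus a quadratic term) with the analytical Z-subproblem (Euclidean projection onto the sparsity constraint) and a dual update. All hyper-parameters (rho, step counts, keep ratio) and helper names are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of the ADMM weight-pruning loop; hyper-parameters are
# illustrative assumptions, not the paper's settings.
import torch

def project_to_sparsity(w: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Euclidean projection onto the constraint set "at most k nonzeros":
    keep the largest-magnitude entries and zero out the rest."""
    k = max(1, int(w.numel() * keep_ratio))
    threshold = torch.topk(w.abs().flatten(), k).values.min()
    return torch.where(w.abs() >= threshold, w, torch.zeros_like(w))

def admm_prune(model, loss_fn, data_loader, keep_ratio=0.1,
               rho=1e-3, admm_iters=10, sgd_steps=1000, lr=1e-3):
    # Z: auxiliary sparse copies of the weights; U: scaled dual variables.
    params = [p for p in model.parameters() if p.dim() > 1]
    Z = [project_to_sparsity(p.detach().clone(), keep_ratio) for p in params]
    U = [torch.zeros_like(p) for p in params]
    opt = torch.optim.SGD(model.parameters(), lr=lr)

    for _ in range(admm_iters):
        # Subproblem 1, solved by SGD: task loss + (rho/2)||W - Z + U||^2.
        data_iter = iter(data_loader)
        for _ in range(sgd_steps):
            try:
                x, y = next(data_iter)
            except StopIteration:
                data_iter = iter(data_loader)
                x, y = next(data_iter)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            for p, z, u in zip(params, Z, U):
                loss = loss + (rho / 2) * (p - z + u).pow(2).sum()
            loss.backward()
            opt.step()
        # Subproblem 2, solved analytically: project W + U onto the sparse
        # set, then take the dual (multiplier) update.
        with torch.no_grad():
            for i, p in enumerate(params):
                Z[i] = project_to_sparsity(p + U[i], keep_ratio)
                U[i] = U[i] + p - Z[i]
    return Z  # sparse targets; a final hard prune plus retraining is typical
```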
A Unified Framework of DNN Weight Pruning and Weight Clustering/Quantization Using ADMM
Many model compression techniques for Deep Neural Networks (DNNs) have been
investigated, including weight pruning, weight clustering and quantization,
etc. Weight pruning leverages the redundancy in the number of weights in DNNs,
while weight clustering/quantization leverages the redundancy in the number of
bit representations of weights. They can be effectively combined in order to
exploit the maximum degree of redundancy. However, the literature lacks a
systematic investigation in this direction.
In this paper, we fill this void and develop a unified, systematic framework
of DNN weight pruning and clustering/quantization using Alternating Direction
Method of Multipliers (ADMM), a powerful technique in optimization theory to
deal with non-convex optimization problems. Both DNN weight pruning and
clustering/quantization, as well as their combinations, can be solved in a
unified manner. For further performance improvement in this framework, we adopt
multiple techniques including iterative weight quantization and retraining,
joint weight clustering training and centroid updating, weight clustering
retraining, etc. The proposed framework achieves significant improvements both
in individual weight pruning and clustering/quantization problems, as well as
their combinations. For weight pruning alone, we achieve 167x weight reduction
in LeNet-5, 24.7x in AlexNet, and 23.4x in VGGNet, without any accuracy loss.
For the combination of DNN weight pruning and clustering/quantization, we
achieve 1,910x and 210x storage reduction of weight data on LeNet-5 and
AlexNet, respectively, without accuracy loss. Our codes and models are released
at the link http://bit.ly/2D3F0n
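For the clustering/quantization side of the framework, the analytical subproblem becomes a projection onto weights drawn from a small set of shared centroids. The sketch below illustrates one such projection using a few rounds of 1-D k-means, loosely mirroring the joint clustering and centroid-updating idea mentioned above; the centroid count and fitting details are assumptions, not the paper's exact scheme.

```python
# A minimal sketch of projecting a weight tensor onto "M shared centroids"
# (weight clustering/quantization); details are illustrative assumptions.
import torch

def project_to_centroids(w: torch.Tensor, num_centroids: int = 16,
                         kmeans_iters: int = 10) -> torch.Tensor:
    """Map each weight to the nearest of `num_centroids` shared values,
    fitting the centroids with a few rounds of 1-D k-means."""
    flat = w.flatten()
    centroids = torch.linspace(flat.min().item(), flat.max().item(),
                               num_centroids)
    for _ in range(kmeans_iters):
        # Assign every weight to its nearest centroid ...
        assign = (flat.unsqueeze(1) - centroids.unsqueeze(0)).abs().argmin(dim=1)
        # ... then move each centroid to the mean of its assigned weights.
        for j in range(num_centroids):
            mask = assign == j
            if mask.any():
                centroids[j] = flat[mask].mean()
    # Final assignment with the fitted centroids.
    assign = (flat.unsqueeze(1) - centroids.unsqueeze(0)).abs().argmin(dim=1)
    return centroids[assign].reshape(w.shape)
```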
ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers
To facilitate efficient embedded and hardware implementations of deep neural
networks (DNNs), two important categories of DNN model compression techniques
are investigated: weight pruning and weight quantization. The former leverages
the redundancy in the number of weights, whereas the latter leverages the
redundancy in the bit representation of weights. However, a systematic
framework of joint weight pruning and quantization of DNNs has been lacking,
thereby limiting the achievable model compression ratio. Moreover, computation
reduction, energy efficiency improvement, and hardware performance overhead
need to be accounted for, in addition to model size reduction alone.
To address these limitations, we present ADMM-NN, the first
algorithm-hardware co-optimization framework of DNNs using Alternating
Direction Method of Multipliers (ADMM), a powerful technique to deal with
non-convex optimization problems with possibly combinatorial constraints. The
first part of ADMM-NN is a systematic, joint framework of DNN weight pruning
and quantization using ADMM. It can be understood as a smart regularization
technique with regularization target dynamically updated in each ADMM
iteration, thereby resulting in higher performance in model compression than
prior work. The second part is hardware-aware DNN optimizations to facilitate
hardware-level implementations.
Without accuracy loss, we can achieve 85x and 24x pruning rates on the
LeNet-5 and AlexNet models, respectively, significantly higher than prior work.
The improvement becomes more significant when focusing on computation
reductions. Combining weight pruning and quantization, we achieve 1,910x
and 231x reductions in overall model size on these two benchmarks when
focusing on data storage. Highly promising results are also observed on other
representative DNNs such as VGGNet and ResNet-50.
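The "smart regularization" reading mentioned above can be made concrete in a few lines. A minimal sketch, reusing the names from the pruning example earlier in this list; rho is an assumption:

```python
# The W-subproblem is ordinary training plus a quadratic penalty whose
# target (Z - U) is re-derived by projection and dual update in every ADMM
# iteration, rather than being fixed as in plain L2 regularization.
def admm_regularized_loss(task_loss, params, Z, U, rho=1e-3):
    reg = sum(((p - z + u) ** 2).sum() for p, z, u in zip(params, Z, U))
    return task_loss + (rho / 2) * reg
```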
StructADMM: A Systematic, High-Efficiency Framework of Structured Weight Pruning for DNNs
Weight pruning methods of DNNs have been demonstrated to achieve a good model
pruning rate without loss of accuracy, thereby alleviating the significant
computation/storage requirements of large-scale DNNs. Structured weight pruning
methods have been proposed to overcome the limitation of irregular network
structure and demonstrated actual GPU acceleration. However, in prior work the
pruning rate (degree of sparsity) and GPU acceleration are limited (to less
than 50%) when accuracy needs to be maintained. In this work, we overcome these
limitations by proposing a unified, systematic framework of structured weight
pruning for DNNs. It is a framework that can be used to induce different types
of structured sparsity, such as filter-wise, channel-wise, and shape-wise
sparsity, as well as non-structured sparsity. The proposed framework incorporates
stochastic gradient descent with ADMM, and can be understood as a dynamic
regularization method in which the regularization target is analytically
updated in each iteration. Without loss of accuracy on the AlexNet model, we
achieve 2.58X and 3.65X average measured speedup on two GPUs, clearly
outperforming the prior work. The average speedups reach 3.15X and 8.52X when
allowing a moderate accuracy loss of 2%. In this case, the model compression
for the convolutional layers is 15.0X, corresponding to an 11.93X measured CPU
speedup. Our experiments on the ResNet model and on other data sets such as
UCF101 and CIFAR-10 demonstrate the consistently higher performance of our
framework.
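For structured sparsity, the analytical projection operates on whole structures rather than individual weights. A minimal sketch of the filter-wise case (channel-wise and shape-wise variants are analogous), with the keep ratio as an illustrative assumption:

```python
# Projection onto filter-wise structured sparsity: keep whole filters with
# the largest norms, zero the rest.
import torch

def project_filter_sparsity(w: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """w: conv weight of shape (C_out, C_in, kH, kW). The closest tensor
    with at most k nonzero filters keeps those of largest Frobenius norm."""
    norms = w.flatten(1).norm(dim=1)              # one norm per filter
    k = max(1, int(w.size(0) * keep_ratio))
    mask = torch.zeros(w.size(0), dtype=torch.bool)
    mask[torch.topk(norms, k).indices] = True
    return w * mask.view(-1, 1, 1, 1)
```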
Systematic Weight Pruning of DNNs using Alternating Direction Method of Multipliers
We present a systematic weight pruning framework of deep neural networks
(DNNs) using the alternating direction method of multipliers (ADMM). We first
formulate the weight pruning problem of DNNs as a constrained nonconvex
optimization problem, and then adopt the ADMM framework for systematic weight
pruning. We show that ADMM is highly suitable for weight pruning due to the
computational efficiency it offers. We achieve a much higher compression ratio
compared with prior work while maintaining the same test accuracy, together
with a faster convergence rate. Our models are released at
https://github.com/KaiqiZhang/admm-prunin
Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quantization Rates using ADMM
Weight pruning and weight quantization are two important categories of DNN
model compression. Prior work on these techniques is mainly based on
heuristics. A recent work developed a systematic framework of DNN weight
pruning using the advanced optimization technique ADMM (Alternating Direction
Method of Multipliers), achieving state-of-the-art weight pruning results. In
this work, we first extend this one-shot ADMM-based framework to guarantee
solution feasibility and provide a fast convergence rate, and
generalize to weight quantization as well. We have further developed a
multi-step, progressive DNN weight pruning and quantization framework, with
dual benefits of (i) achieving further weight pruning/quantization thanks to
the special property of ADMM regularization, and (ii) reducing the search space
within each step. Extensive experimental results demonstrate the superior
performance compared with prior work. Some highlights: (i) we achieve 246x, 36x,
and 8x weight pruning on LeNet-5, AlexNet, and ResNet-50 models, respectively,
with (almost) zero accuracy loss; (ii) even a significant 61x weight pruning in
AlexNet (ImageNet) results in only minor degradation in actual accuracy
compared with prior work; (iii) we are among the first to derive notable weight
pruning results for ResNet and MobileNet models; (iv) we derive the first
lossless, fully binarized (for all layers) LeNet-5 for MNIST and VGG-16 for
CIFAR-10; and (v) we derive the first fully binarized (for all layers) ResNet
for ImageNet with reasonable accuracy loss.
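For the fully binarized results in highlights (iv) and (v), the quantization constraint restricts each layer's weights to two values. A minimal sketch of the corresponding analytical projection, shown with a per-tensor scaling factor; the training schedule around it is the paper's contribution and is not reproduced here.

```python
# Projection onto the binary set {-alpha, +alpha} for one weight tensor.
import torch

def project_binary(w: torch.Tensor) -> torch.Tensor:
    """Closest tensor to w with entries in {-alpha, +alpha}: the sign
    pattern follows w, and alpha = mean(|w|) minimizes the distance."""
    alpha = w.abs().mean()
    return torch.where(w >= 0, alpha, -alpha)
```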
Progressive Weight Pruning of Deep Neural Networks using ADMM
Deep neural networks (DNNs), although achieving human-level performance in
many domains, have very large model sizes that hinder their broader
application on edge computing devices. Extensive research work has been
conducted on DNN model compression and pruning. However, most of the previous
work took heuristic approaches. This work proposes a progressive weight pruning
approach based on ADMM (Alternating Direction Method of Multipliers), a
powerful technique to deal with non-convex optimization problems with
potentially combinatorial constraints. Motivated by dynamic programming, the
proposed method reaches an extremely high pruning rate through a sequence of
partial prunings, each with a moderate pruning rate. It thereby resolves the
accuracy degradation and long convergence time problems that arise when
pursuing extremely high pruning ratios. It achieves up to a 34x pruning rate
on the ImageNet dataset and a 167x pruning rate on the MNIST dataset,
significantly higher than those reached in the prior literature. Under the
same number of epochs, the proposed method also achieves faster convergence
and higher compression rates. The codes and pruned DNN models are released at
the link bit.ly/2zxdls
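A minimal sketch of the progressive schedule, assuming the admm_prune helper sketched near the top of this list; the intermediate keep ratios are illustrative assumptions.

```python
# Progressive pruning: several moderate stages instead of one extreme step.
def progressive_prune(model, loss_fn, data_loader,
                      keep_ratios=(0.5, 0.2, 0.05, 0.01)):
    """Each stage prunes only moderately relative to the previous one, so
    no single step must bridge an extreme sparsity gap."""
    Z = None
    for ratio in keep_ratios:
        Z = admm_prune(model, loss_fn, data_loader, keep_ratio=ratio)
    return Z
```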
Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-based Approach
Deep Neural Networks (DNNs) are applied in a wide range of use cases. There is
an increased demand for deploying DNNs on devices that do not have abundant
resources such as memory and computation units. Recently, network compression
through a variety of techniques, such as pruning and quantization, has been
proposed to reduce the resource requirements. A key parameter that all existing
compression techniques are sensitive to is the compression ratio (e.g., pruning
sparsity, quantization bitwidth) of each layer. Traditional solutions treat the
compression ratios of the layers as hyper-parameters and tune them using human
heuristics. Recent research has started to use black-box hyper-parameter
optimization, but it introduces new hyper-parameters and has efficiency
issues. In this paper, we propose a framework to jointly prune and quantize the
DNNs automatically according to a target model size without using any
hyper-parameters to manually set the compression ratio for each layer. In the
experiments, we show that our framework can compress the weight data of
ResNet-50 to be 836x smaller without accuracy loss on CIFAR-10, and
compress AlexNet to be 205x smaller without accuracy loss on ImageNet
classification.
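As a hedged illustration of removing per-layer compression-ratio hyper-parameters, the sketch below meets a single global size budget with one global magnitude threshold, which induces each layer's sparsity automatically. This is one simple way to realize the idea, not necessarily the authors' constrained-optimization algorithm.

```python
# One global budget, no per-layer ratios: a single magnitude threshold
# determines every layer's sparsity.
import torch

def global_budget_masks(params, target_keep_ratio: float):
    """params: list of weight tensors. Returns one binary mask per tensor,
    keeping the globally largest-magnitude weights up to the budget."""
    mags = torch.cat([p.detach().abs().flatten() for p in params])
    k = max(1, int(mags.numel() * target_keep_ratio))
    threshold = torch.topk(mags, k).values.min()
    return [(p.detach().abs() >= threshold).float() for p in params]
```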
Adversarial Robustness vs Model Compression, or Both?
It is well known that deep neural networks (DNNs) are vulnerable to
adversarial attacks, which are implemented by adding crafted perturbations onto
benign examples. Min-max robust optimization based adversarial training can
provide a notion of security against adversarial attacks. However, adversarial
robustness requires a significantly larger network capacity than natural
training with only benign examples. This paper proposes a
framework of concurrent adversarial training and weight pruning that enables
model compression while still preserving the adversarial robustness and
essentially tackles the dilemma of adversarial training. Furthermore, this work
studies two hypotheses about weight pruning in the conventional setting and
finds that weight pruning is essential for reducing the network model size in
the adversarial setting: training a small model from scratch, even with
inherited initialization from the large model, cannot achieve both adversarial
robustness and high standard accuracy. Code is available at
https://github.com/yeshaokai/Robustness-Aware-Pruning-ADMM
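A minimal sketch of concurrent adversarial training and pruning: an inner maximization crafts perturbed examples (single-step FGSM here for brevity; min-max training typically uses a stronger attack such as PGD), and an outer minimization trains on them while keeping the weights on a fixed sparse support. Names, eps, and the [0, 1] input range are illustrative assumptions.

```python
# Robust training on adversarial examples under fixed sparsity masks.
import torch

def fgsm(model, loss_fn, x, y, eps=8 / 255):
    """One-step gradient-sign attack on benign examples in [0, 1]."""
    x_adv = x.clone().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def adv_train_step(model, loss_fn, opt, x, y, masks):
    """masks: one binary mask per weight tensor, e.g. from an ADMM pruning
    pass; they pin the sparse support during robust training."""
    x_adv = fgsm(model, loss_fn, x, y)       # inner max: craft perturbations
    opt.zero_grad()                          # clear grads left by the attack
    loss_fn(model(x_adv), y).backward()      # outer min: robust loss
    opt.step()
    with torch.no_grad():                    # re-apply the sparsity masks
        weights = [p for p in model.parameters() if p.dim() > 1]
        for p, m in zip(weights, masks):
            p.mul_(m)
```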
ResNet Can Be Pruned 60x: Introducing Network Purification and Unused Path Removal (P-RM) after Weight Pruning
State-of-the-art DNN structures involve high computation and a great demand
for memory storage, which poses an intensive challenge to DNN framework
resources. To mitigate these challenges, weight pruning techniques have been
studied. However, a high-accuracy solution for extreme structured pruning that
combines different types of structured sparsity has remained elusive, because
of how drastically the weights of the DNN are reduced. In this paper, we
propose a DNN framework that combines two different types of structured weight
pruning (filter and column pruning) by incorporating the alternating direction
method of multipliers (ADMM) algorithm for better pruning performance. We are
the first to identify the non-optimality of the ADMM process and the unused
weights in a structured pruned model, and we further design an optimization
framework containing the newly proposed Network Purification and Unused Path
Removal algorithms, which are dedicated to post-processing a structured pruned
model after the ADMM steps. Some highlights: we achieve 232x compression on
LeNet-5, 60x compression on ResNet-18 on CIFAR-10, and over 5x compression on
AlexNet. We share our models at the anonymous link http://bit.ly/2VJ5ktv
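A hedged sketch of the post-processing idea: filters that survive ADMM with negligible norms are zeroed ("Network Purification"), and the next layer's input channels they feed, which now carry no information, are zeroed too ("Unused Path Removal"). The threshold and the layer pairing are illustrative assumptions, not the paper's exact criteria.

```python
# Post-processing a structured-pruned model: drop near-dead filters and the
# unused downstream paths they feed.
import torch

@torch.no_grad()
def purify_pair(conv_w: torch.Tensor, next_w: torch.Tensor, tol: float = 1e-3):
    """conv_w: (C_out, C_in, kH, kW); next_w: (C_out2, C_out, kH, kW), the
    immediately following conv layer. Returns how many filters were removed."""
    norms = conv_w.flatten(1).norm(dim=1)
    dead = norms < tol * norms.max()
    conv_w[dead] = 0          # purification: drop near-dead filters
    next_w[:, dead] = 0       # path removal: their outputs are never used
    return int(dead.sum())
```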