Distribution-sensitive Information Retention for Accurate Binary Neural Network
Model binarization is an effective method of compressing neural networks and
accelerating their inference process. However, a significant performance gap
still exists between 1-bit models and their 32-bit counterparts. Our
empirical study shows that binarization causes severe information loss in
both the forward and backward propagation. We present a novel
Distribution-sensitive Information
Retention Network (DIR-Net) that retains the information in the forward and
backward propagation by improving internal propagation and introducing external
representations. The DIR-Net mainly relies on three technical contributions:
(1) Information Maximized Binarization (IMB): minimizing the information loss
and the binarization error of weights/activations simultaneously by weight
balance and standardization; (2) Distribution-sensitive Two-stage Estimator
(DTE): retaining gradient information through a distribution-sensitive soft
approximation that jointly considers the updating capability and the accuracy
of gradients; (3) Representation-align Binarization-aware Distillation (RBD):
retaining the representation information by distilling the representations
between the full-precision and binarized networks. The DIR-Net investigates
both the forward and backward processes of BNNs from a unified information
perspective, thereby providing new insight into the mechanism of network
binarization. The three techniques in our DIR-Net are versatile and
effective, and can be applied to various architectures to improve BNNs.
Comprehensive
experiments on image classification and object detection tasks show that
our DIR-Net consistently outperforms the state-of-the-art binarization
approaches under mainstream and compact architectures, such as ResNet, VGG,
EfficientNet, DARTS, and MobileNet. Additionally, we deploy our DIR-Net on
real-world resource-limited devices, achieving an 11.1x storage saving and a
5.4x speedup.
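
To make the forward/backward pipeline concrete, the PyTorch sketch below
illustrates the ideas behind IMB, DTE, and RBD. It is not the authors' code:
the function names, the per-channel scaling, the single tanh surrogate
standing in for the two-stage estimator, and the plain MSE alignment are all
illustrative simplifications of what the abstract describes.

    import torch
    import torch.nn.functional as F

    class SoftSign(torch.autograd.Function):
        # Forward: hard sign, as required for 1-bit inference.
        # Backward: gradient of a soft tanh surrogate whose sharpness t
        # stands in for DTE's distribution-sensitive two-stage estimator.
        @staticmethod
        def forward(ctx, x, t):
            ctx.save_for_backward(x)
            ctx.t = t
            return torch.sign(x)

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            t = ctx.t
            grad = t * (1.0 - torch.tanh(t * x) ** 2)  # d/dx tanh(t*x)
            return grad_out * grad, None

    def imb_binarize(w, t=1.0):
        # IMB sketch: balance weights to zero mean and standardize them so
        # the resulting signs carry maximal information, then binarize with
        # a per-output-channel scaling factor.
        dims = tuple(range(1, w.dim()))
        w = w - w.mean(dim=dims, keepdim=True)
        w = w / (w.std(dim=dims, keepdim=True) + 1e-8)
        scale = w.abs().mean(dim=dims, keepdim=True)
        return scale * SoftSign.apply(w, t)

    def rbd_loss(fp_feats, bin_feats):
        # RBD sketch: align the binarized network's intermediate
        # representations with those of a full-precision teacher.
        return sum(F.mse_loss(b, f.detach())
                   for f, b in zip(fp_feats, bin_feats))

During training, t would be scheduled across the two stages so that early
updates stay strong while later gradients approximate sign more accurately,
and rbd_loss would be added to the task loss as the distillation term.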
BATS: Binary ArchitecTure Search
This paper proposes Binary ArchitecTure Search (BATS), a framework that
drastically reduces the accuracy gap between binary neural networks and their
real-valued counterparts by means of Neural Architecture Search (NAS). We show
that directly applying NAS to the binary domain yields very poor results. To
alleviate this, we describe, for the first time to our knowledge, the three
key ingredients for successfully applying NAS to the binary domain.
Specifically,
we (1) introduce and design a novel binary-oriented search space, (2) propose a
new mechanism for controlling and stabilising the resulting searched
topologies, and (3) propose and validate a series of new search strategies for
binary networks that lead to faster convergence and lower search times.
Experimental results demonstrate the effectiveness of the proposed approach and
the necessity of searching in the binary space directly. Moreover, (4) we set
a new state of the art for binary neural networks on the CIFAR-10, CIFAR-100,
and ImageNet datasets. Code will be made available at
https://github.com/1adrianb/binary-nas. Comment: accepted to ECCV 2020.
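
As a rough illustration of ingredient (2), the sketch below shows a
DARTS-style mixed operation over a binary-friendly candidate set whose
softmax is sharpened by a temperature; annealing the temperature discourages
the near-uniform op weightings that destabilize search over binary networks.
The class name, candidate set, and annealing policy are assumptions for
illustration, not the paper's exact search space or mechanism.

    import torch
    import torch.nn as nn

    class BinaryMixedOp(nn.Module):
        # One edge of a DARTS-style cell: a weighted sum over candidate
        # operations (assumed here to be binary convs, pooling, skip, ...),
        # with a temperature-controlled softmax over the architecture
        # parameters alpha.
        def __init__(self, ops):
            super().__init__()
            self.ops = nn.ModuleList(ops)
            self.alpha = nn.Parameter(1e-3 * torch.randn(len(ops)))

        def forward(self, x, tau=1.0):
            # Lower tau -> sharper distribution over ops, stabilizing the
            # topology that the search converges to.
            weights = torch.softmax(self.alpha / tau, dim=0)
            return sum(w * op(x) for w, op in zip(weights, self.ops))

At the end of the search, only the argmax operation on each edge would be
kept, as in standard differentiable NAS.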
AutoQNN: An End-to-End Framework for Automatically Quantizing Neural Networks
Exploring suitable quantizing schemes and mixed-precision policies is the key
to compressing deep neural networks (DNNs) with high efficiency and accuracy.
This exploration imposes a heavy workload on domain experts, so an automatic
compression method is needed. However, the huge search space of such
automatic methods incurs a large computing budget, making them challenging to
apply in real scenarios. In this paper, we propose
an end-to-end framework named AutoQNN, for automatically quantizing different
layers utilizing different schemes and bitwidths without any human labor.
AutoQNN can seek desirable quantizing schemes and mixed-precision policies for
mainstream DNN models efficiently by involving three techniques: quantizing
scheme search (QSS), quantizing precision learning (QPL), and quantized
architecture generation (QAG). QSS introduces five quantizing schemes and
defines three new ones, together forming a candidate set for scheme search,
and then uses a differentiable neural architecture search (DNAS) algorithm to
find the desired scheme for each layer or the whole model. To the best of our
knowledge, QPL is the first method to learn mixed-precision policies by
reparameterizing the bitwidths of quantizing schemes. QPL efficiently
optimizes both the classification loss and the precision loss of DNNs,
obtaining a near-optimal mixed-precision model within a limited model size
and memory footprint. QAG is
designed to convert arbitrary architectures into corresponding quantized ones
without manual intervention, to facilitate end-to-end neural network
quantization. We have implemented AutoQNN and integrated it into Keras.
Extensive experiments demonstrate that AutoQNN can consistently outperform
state-of-the-art quantization methods. Comment: 22 pages, 9 figures, 7
tables, Journal of Computer Science and Technology.
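
A minimal sketch of how QSS and QPL could fit together is given below,
written in PyTorch for brevity even though the paper's implementation targets
Keras. The class names, the uniform placeholder quantizer, and the
straight-through rounding of a continuous bitwidth are illustrative
assumptions, not the paper's exact formulation.

    import torch
    import torch.nn as nn

    class RoundSTE(torch.autograd.Function):
        # Straight-through rounding: discrete in the forward pass, identity
        # gradient in the backward pass, so a continuous bitwidth parameter
        # can still be learned (the reparameterization idea behind QPL).
        @staticmethod
        def forward(ctx, x):
            return torch.round(x)

        @staticmethod
        def backward(ctx, g):
            return g

    def uniform_quant(x, bits):
        # Placeholder candidate scheme: symmetric uniform quantization.
        levels = 2.0 ** bits - 1.0
        s = x.abs().max().clamp(min=1e-8)
        return RoundSTE.apply(x / s * levels) / levels * s

    class SchemeSearchQuantizer(nn.Module):
        # QSS sketch: a DNAS-style softmax over candidate quantizers.
        # QPL sketch: a learnable continuous bitwidth rounded via RoundSTE.
        def __init__(self, quantizers, init_bits=8.0):
            super().__init__()
            self.quantizers = quantizers
            self.alpha = nn.Parameter(torch.zeros(len(quantizers)))
            self.bits = nn.Parameter(torch.tensor(init_bits))

        def forward(self, x):
            b = RoundSTE.apply(self.bits.clamp(2.0, 8.0))
            w = torch.softmax(self.alpha, dim=0)
            return sum(wi * q(x, b) for wi, q in zip(w, self.quantizers))

In training, the task loss would be combined with a precision penalty (for
example, a term proportional to the learned bits times the layer's parameter
count) so that the searched policy trades accuracy against model size.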