Data-Free Quantization Through Weight Equalization and Bias Correction
We introduce a data-free quantization method for deep neural networks that
does not require fine-tuning or hyperparameter selection. It achieves
near-original model performance on common computer vision architectures and
tasks. 8-bit fixed-point quantization is essential for efficient inference on
modern deep learning hardware. However, quantizing models to run in 8-bit is a
non-trivial task, frequently leading to either significant performance
reduction or engineering time spent on training a network to be amenable to
quantization. Our approach relies on equalizing the weight ranges in the
network by making use of a scale-equivariance property of activation functions.
In addition, the method corrects biases in the error that are introduced during quantization. This improves quantized-model accuracy and can be applied to many common computer vision architectures with a straightforward API call. For common architectures, such as the MobileNet family, we achieve state-of-the-art quantized model performance. We further show that the method also extends to other computer vision architectures and tasks such as semantic segmentation and object detection.
Comment: ICCV 2019
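A minimal sketch of the scale-equivariance trick the abstract describes, for two consecutive fully connected layers joined by a ReLU; the function name, shapes, and scale rule below are an illustrative reconstruction, not the authors' code or API.

```python
# Cross-layer weight range equalization (sketch). Since ReLU(x / s) = ReLU(x) / s
# for s > 0, rescaling output channel i of layer 1 by 1/s_i and input column i
# of layer 2 by s_i leaves the composed function unchanged while balancing the
# per-channel weight ranges of both layers.
import numpy as np

def equalize_pair(W1, b1, W2, eps=1e-12):
    r1 = np.abs(W1).max(axis=1)  # layer-1 output-channel ranges
    r2 = np.abs(W2).max(axis=0)  # layer-2 input-channel ranges
    s = np.sqrt(np.maximum(r1, eps) / np.maximum(r2, eps))
    return W1 / s[:, None], b1 / s, W2 * s[None, :], s

# Quick equivalence check on random data:
rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(8, 4)), rng.normal(size=8), rng.normal(size=(3, 8))
x = rng.normal(size=4)
W1e, b1e, W2e, _ = equalize_pair(W1, b1, W2)
y  = W2  @ np.maximum(W1  @ x + b1,  0)
ye = W2e @ np.maximum(W1e @ x + b1e, 0)
print(np.allclose(y, ye))  # True: the network function is preserved
```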
Data-Free Network Quantization With Adversarial Knowledge Distillation
Network quantization is an essential procedure in deep learning for
development of efficient fixed-point inference models on mobile or edge
platforms. However, as datasets grow larger and privacy regulations become
stricter, data sharing for model compression gets more difficult and
restricted. In this paper, we consider data-free network quantization with
synthetic data. The synthetic data are generated from a generator, while no
data are used in training the generator and in quantization. To this end, we
propose data-free adversarial knowledge distillation, which minimizes the
maximum distance between the outputs of the teacher and the (quantized) student
for any adversarial samples from a generator. To generate adversarial samples
similar to the original data, we additionally propose matching statistics from
the batch normalization layers for generated data and the original data in the
teacher. Furthermore, we show the gain of producing diverse adversarial samples
by using multiple generators and multiple students. Our experiments show the
state-of-the-art data-free model compression and quantization results for
(wide) residual networks and MobileNet on SVHN, CIFAR-10, CIFAR-100, and
Tiny-ImageNet datasets. The accuracy losses compared to using the original datasets are shown to be minimal.
Comment: CVPR 2020 Joint Workshop on Efficient Deep Learning in Computer Vision (EDLCV)
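A minimal PyTorch sketch of the minimax objective the abstract describes; the KL-based discrepancy, hook-based batch-norm statistics matching, and alternating update scheme are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def kl_discrepancy(t_logits, s_logits):
    # KL(teacher || student) over the softmax outputs
    return F.kl_div(F.log_softmax(s_logits, dim=1),
                    F.softmax(t_logits, dim=1), reduction="batchmean")

def bn_stat_loss(teacher, x):
    """Penalize mismatch between the batch statistics of generated data and
    the teacher's stored BatchNorm running statistics (teacher in eval mode)."""
    feats, hooks = {}, []
    for name, m in teacher.named_modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(
                lambda mod, inp, out, key=name: feats.__setitem__(key, inp[0])))
    teacher(x)
    loss = x.new_zeros(())
    for name, m in teacher.named_modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            f = feats[name]
            loss = loss + F.mse_loss(f.mean(dim=(0, 2, 3)), m.running_mean) \
                        + F.mse_loss(f.var(dim=(0, 2, 3), unbiased=False),
                                     m.running_var)
    for h in hooks:
        h.remove()
    return loss

def train_step(generator, teacher, student, g_opt, s_opt, z_dim=128, batch=64):
    # Generator step: maximize the teacher-student discrepancy (minimize its
    # negative) while matching the teacher's BN statistics.
    x = generator(torch.randn(batch, z_dim))
    g_loss = -kl_discrepancy(teacher(x), student(x)) + bn_stat_loss(teacher, x)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    # Student step: minimize the discrepancy on fresh generated samples.
    x = generator(torch.randn(batch, z_dim)).detach()
    s_loss = kl_discrepancy(teacher(x).detach(), student(x))
    s_opt.zero_grad(); s_loss.backward(); s_opt.step()
```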
Subtensor Quantization for Mobilenets
Quantization for deep neural networks (DNN) has enabled developers to deploy models with less memory and more efficient low-power inference. However, not all DNN designs are friendly to quantization. For example, the popular MobileNet architecture has been tuned to reduce parameter size and computational latency with separable depth-wise convolutions, but not all quantization algorithms work well on it, and accuracy can suffer relative to the floating-point version. In this paper, we analyze several root causes of quantization loss and propose alternatives that do not rely on per-channel or training-aware approaches. We evaluate the image classification task on the ImageNet dataset, and our post-training quantized 8-bit inference top-1 accuracy is within 0.7% of the floating-point version.
Comment: Embedded Vision Workshop, 16th European Conference on Computer Vision (ECCV), Aug 2020
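The per-tensor (rather than per-channel), post-training setting the paper works in can be sketched as follows; the helper names and rounding policy are illustrative assumptions.

```python
# Plain per-tensor affine 8-bit post-training quantization (sketch).
import numpy as np

def quantize_per_tensor(w, num_bits=8):
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(float(w.min()), 0.0), max(float(w.max()), 0.0)  # keep 0 exact
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.randn(64, 32).astype(np.float32)
q, s, z = quantize_per_tensor(w)
print("max abs error:", np.abs(w - dequantize(q, s, z)).max())  # about s / 2
```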
A Data and Compute Efficient Design for Limited-Resources Deep Learning
Thanks to their improved data efficiency, equivariant neural networks have
gained increased interest in the deep learning community. They have been
successfully applied in the medical domain where symmetries in the data can be
effectively exploited to build more accurate and robust models. To be able to
reach a much larger body of patients, mobile, on-device implementations of deep
learning solutions have been developed for medical applications. However,
equivariant models are commonly implemented using large and computationally
expensive architectures, not suitable to run on mobile devices. In this work,
we design and test an equivariant version of MobileNetV2 and further optimize
it with model quantization to enable more efficient inference. We achieve
close to state-of-the-art performance on the Patch Camelyon (PCam) medical dataset while being more computationally efficient.
Comment: Accepted for poster presentation at the Practical Machine Learning for Developing Countries (PML4DC) workshop, ICLR 2020
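A toy NumPy/SciPy illustration of the property equivariant models build in by construction, not the paper's architecture: a kernel averaged over 90-degree rotations makes convolution commute with 90-degree rotations of its input.

```python
import numpy as np
from scipy.signal import convolve2d

k = np.random.randn(3, 3)
k = (k + np.rot90(k) + np.rot90(k, 2) + np.rot90(k, 3)) / 4  # C4-invariant kernel

x = np.random.randn(16, 16)
lhs = np.rot90(convolve2d(x, k, mode="same"))   # filter, then rotate
rhs = convolve2d(np.rot90(x), k, mode="same")   # rotate, then filter
print(np.allclose(lhs, rhs))                    # True: the two paths agree
```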
MRQ: Support Multiple Quantization Schemes through Model Re-Quantization
Despite the proliferation of diverse hardware accelerators (e.g., NPU, TPU,
DPU), deploying deep learning models on edge devices with fixed-point hardware
is still challenging due to complex model quantization and conversion. Existing
model quantization frameworks like Tensorflow QAT [1], TFLite PTQ [2], and Qualcomm AIMET [3] support only a limited set of quantization schemes (e.g., only asymmetric per-tensor quantization in TF1.x QAT [4]). Accordingly, deep learning models cannot be easily quantized for diverse fixed-point hardware, mainly due to slightly different quantization requirements. In this paper, we
envision a new type of model quantization approach called MRQ (model
re-quantization), which takes existing quantized models and quickly transforms
the models to meet different quantization requirements (e.g., asymmetric ->
symmetric, non-power-of-2 scale -> power-of-2 scale). Re-quantization is much
simpler than quantizing from scratch because it avoids costly re-training and
provides support for multiple quantization schemes simultaneously. To minimize
re-quantization error, we developed a new set of re-quantization algorithms
including weight correction and rounding error folding. We have demonstrated
that MobileNetV2 QAT model [7] can be quickly re-quantized into two different
quantization schemes (i.e., symmetric and symmetric+power-of-2 scale) with less
than 0.64 units of accuracy loss. We believe our work is the first to leverage
this concept of re-quantization for model quantization and models obtained from
the re-quantization process have been successfully deployed on NNA in the Echo
Show devices.
Comment: 8 pages, 6 figures, 3 tables, TinyML Conference
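A rough sketch of one re-quantization direction named above (asymmetric to symmetric with a power-of-2 scale); the dequantize-then-requantize route and the rounding policy are illustrative assumptions, not MRQ's algorithms.

```python
import numpy as np

def requantize_sym_pow2(q, scale, zero_point, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1                   # 127 for int8
    w = scale * (q.astype(np.float32) - zero_point)  # recover float weights
    raw = max(np.abs(w).max() / qmax, 1e-12)
    new_scale = 2.0 ** np.ceil(np.log2(raw))         # round scale up to 2**k
    q_new = np.clip(np.round(w / new_scale), -qmax - 1, qmax).astype(np.int8)
    return q_new, new_scale
```

Correction terms such as the paper's weight correction and rounding error folding would then absorb the residual re-quantization error; they are not sketched here.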
Weight Equalizing Shift Scaler-Coupled Post-training Quantization
Post-training, layer-wise quantization is preferable because it is free from
retraining and is hardware-friendly. Nevertheless, accuracy degradation occurs when a neural network model has a large variation in per-output-channel weight ranges. In particular, the MobileNet family suffers a drastic drop in top-1 accuracy, from 70.60% ~ 71.87% to 0.1%, on the ImageNet dataset after 8-bit weight quantization. To mitigate this significant accuracy reduction, we propose a new weight equalizing shift scaler, i.e., rescaling the weight range per channel by a 4-bit binary shift prior to layer-wise quantization. To recover the original output range, the inverse binary shift is efficiently fused into the existing per-layer scale compounding in the fixed-point convolutional operator of the custom neural processing unit. The binary shift is the key feature of our algorithm; it significantly improves accuracy without increasing the memory footprint. As a result, our proposed method achieves a top-1 accuracy of 69.78% ~ 70.96% for MobileNets and shows robust performance across network models and tasks, competitive with channel-wise quantization results.
Comment: 9 pages, 4 figures, 4 tables
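A sketch of the core idea with a guessed shift-selection rule (the paper's exact policy may differ): pre-balance per-output-channel weight ranges with a 4-bit power-of-two shift, quantize layer-wise as usual, and fold the inverse shift into the output scale.

```python
import numpy as np

def shift_scale_weights(W, max_shift=15, eps=1e-12):      # 4-bit shift: 0..15
    r = np.maximum(np.abs(W).max(axis=1), eps)            # per-channel ranges
    t = np.clip(np.floor(np.log2(r.max() / r)), 0, max_shift).astype(int)
    return W * (2.0 ** t)[:, None], t   # fold 2.0**-t into the output scale

def quantize_layerwise(W, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(W).max() / qmax                        # one scale per layer
    q = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale
```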
Exploring Neural Networks Quantization via Layer-Wise Quantization Analysis
Quantization is an essential step in the efficient deployment of deep
learning models and as such is an increasingly popular research topic. An
important practical aspect that is not addressed in the current literature is
how to analyze and fix fail cases where the use of quantization results in
excessive degradation. In this paper, we present a simple analytic framework
that breaks down overall degradation to its per layer contributions. We analyze
many common networks and observe that a layer's contribution is determined by
both intrinsic (local) factors - the distribution of the layer's weights and
activations - and extrinsic (global) factors having to do with the
interaction with the rest of the layers. Layer-wise analysis of existing
quantization schemes reveals local fail-cases of existing techniques which are
not reflected when inspecting their overall performance. As an example, we
consider ResNext26 on which SoTA post-training quantization methods perform
poorly. We show that almost all of the degradation stems from a single layer.
The same analysis also allows for local fixes - applying a common weight
clipping heuristic only to this layer reduces degradation to a minimum while
applying the same heuristic globally results in high degradation. More
generally, layer-wise analysis allows for a more nuanced examination of how
quantization affects the network, enabling the design of better performing
schemes.
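The analysis loop can be sketched framework-agnostically as below; `model.weights`, `evaluate`, and `quantize_fn` are assumed interfaces for illustration, not the paper's tooling.

```python
# Per-layer breakdown (sketch): quantize one layer at a time, keep the rest
# in float, and attribute the observed degradation to that layer.
import copy

def per_layer_degradation(model, evaluate, quantize_fn):
    """Return {layer_name: accuracy drop when only that layer is quantized}."""
    baseline = evaluate(model)
    drops = {}
    for name, w in model.weights.items():     # assumed dict of float tensors
        probe = copy.deepcopy(model)
        probe.weights[name] = quantize_fn(w)  # quantize this single layer
        drops[name] = baseline - evaluate(probe)
    return drops
```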
Quantization of Neural Network Equalizers in Optical Fiber Transmission Experiments
The quantization of neural networks for the mitigation of the nonlinear and
components' distortions in dual-polarization optical fiber transmission is
studied. Two low-complexity neural network equalizers are applied in three
16-QAM 34.4 GBaud transmission experiments with different representative
fibers. A number of post-training quantization and quantization-aware training
algorithms are compared for casting the weights and activations of the neural network in a few bits, combined with uniform, additive power-of-two, and companding quantization. For quantization in the large bit-width regime, quantization-aware training with straight-through estimation incurs a Q-factor penalty of less than 0.5 dB compared to the
unquantized neural network. For quantization in the low bit-width regime, an
algorithm dubbed companding successive alpha-blending quantization is
suggested. This method compensates for the quantization error aggressively by
successive grouping and retraining of the parameters, as well as an incremental
transition from the floating-point representations to the quantized values
within each group. The activations can be quantized at 8 bits and the weights
on average at 1.75 bits, with a small Q-factor penalty. If the activations
are quantized at 6 bits, the weights can be quantized at 3.75 bits with minimal
penalty. The computational complexity and required storage of the neural
networks are drastically reduced, typically by over 90\%. The results indicate
that low-complexity neural networks can mitigate nonlinearities in optical
fiber transmission.
Comment: 15 pages, 9 figures, 5 tables
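A PyTorch sketch of two ingredients mentioned above, under illustrative assumptions: a straight-through-estimator (STE) uniform quantizer, and an alpha-blended weight that ramps from float to quantized, which is the flavor of incremental transition the abstract describes.

```python
import torch

class STEQuantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, scale):
        return scale * torch.round(w / scale)  # uniform quantizer in forward
    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None                  # pass gradients straight through

def blended_weight(w, scale, alpha):
    """alpha = 0: pure float weight; alpha = 1: fully quantized weight.
    Ramping alpha towards 1 during retraining realizes a gradual
    float-to-quantized transition for the chosen group of parameters."""
    return (1.0 - alpha) * w + alpha * STEQuantize.apply(w, scale)
```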
Bit Efficient Quantization for Deep Neural Networks
Quantization for deep neural networks has afforded models for edge devices that use less on-board memory and enable efficient low-power inference. In this
paper, we present a comparison of model-parameter driven quantization
approaches that can achieve as low as 3-bit precision without affecting
accuracy. The post-training quantization approaches are data-free, and the
resulting weight values are closely tied to the dataset distribution on which
the model has converged to optimality. We show quantization results for a number of state-of-the-art deep neural networks (DNN) using large datasets like ImageNet. To better analyze quantization results, we describe the overall range
and local sparsity of values afforded through various quantization schemes. We
show the methods to lower bit-precision beyond quantization limits with object
class clustering.
Comment: EMC2 - NeurIPS workshop 2019, #latentai
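A sketch of the kind of range and sparsity bookkeeping the abstract describes, on synthetic weight-like data; the Laplace toy distribution and the symmetric quantizer are illustrative assumptions.

```python
# Quantize at several bit widths and report the representable range and the
# local sparsity (fraction of values rounded to zero).
import numpy as np

def quantize_symmetric(w, num_bits):
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax), scale

w = np.random.laplace(scale=0.05, size=10_000)  # heavy-tailed, weight-like
for bits in (8, 4, 3):
    q, s = quantize_symmetric(w, bits)
    print(f"{bits}-bit: range +/-{s * (2 ** (bits - 1) - 1):.3f}, "
          f"sparsity {np.mean(q == 0):.1%}")
```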
Low-Power Computer Vision: Improve the Efficiency of Artificial Intelligence
Energy efficiency is critical for running computer vision on battery-powered systems, such as mobile phones or UAVs (unmanned aerial vehicles, or drones). This book collects the methods that have won the annual IEEE Low-Power Computer Vision Challenges since 2015. The winners share their solutions and provide insight on how to improve the efficiency of machine learning systems.