Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization
Low-bit deep neural networks (DNNs) have become critical for embedded applications
due to their low storage requirements and computing efficiency. However, they
suffer from a non-negligible accuracy drop. This paper proposes the
stochastic quantization (SQ) algorithm for learning accurate low-bit DNNs. The
motivation comes from the following observation: existing training algorithms
approximate all real-valued elements/filters with a low-bit representation
together in each iteration. The quantization errors may be small for some
elements/filters but large for others, which leads to inappropriate
gradient directions during training and thus a notable accuracy drop.
Instead, SQ quantizes a portion of elements/filters to low-bit with a
stochastic probability inversely proportional to the quantization error, while
keeping the other portion unchanged with full-precision. The quantized and
full-precision portions are updated with corresponding gradients separately in
each iteration. The SQ ratio is gradually increased until the whole network is
quantized. This procedure greatly compensates for the quantization error and
thus yields better accuracy for low-bit DNNs. Experiments show that SQ
consistently and significantly improves accuracy for different low-bit DNNs
on various datasets and network structures. (Comment: BMVC 2017 Oral)
Progressive Stochastic Binarization of Deep Networks
A plethora of recent research has focused on improving the memory footprint
and inference speed of deep networks by reducing the complexity of (i)
numerical representations (for example, by deterministic or stochastic
quantization) and (ii) arithmetic operations (for example, by binarization of
weights).
We propose a stochastic binarization scheme for deep networks that allows for
efficient inference on hardware by restricting itself to additions of small
integers and fixed shifts. Unlike previous approaches, the underlying
randomized approximation is progressive, thus permitting an adaptive control of
the accuracy of each operation at run-time. In a low-precision setting, we
match the accuracy of previous binarized approaches. Our representation is
unbiased - it approaches continuous computation with increasing sample size. In
a high-precision regime, the computational costs are competitive with previous
quantization schemes. Progressive stochastic binarization also permits
localized, dynamic accuracy control within a single network, thereby providing
a new tool for adaptively focusing computational attention.
We evaluate our method on networks of various architectures, already
pretrained on ImageNet. With representational costs comparable to previous
schemes, we obtain accuracies close to the original floating point
implementation. This includes pruned networks, except the known special case of
certain types of separated convolutions. By focusing computational attention
using progressive sampling, we reduce inference costs on ImageNet further by a
factor of up to 33% (before network pruning).
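The unbiased, progressive property can be sketched as follows; this is a generic stochastic-rounding estimator under the assumption of a fixed-point grid, not the paper's shift-and-add hardware scheme:

```python
import math
import random

def stochastic_round(x, step):
    """Unbiased stochastic rounding to a multiple of `step`: round up with
    probability equal to the fractional part, so E[result] == x."""
    q = x / step
    f = math.floor(q)
    return (f + (random.random() < q - f)) * step

def progressive_estimate(x, step, n):
    """Average n independent stochastic samples: the estimate stays unbiased
    and approaches the continuous value as the sample count n grows,
    which is what permits run-time accuracy control."""
    return sum(stochastic_round(x, step) for _ in range(n)) / n
```

A single sample gives the cheapest, noisiest answer; more samples buy accuracy without changing the representation.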
Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search
Recent work in network quantization has substantially reduced the time and
space complexity of neural network inference, enabling their deployment on
embedded and mobile devices with limited computational and memory resources.
However, existing quantization methods often represent all weights and
activations with the same precision (bit-width). In this paper, we explore a
new dimension of the design space: quantizing different layers with different
bit-widths. We formulate this problem as a neural architecture search problem
and propose a novel differentiable neural architecture search (DNAS) framework
to efficiently explore its exponential search space with gradient-based
optimization. Experiments show we surpass the state-of-the-art compression of
ResNet on CIFAR-10 and ImageNet. Our quantized models with 21.1x smaller model
size or 103.9x lower computational cost can still outperform baseline quantized
or even full-precision models.
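The differentiable relaxation at the heart of DNAS can be sketched as a softmax-weighted mixture over candidate bit-widths; the uniform quantizer, candidate set, and function names below are illustrative assumptions, not the paper's exact formulation:

```python
import math

def quantize_uniform(w, bits):
    """Uniform symmetric quantization of a weight list to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    m = max(abs(x) for x in w) or 1.0
    return [round(x / m * levels) / levels * m for x in w]

def softmax(zs):
    e = [math.exp(z - max(zs)) for z in zs]
    s = sum(e)
    return [x / s for x in e]

def dnas_mixed_weights(w, arch_logits, candidate_bits=(2, 4, 8)):
    """DNAS-style relaxation: the layer's effective weights are a
    softmax-weighted mixture of the same weights quantized at each
    candidate bit-width, so gradients w.r.t. arch_logits can select
    a per-layer precision by gradient descent."""
    probs = softmax(arch_logits)
    mix = [0.0] * len(w)
    for p, b in zip(probs, candidate_bits):
        q = quantize_uniform(w, b)
        mix = [m + p * qi for m, qi in zip(mix, q)]
    return mix
```

As one logit dominates, the mixture collapses to a single bit-width, which is then used for deployment.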
Scalable Methods for 8-bit Training of Neural Networks
Quantized Neural Networks (QNNs) are often used to improve network efficiency
during the inference phase, i.e. after the network has been trained. Extensive
research in the field suggests many different quantization schemes. Still, the
number of bits required, as well as the best quantization scheme, are yet
unknown. Our theoretical analysis suggests that most of the training process is
robust to substantial precision reduction, and points to only a few specific
operations that require higher precision. Armed with this knowledge, we
quantize the model parameters, activations and layer gradients to 8-bit,
leaving at a higher precision only the final step in the computation of the
weight gradients. Additionally, as QNNs require batch-normalization to be
trained at high precision, we introduce Range Batch-Normalization (BN) which
has significantly higher tolerance to quantization noise and improved
computational complexity. Our simulations show that Range BN is equivalent to
the traditional batch norm if a precise scale adjustment, which can be
approximated analytically, is applied. To the best of the authors' knowledge,
this work is the first to quantize the weights, activations, as well as a
substantial volume of the gradients stream, in all layers (including batch
normalization) to 8-bit while showing state-of-the-art results over the
ImageNet-1K dataset.
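The Range BN idea, replacing the batch standard deviation with the batch range times an analytically motivated constant, might look like the sketch below; the constant C(n) ~ 1/sqrt(2 ln n) is one plausible Gaussian-motivated choice for the "precise scale adjustment" the abstract mentions, not a quotation of the paper's formula:

```python
import math

def range_batch_norm(x, eps=1e-5):
    """Range BN sketch: normalize by the batch range instead of the std.
    For n Gaussian samples, E[max - min] grows like 2*sqrt(2 ln n)*sigma,
    so range * C(n) with C(n) = 1/sqrt(2 ln n) tracks a multiple of sigma
    while avoiding the quantization-sensitive sum of squares."""
    n = len(x)
    mu = sum(x) / n
    centered = [v - mu for v in x]
    rng = max(centered) - min(centered)
    c = 1.0 / math.sqrt(2.0 * math.log(n))
    scale = c * rng + eps
    return [v / scale for v in centered]
```

The range is cheap to compute and far more tolerant of low-precision arithmetic than a variance.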
DFTerNet: Towards 2-bit Dynamic Fusion Networks for Accurate Human Activity Recognition
Deep Convolutional Neural Networks (DCNNs) are currently popular in human
activity recognition applications. However, for modern sensor-based
artificial-intelligence games, many research achievements cannot be
practically applied on portable devices. DCNNs are typically resource-intensive
and too large to be deployed on portable devices, which limits the
practical application of complex activity detection. In addition, since
portable devices do not possess high-performance Graphics Processing Units
(GPUs), there is hardly any improvement in the Action Game (ACT) experience.
Besides, in order to deal with multi-sensor collaboration, all previous human
activity recognition models typically treated the representations from
different sensor signal sources equally. However, distinct types of activities
should adopt different fusion strategies. In this paper, a novel scheme is
proposed. This scheme is used to train 2-bit Convolutional Neural Networks with
weights and activations constrained to {-0.5,0,0.5}. It takes into account the
correlation between different sensor signal sources and the activity types.
This model, which we refer to as DFTerNet, aims at producing a more reliable
inference and better trade-offs for practical applications. Our basic idea is
to exploit quantization of weights and activations directly in pre-trained
filter banks and adopt dynamic fusion strategies for different activity types.
Experiments demonstrate that using the dynamic fusion strategy can exceed the
baseline model performance by up to ~5% on activity recognition datasets such
as OPPORTUNITY and PAMAP2. Using the proposed quantization method, we were
able to achieve performance close to that of the full-precision counterpart.
These results were also verified on the UniMiB-SHAR dataset. In addition, the
proposed method achieves ~9x acceleration on CPUs and ~11x memory savings.
(Comment: 19 pages, 5 figures, 6 tables, accepted by IEEE Access)
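Constraining weights and activations to {-0.5, 0, 0.5} is a ternary quantization; a minimal sketch using a mean-magnitude threshold is shown below. The 0.7 threshold factor follows common ternary-weight heuristics and is an assumption, not taken from the paper:

```python
def ternarize(w, delta_ratio=0.7):
    """Ternary quantization sketch: map each weight to {-0.5, 0, 0.5}
    using a threshold proportional to the mean magnitude, so small
    weights are zeroed and the rest keep only their sign."""
    mean_abs = sum(abs(x) for x in w) / len(w)
    delta = delta_ratio * mean_abs
    return [0.5 if x > delta else -0.5 if x < -delta else 0.0
            for x in w]
```

With only three levels, multiplications reduce to sign flips and shifts, which is where the reported CPU speedup comes from.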
LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
Although weight and activation quantization is an effective approach for Deep
Neural Network (DNN) compression and has great potential to increase
inference speed by leveraging bit-operations, there is still a noticeable gap in
terms of prediction accuracy between the quantized model and the full-precision
model. To address this gap, we propose to jointly train a quantized,
bit-operation-compatible DNN and its associated quantizers, as opposed to using
fixed, handcrafted quantization schemes such as uniform or logarithmic
quantization. Our method for learning the quantizers applies to both network
weights and activations with arbitrary-bit precision, and our quantizers are
easy to train. The comprehensive experiments on CIFAR-10 and ImageNet datasets
show that our method works consistently well for various network structures
such as AlexNet, VGG-Net, GoogLeNet, ResNet, and DenseNet, surpassing previous
quantization methods in terms of accuracy by an appreciable margin. Code
available at https://github.com/Microsoft/LQ-Nets (Comment: ECCV'18, European
Conference on Computer Vision; main paper + supplementary material)
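A learned quantizer in the LQ-Nets spirit represents each quantized value as an inner product of a trainable floating-point basis with a binary code; the sketch below only enumerates the representable levels and snaps a value to the nearest one, omitting how the basis itself is trained:

```python
from itertools import product

def lq_levels(basis):
    """Enumerate all values representable as sum_i b_i * v_i with
    b_i in {-1, +1} and v_i the learned basis entries."""
    return sorted(sum(b * v for b, v in zip(bits, basis))
                  for bits in product((-1, 1), repeat=len(basis)))

def lq_quantize(x, basis):
    """Quantize x to the nearest level of the learned (non-uniform) grid."""
    return min(lq_levels(basis), key=lambda level: abs(level - x))
```

Because the basis is trained jointly with the network, the grid adapts to the weight and activation distributions instead of being fixed to uniform or logarithmic spacing.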
Overcoming Challenges in Fixed Point Training of Deep Convolutional Networks
It is known that training deep neural networks, in particular, deep
convolutional networks, with aggressively reduced numerical precision is
challenging. The stochastic gradient descent algorithm becomes unstable in the
presence of noisy gradient updates resulting from arithmetic with limited
numeric precision. One of the well-accepted solutions facilitating the training
of low precision fixed point networks is stochastic rounding. However, to the
best of our knowledge, the source of the instability in training neural
networks with noisy gradient updates has not been well investigated. This work
is an attempt to draw a theoretical connection between low numerical precision
and training algorithm stability. In doing so, we will also propose and verify
through experiments methods that are able to improve the training performance
of deep convolutional networks in fixed point. (Comment: ICML 2016 Workshop on
On-Device Intelligence)
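Why stochastic rounding matters for fixed-point SGD can be seen in a toy update loop: with round-to-nearest, an update smaller than half the grid step is silently discarded, while stochastic rounding preserves it in expectation. The sketch below is a generic illustration, not the experiments of the paper:

```python
import math
import random

def round_stochastic(x, step):
    """Round x onto the fixed-point grid, rounding up with probability
    equal to the fractional part, so the result is unbiased."""
    q = x / step
    f = math.floor(q)
    return (f + (random.random() < q - f)) * step

def sgd_step(w, grad, lr, step, stochastic):
    """One fixed-point SGD step: apply the update in full precision,
    then round the result back onto the grid of spacing `step`."""
    upd = w - lr * grad
    if stochastic:
        return round_stochastic(upd, step)
    return round(upd / step) * step  # round-to-nearest
```

With lr * grad = 0.001 and a grid step of 0.01, round-to-nearest leaves the weight frozen forever, whereas stochastic rounding moves it down one grid cell about once every ten steps.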
NICE: Noise Injection and Clamping Estimation for Neural Network Quantization
Convolutional Neural Networks (CNN) are very popular in many fields including
computer vision, speech recognition, natural language processing, to name a
few. Though deep learning leads to groundbreaking performance in these domains,
the networks used are very demanding computationally and are far from real-time
even on a GPU, which is not power efficient and therefore does not suit low
power systems such as mobile devices. To overcome this challenge, some
solutions have been proposed for quantizing the weights and activations of
these networks, which accelerate the runtime significantly. Yet, this
acceleration comes at the cost of a larger error. The NICE method proposed
in this work trains quantized neural networks by noise injection and learned
clamping, which improve accuracy. This leads to state-of-the-art results on
various regression and classification tasks, e.g., ImageNet classification with
architectures such as ResNet-18/34/50 with as low as 3-bit weights and
activations. We implement the proposed solution on an FPGA to demonstrate its
applicability for low power real-time applications. The implementation of the
paper is available at https://github.com/Lancer555/NIC
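The two ingredients named in the abstract can be sketched as below; this assumes a uniform quantizer and a symmetric clamp, and the function names are illustrative rather than the repository's API:

```python
import random

def inject_quantization_noise(w, step):
    """Training surrogate: instead of hard-quantizing weights, add uniform
    noise with the same support as the quantization error of a grid with
    spacing `step`, which keeps gradients informative."""
    return [x + random.uniform(-step / 2, step / 2) for x in w]

def clamp(x, c):
    """Clamping: values are clipped to [-c, c]; in NICE the clamp
    threshold c is a parameter learned together with the network,
    which bounds the dynamic range the quantizer must cover."""
    return max(-c, min(c, x))
```

Learning the clamp trades a little clipping error for a much finer quantization grid over the values that actually occur.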
Learned Step Size Quantization
Deep networks run with low precision operations at inference time offer power
and space advantages over high precision alternatives, but need to overcome the
challenge of maintaining high accuracy as precision decreases. Here, we present
a method for training such networks, Learned Step Size Quantization, that
achieves the highest accuracy to date on the ImageNet dataset when using
models, from a variety of architectures, with weights and activations quantized
to 2-, 3- or 4-bits of precision, and that can train 3-bit models that reach
full precision baseline accuracy. Our approach builds upon existing methods for
learning weights in quantized networks by improving how the quantizer itself is
configured. Specifically, we introduce a novel means to estimate and scale the
task loss gradient at each weight and activation layer's quantizer step size,
such that it can be learned in conjunction with other network parameters. This
approach works using different levels of precision as needed for a given system
and requires only a simple modification of existing training code. (Comment:
International Conference on Learning Representations, 2020)
SWALP : Stochastic Weight Averaging in Low-Precision Training
Low precision operations can provide scalability, memory savings,
portability, and energy efficiency. This paper proposes SWALP, an approach to
low precision training that averages low-precision SGD iterates with a modified
learning rate schedule. SWALP is easy to implement and can match the
performance of full-precision SGD even with all numbers quantized down to 8
bits, including the gradient accumulators. Additionally, we show that SWALP
converges arbitrarily close to the optimal solution for quadratic objectives,
and to a noise ball asymptotically smaller than low precision SGD in strongly
convex settings. (Comment: Published at ICML 2019)
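The core SWALP mechanism, averaging low-precision SGD iterates in higher precision so that quantization noise cancels, reduces to a few lines; this is a schematic of the averaging step only, not the modified learning-rate schedule:

```python
def swalp_average(iterates):
    """SWA over low-precision SGD iterates: each iterate sits on a coarse
    fixed-point grid, but their running average is accumulated in higher
    precision, so independent rounding errors average out toward the
    continuous solution."""
    n = len(iterates)
    return [sum(ws) / n for ws in zip(*iterates)]
```

For example, four 8-bit iterates scattered symmetrically around an optimum at 1.0 average back to 1.0 even though no single iterate equals it.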