XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
We propose two efficient approximations to standard convolutional neural
networks: Binary-Weight-Networks and XNOR-Networks. In Binary-Weight-Networks,
the filters are approximated with binary values, resulting in 32x memory savings.
In XNOR-Networks, both the filters and the input to convolutional layers are
binary. XNOR-Networks approximate convolutions using primarily binary
operations. This results in 58x faster convolutional operations and 32x memory
savings. XNOR-Nets offer the possibility of running state-of-the-art networks
on CPUs (rather than GPUs) in real-time. Our binary networks are simple,
accurate, efficient, and work on challenging visual tasks. We evaluate our
approach on the ImageNet classification task. The classification accuracy with
a Binary-Weight-Network version of AlexNet is only 2.9% less than the
full-precision AlexNet (in top-1 measure). We compare our method with recent
network binarization methods, BinaryConnect and BinaryNets, and outperform
these methods by large margins on ImageNet, more than 16% in top-1 accuracy.
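
For concreteness, here is a minimal NumPy sketch of the binary-weight approximation described above (the function name is illustrative, not the authors' code): each real-valued filter W is replaced by alpha * sign(W), where the scale alpha = mean(|W|) minimizes the L2 reconstruction error for a fixed sign pattern.

    import numpy as np

    def binarize_filter(W):
        """Approximate a real-valued filter W as alpha * sign(W)."""
        alpha = np.abs(W).mean()            # optimal per-filter scale
        B = np.where(W >= 0, 1.0, -1.0)     # binary filter in {-1, +1}
        return alpha, B

    W = np.random.randn(3, 3)               # a toy 3x3 conv filter
    alpha, B = binarize_filter(W)
    print("reconstruction error:", np.linalg.norm(W - alpha * B))

Because B is binary, the inner products of a convolution reduce to additions and subtractions (and, with binary inputs as well, to XNOR and popcount), which is where the reported speedups come from.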
Performance Guaranteed Network Acceleration via High-Order Residual Quantization
Input binarization has been shown to be an effective way to accelerate networks.
However, previous binarization schemes could be regarded as simple pixel-wise
thresholding operations (i.e., order-one approximations) and suffer a large
accuracy loss. In this paper, we propose a high-order binarization scheme, which
achieves a more accurate approximation while still retaining the advantage of
binary operations. In particular, the proposed scheme recursively performs
residual quantization and yields a series of binary input images with
decreasing magnitude scales. Accordingly, we propose high-order binary
filtering and gradient propagation operations for both forward and backward
computations. Theoretical analysis shows the approximation error guarantee
property of the proposed method. Extensive experimental results demonstrate
that the proposed scheme achieves high recognition accuracy while being
accelerated.
Comment: 9 pages, 8 figures, Proceedings of the IEEE International Conference
on Computer Vision 2017
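
A rough NumPy sketch of the recursive residual quantization described above (names and the order count are illustrative): each step binarizes the current residual with its own scale, so successive binary terms carry decreasing magnitudes and the approximation error shrinks with the order.

    import numpy as np

    def high_order_binarize(X, order=2):
        """Approximate X as sum_i beta_i * B_i with binary B_i in {-1, +1}."""
        residual = X.astype(np.float64)
        terms = []
        for _ in range(order):
            beta = np.abs(residual).mean()           # scale for this order
            B = np.where(residual >= 0, 1.0, -1.0)
            terms.append((beta, B))
            residual = residual - beta * B           # quantize what is left
        return terms

    X = np.random.randn(8, 8)
    approx = sum(beta * B for beta, B in high_order_binarize(X, order=3))
    print("residual norm:", np.linalg.norm(X - approx))  # shrinks with order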
Improved training of binary networks for human pose estimation and image recognition
Big neural networks trained on large datasets have advanced the
state-of-the-art for a large variety of challenging problems, improving
performance by a large margin. However, under low memory and limited
computational power constraints, the accuracy on the same problems drops
considerably. In this paper, we propose a series of techniques that
significantly improve the accuracy of binarized neural networks (i.e., networks
where both the features and the weights are binary). We evaluate the proposed
improvements on two diverse tasks: fine-grained recognition (human pose
estimation) and large-scale image recognition (ImageNet classification).
Specifically, we introduce a series of novel methodological changes including:
(a) more appropriate activation functions, (b) reverse-order initialization,
(c) progressive quantization, and (d) network stacking, and we show that these
additions significantly improve existing state-of-the-art network binarization
techniques. Additionally, for the first time, we investigate the extent to
which network binarization and knowledge distillation can be combined. When
tested on the challenging MPII dataset, our method shows a performance
improvement of more than 4% in absolute terms. Finally, we further validate our
findings by applying the proposed techniques for large-scale object recognition
on the ImageNet dataset, on which we report a 4% reduction in error rate.
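
Since the abstract studies combining binarization with knowledge distillation, here is a generic PyTorch sketch of a distillation loss in the style of Hinton et al., not this paper's exact formulation; the temperature T and mixing weight alpha are assumed hyperparameters. The binary student matches both the ground-truth labels and the teacher's softened logits.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          T=4.0, alpha=0.5):
        hard = F.cross_entropy(student_logits, labels)
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)                  # rescale gradients, as in Hinton et al.
        return alpha * hard + (1.0 - alpha) * soft

    s, t = torch.randn(8, 10), torch.randn(8, 10)
    y = torch.randint(0, 10, (8,))
    print(distillation_loss(s, t, y))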
Optimize Deep Convolutional Neural Network with Ternarized Weights and High Accuracy
Deep convolutional neural networks have achieved great success in many
artificial intelligence applications. However, their enormous model size and
massive computation cost have become the main obstacles to deploying such
powerful algorithms in low-power, resource-limited embedded systems. As a
countermeasure to this problem, in this work we propose statistical weight
scaling and residual expansion methods to reduce the bit-width of the whole
network weight parameters to ternary values (i.e., -1, 0, +1), with the
objective of greatly reducing model size, computation cost, and the accuracy
degradation caused by model compression. With about a 16x model compression
rate, our ternarized ResNet-32/44/56 could outperform full-precision
counterparts by 0.12%, 0.24% and 0.18% on the CIFAR-10 dataset. We also test our
ternarization method with AlexNet and ResNet-18 on ImageNet dataset, which both
achieve the best top-1 accuracy compared to recent similar works, with the same
16x compression rate. If further incorporating our residual expansion method,
compared to the full-precision counterpart, our ternarized ResNet-18 even
improves the top-5 accuracy by 0.61% and degrades the top-1 accuracy by only
0.42% on the ImageNet dataset, with an 8x model compression rate. It
outperforms the recent ABC-Net by 1.03% in top-1 accuracy and 1.78% in top-5
accuracy, with around 1.25x higher compression rate and more than 6x
computation reduction due to the weight sparsity.
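
As a concrete illustration of threshold-based ternarization, here is a minimal NumPy sketch; the 0.7 * mean(|W|) threshold follows the common TWN-style heuristic and is an assumption, not necessarily the authors' exact statistical scaling rule. Weights are mapped to {-alpha, 0, +alpha}, with the scale derived from the surviving weights' statistics.

    import numpy as np

    def ternarize(W, t=0.7):
        delta = t * np.abs(W).mean()            # statistical threshold
        mask = np.abs(W) > delta                # weights that stay nonzero
        alpha = np.abs(W[mask]).mean() if mask.any() else 0.0
        return alpha * np.sign(W) * mask        # values in {-alpha, 0, +alpha}

    W = np.random.randn(64, 64)
    Wt = ternarize(W)
    print("sparsity:", 1.0 - np.count_nonzero(Wt) / Wt.size)

The induced zeros are what enable the computation reduction mentioned above: multiplications by zeroed weights can be skipped entirely.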
LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
Although weight and activation quantization is an effective approach for Deep
Neural Network (DNN) compression and has great potential to increase inference
speed by leveraging bit-operations, there is still a noticeable gap in
terms of prediction accuracy between the quantized model and the full-precision
model. To address this gap, we propose to jointly train a quantized,
bit-operation-compatible DNN and its associated quantizers, as opposed to using
fixed, handcrafted quantization schemes such as uniform or logarithmic
quantization. Our method for learning the quantizers applies to both network
weights and activations with arbitrary-bit precision, and our quantizers are
easy to train. The comprehensive experiments on CIFAR-10 and ImageNet datasets
show that our method works consistently well for various network structures
such as AlexNet, VGG-Net, GoogLeNet, ResNet, and DenseNet, surpassing previous
quantization methods in terms of accuracy by an appreciable margin. Code
available at https://github.com/Microsoft/LQ-Nets
Comment: ECCV'18 (European Conference on Computer Vision); Main paper + suppl.
material
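
A small sketch of the learned-quantizer idea: the quantization levels are spanned by a small basis vector v, so a K-bit code e in {-1, +1}^K maps to the level given by the dot product of v and e, and encoding picks the nearest representable level. The basis values below are made up for illustration; in LQ-Nets the basis is trained jointly with the network.

    import itertools
    import numpy as np

    def quantize_with_basis(x, v):
        """Quantize entries of x to the nearest level v . e, e in {-1,+1}^K."""
        codes = np.array(list(itertools.product([-1.0, 1.0], repeat=len(v))))
        levels = codes @ v                     # all 2^K representable values
        idx = np.abs(x[..., None] - levels).argmin(axis=-1)
        return levels[idx]

    v = np.array([0.5, 0.25])                  # assumed 2-bit basis values
    x = np.random.randn(5)
    print(x, "->", quantize_with_basis(x, v))

Because each code bit is in {-1, +1}, inner products with quantized tensors decompose into bitwise operations over the K binary planes, which is what makes the scheme bit-operation-compatible.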
Exploring the Connection Between Binary and Spiking Neural Networks
On-chip edge intelligence has necessitated the exploration of algorithmic
techniques to reduce the compute requirements of current machine learning
frameworks. This work aims to bridge the recent algorithmic progress in
training Binary Neural Networks and Spiking Neural Networks, both of which are
driven by the same motivation, yet whose synergies have not been fully
explored. We show that training Spiking Neural Networks in the extreme
quantization regime results in near full precision accuracies on large-scale
datasets like CIFAR-100 and ImageNet. An important implication of this work
is that Binary Spiking Neural Networks can be enabled by "In-Memory" hardware
accelerators catered for Binary Neural Networks without suffering any accuracy
degradation due to binarization. We utilize standard training techniques for
non-spiking networks to generate our spiking networks via a conversion process,
and we perform an extensive empirical analysis exploring simple design-time and
run-time optimization techniques that reduce the inference latency of spiking
networks (both binary and full-precision models) by an order of magnitude over
prior work.
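
A toy sketch of the conversion setting discussed above: a ReLU activation is approximated by the firing rate of an integrate-and-fire neuron over T timesteps. The threshold, reset rule, and timestep count here are illustrative, not the paper's calibrated values.

    def if_neuron_rate(inp, T=100, v_th=1.0):
        """Firing rate of an integrate-and-fire neuron with constant input."""
        v, spikes = 0.0, 0
        for _ in range(T):
            v += inp                  # integrate the input current
            if v >= v_th:
                spikes += 1
                v -= v_th             # soft reset preserves residual charge
        return spikes / T

    for a in [0.0, 0.3, 0.7]:
        print(a, "~", if_neuron_rate(a))   # rate tracks the ReLU activation

The run-time latency optimizations the abstract mentions amount, in this picture, to reducing the number of timesteps T needed before the rates become reliable.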
A Survey on Methods and Theories of Quantized Neural Networks
Deep neural networks are the state-of-the-art methods for many real-world
tasks, such as computer vision, natural language processing and speech
recognition. For all their popularity, deep neural networks are also criticized
for consuming a lot of memory and draining battery life of devices during
training and inference. This makes it hard to deploy these models on mobile or
embedded devices, which have tight resource constraints. Quantization is
recognized as one of the most effective approaches to meet the extreme memory
requirements of deep neural network models. Instead of adopting a
32-bit floating point format to represent weights, quantized representations
store weights using more compact formats such as integers or even binary
numbers. Despite a possible degradation in predictive performance, quantization
provides a potential solution to greatly reduce the model size and the energy
consumption. In this survey, we give a thorough review of different aspects of
quantized neural networks. Current challenges and trends of quantized neural
networks are also discussed.
Comment: 17 pages, 8 figures
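
The survey's central operation, in its simplest form, is uniform affine quantization, sketched below in NumPy; this is the textbook scheme rather than any single surveyed method. A float tensor is mapped to b-bit integers via a scale and zero point, trading precision for roughly a 32/b reduction in storage.

    import numpy as np

    def quantize(x, bits=8):
        qmin, qmax = 0, 2**bits - 1
        scale = (x.max() - x.min()) / (qmax - qmin)
        zero_point = int(round(-x.min() / scale))   # so x.min() maps to qmin
        q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
        return q.astype(np.uint8), scale, zero_point

    def dequantize(q, scale, zero_point):
        return scale * (q.astype(np.float32) - zero_point)

    x = np.random.randn(1000).astype(np.float32)
    q, s, z = quantize(x)
    print("max abs error:", np.abs(x - dequantize(q, s, z)).max())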
Binary Neural Networks: A Survey
The binary neural network, which largely saves storage and computation, serves
as a promising technique for deploying deep models on resource-limited devices.
However, the binarization inevitably causes severe information loss, and even
worse, its discontinuity brings difficulty to the optimization of the deep
network. To address these issues, a variety of algorithms have been proposed
and have achieved satisfactory progress in recent years. In this paper, we
present a comprehensive survey of these algorithms, categorized mainly into
native solutions that directly conduct binarization and optimized ones that
use techniques such as minimizing the quantization error, improving the
network loss function, and reducing the gradient error. We also investigate
other practical
aspects of binary neural networks such as the hardware-friendly design and the
training tricks. We then present evaluations and discussions of different
tasks, including image classification, object detection, and semantic
segmentation. Finally, we outline the challenges that future research may
face.
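
To make the optimization difficulty mentioned above concrete: sign() has zero gradient almost everywhere, so binary networks commonly train with a straight-through estimator (STE) that backpropagates a clipped identity in place of the true (useless) gradient. Below is a minimal PyTorch version of this standard trick.

    import torch

    class BinarizeSTE(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return torch.sign(x)                     # forward: hard binarization

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            return grad_out * (x.abs() <= 1).float() # backward: clipped identity

    x = torch.randn(4, requires_grad=True)
    BinarizeSTE.apply(x).sum().backward()
    print(x.grad)                                    # nonzero inside [-1, 1]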
Memory-Driven Mixed Low Precision Quantization For Enabling Deep Network Inference On Microcontrollers
This paper presents a novel end-to-end methodology for enabling the
deployment of low-error deep networks on microcontrollers. To fit the memory
and computational limitations of resource-constrained edge-devices, we exploit
mixed low-bitwidth compression, featuring 8, 4 or 2-bit uniform quantization,
and we model the inference graph with integer-only operations. Our approach
aims at determining the minimum bit precision of every activation and weight
tensor given the memory constraints of a device. This is achieved through a
rule-based iterative procedure, which cuts the number of bits of the most
memory-demanding layers, aiming at meeting the memory constraints. After a
quantization-aware retraining step, the fake-quantized graph is converted into
an inference integer-only model by inserting the Integer Channel-Normalization
(ICN) layers, which introduce a negligible loss as demonstrated on INT4
MobilenetV1 models. We report the latency-accuracy evaluation of
mixed-precision MobilenetV1 family networks on an STM32H7 microcontroller. Our
experimental results demonstrate an end-to-end deployment of an integer-only
Mobilenet network with a Top-1 accuracy of 68% on a device with only 2MB of
FLASH memory and 512kB of RAM, improving the Top-1 accuracy by 8% over
previously published 8-bit implementations for microcontrollers.
Comment: Submitted to NeurIPS 2019
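
An illustrative sketch of the rule-based bit-allocation loop described above: repeatedly cut the precision of the most memory-hungry tensor (8 -> 4 -> 2 bits) until the memory budget is met. The tensor names, sizes, and budget are made up for the example, and the paper's actual procedure also weighs accuracy impact before cutting.

    def fit_to_budget(tensor_elems, budget_bits):
        """Greedily lower bit-widths until total weight memory fits."""
        bits = {name: 8 for name in tensor_elems}    # start at uniform 8-bit
        def total_bits():
            return sum(tensor_elems[n] * bits[n] for n in bits)
        while total_bits() > budget_bits:
            shrinkable = [n for n in bits if bits[n] > 2]
            if not shrinkable:
                raise RuntimeError("budget unreachable even at 2 bits")
            # cut the tensor that currently consumes the most memory
            worst = max(shrinkable, key=lambda n: tensor_elems[n] * bits[n])
            bits[worst] //= 2                        # 8 -> 4 -> 2
        return bits

    layers = {"conv1.w": 864, "conv2.w": 18432, "fc.w": 512000}
    print(fit_to_budget(layers, budget_bits=3_000_000))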
NASB: Neural Architecture Search for Binary Convolutional Neural Networks
Binary Convolutional Neural Networks (CNNs) have significantly reduced the
number of arithmetic operations and the size of memory storage needed for CNNs,
which makes their deployment on mobile and embedded systems more feasible.
However, the CNN architecture requires significant redesign and refinement
after binarization, for two reasons: 1. the large accumulation error of
binarization in the forward propagation, and 2. the severe gradient mismatch
problem of binarization in the backward propagation. Even though substantial
effort has been invested in designing architectures for single and
multiple binary CNNs, it is still difficult to find an optimal architecture for
binary CNNs. In this paper, we propose a strategy, named NASB, which adopts
Neural Architecture Search (NAS) to find an optimal architecture for the
binarization of CNNs. Due to the flexibility of this automated strategy, the
obtained architecture is not only suitable for binarization but also has low
overhead, achieving a better trade-off between accuracy and computational
complexity than hand-optimized binary CNNs. The implementation of the NASB
strategy
is evaluated on the ImageNet dataset and shown to be a better solution than
existing quantized CNNs. With an insignificant increase in overhead, NASB
outperforms existing single and multiple binary CNNs by up to 4.0% and 1.0%
Top-1 accuracy respectively, bringing them closer to the accuracy of their
full-precision counterparts. The code and pretrained models will be made
publicly available.
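
A toy PyTorch sketch of the differentiable-NAS building block that strategies like NASB rest on: each edge mixes candidate operations with softmax-weighted architecture parameters learned alongside the network weights, and after the search the highest-weighted operation is kept. The candidate set here is illustrative; NASB searches over binary-friendly operations, which this sketch does not model.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MixedOp(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.ops = nn.ModuleList([
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.Conv2d(channels, channels, 1, bias=False),
                nn.Identity(),                    # skip-connection candidate
            ])
            self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

        def forward(self, x):
            w = F.softmax(self.alpha, dim=0)      # architecture weights
            return sum(wi * op(x) for wi, op in zip(w, self.ops))

    x = torch.randn(1, 16, 8, 8)
    print(MixedOp(16)(x).shape)   # after search, keep the op with largest alpha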