Loss-aware Weight Quantization of Deep Networks
The huge size of deep networks hinders their use in small computing devices.
In this paper, we consider compressing the network by weight quantization. We
extend a recently proposed loss-aware weight binarization scheme to
ternarization, with possibly different scaling parameters for the positive and
negative weights, and m-bit (where m > 2) quantization. Experiments on
feedforward and recurrent neural networks show that the proposed scheme
outperforms state-of-the-art weight quantization algorithms, and is as accurate
(or even more accurate) than the full-precision network.
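A minimal numpy sketch of the ternarization step with separate positive and negative scales. The paper's loss-aware formulation derives the scales from the training loss; the mean-magnitude scales and fixed threshold below are illustrative simplifications only:

```python
import numpy as np

def ternarize_asymmetric(w, delta):
    """Map full-precision weights to {-alpha_n, 0, +alpha_p}.

    delta is the dead-zone threshold. The scales are set to the mean
    magnitude of the weights they replace, a common heuristic; the
    loss-aware scheme instead solves for them from the loss.
    """
    pos, neg = w > delta, w < -delta
    alpha_p = w[pos].mean() if pos.any() else 0.0
    alpha_n = -w[neg].mean() if neg.any() else 0.0
    q = np.zeros_like(w)
    q[pos], q[neg] = alpha_p, -alpha_n
    return q

w = 0.1 * np.random.randn(256)
print(ternarize_asymmetric(w, delta=0.05)[:8])
```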
Improved training of binary networks for human pose estimation and image recognition
Big neural networks trained on large datasets have advanced the
state-of-the-art for a large variety of challenging problems, improving
performance by a large margin. However, under low memory and limited
computational power constraints, the accuracy on the same problems drops
considerably. In this paper, we propose a series of techniques that
significantly improve the accuracy of binarized neural networks (i.e., networks
where both the features and the weights are binary). We evaluate the proposed
improvements on two diverse tasks: fine-grained recognition (human pose
estimation) and large-scale image recognition (ImageNet classification).
Specifically, we introduce a series of novel methodological changes including:
(a) more appropriate activation functions, (b) reverse-order initialization,
(c) progressive quantization, and (d) network stacking, and show that these
additions significantly improve existing state-of-the-art network binarization
techniques. Additionally, for the first time, we also investigate the extent
to which network binarization and knowledge distillation can be combined. When
tested on the challenging MPII dataset, our method shows a performance
improvement of more than 4% in absolute terms. Finally, we further validate our
findings by applying the proposed techniques for large-scale object recognition
on the ImageNet dataset, on which we report a reduction of error rate by 4%.
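For context, a binarized network quantizes both weights and activations with sign() in the forward pass and trains through the non-differentiable step with a straight-through estimator. The PyTorch sketch below shows this generic building block, on top of which improvements like the ones above sit; it is not the paper's specific method:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """sign() forward, straight-through (clipped identity) backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Only pass gradients where |x| <= 1 (the hard-tanh clip).
        return grad_out * (x.abs() <= 1).float()

x = torch.randn(4, requires_grad=True)
BinarizeSTE.apply(x).sum().backward()
print(x.grad)  # 1 inside [-1, 1], 0 outside
```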
Matrix and tensor decompositions for training binary neural networks
This paper is on improving the training of binary neural networks in which
both activations and weights are binary. While prior methods for neural network
binarization binarize each filter independently, we propose to instead
parametrize the weight tensor of each layer using matrix or tensor
decomposition. The binarization process is then performed using this latent
parametrization, via a quantization function (e.g. sign function) applied to
the reconstructed weights. A key feature of our method is that while the
reconstruction is binarized, the computation in the latent factorized space is
done in the real domain. This has several advantages: (i) the latent
factorization enforces a coupling of the filters before binarization, which
significantly improves the accuracy of the trained models; (ii) while at
training time the binary weights of each convolutional layer are parametrized
using real-valued matrix or tensor decomposition, during inference we simply
use the reconstructed (binary) weights. As a result, our method does not
sacrifice any advantage of binary networks in terms of model compression and
speeding-up inference. As a further contribution, instead of computing the
binary weight scaling factors analytically, as in prior work, we propose to
learn them discriminatively via back-propagation. Finally, we show that our
approach significantly outperforms existing methods when tested on the
challenging tasks of (a) human pose estimation (more than 4% improvements) and
(b) ImageNet classification (up to 5% performance gains).
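A rough PyTorch sketch of the latent-factorization idea for a linear layer. The rank, initialization, and straight-through trick are illustrative choices, not the paper's exact recipe:

```python
import torch
import torch.nn as nn

class FactorizedBinaryLinear(nn.Module):
    """Binary layer whose latent weights are the low-rank product U @ V.

    The factors couple all output filters before binarization, the
    reconstruction is binarized with sign(), and the scaling factor is
    learned by back-propagation instead of computed analytically.
    """

    def __init__(self, in_f, out_f, rank=8):
        super().__init__()
        self.U = nn.Parameter(0.1 * torch.randn(out_f, rank))
        self.V = nn.Parameter(0.1 * torch.randn(rank, in_f))
        self.alpha = nn.Parameter(torch.ones(out_f, 1))  # learned scale

    def forward(self, x):
        w = self.U @ self.V                       # real-valued latent space
        w_bin = w + (torch.sign(w) - w).detach()  # sign fwd, identity bwd
        return nn.functional.linear(x, self.alpha * w_bin)

layer = FactorizedBinaryLinear(64, 32)
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 32])
```

At inference time only `sign(U @ V)` and `alpha` would be kept, so the deployed weights stay binary.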
Learning Recurrent Binary/Ternary Weights
Recurrent neural networks (RNNs) have shown excellent performance in
processing sequence data. However, they are both complex and memory intensive
due to their recursive nature. These limitations make RNNs difficult to embed
on mobile devices requiring real-time processing with limited hardware
resources. To address the above issues, we introduce a method that can learn
binary and ternary weights during the training phase to facilitate hardware
implementations of RNNs. As a result, this approach replaces all
multiply-accumulate operations by simple accumulations, bringing significant
benefits to custom hardware in terms of silicon area and power consumption. On
the software side, we evaluate the performance (in terms of accuracy) of our
method using long short-term memories (LSTMs) on various sequential tasks,
including sequence classification and language modeling. We demonstrate that
our method achieves competitive results on the aforementioned tasks while using
binary/ternary weights at runtime. On the hardware side, we present
custom hardware for accelerating the recurrent computations of LSTMs with
binary/ternary weights. Ultimately, we show that LSTMs with binary/ternary
weights can achieve up to 12x memory saving and 10x inference speedup compared
to the full-precision implementation on an ASIC platform.
Comment: Published as a conference paper at ICLR 2019.
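The hardware benefit follows from the fact that a matrix-vector product with weights restricted to {-1, 0, +1} needs no multipliers at all, only additions and subtractions. A small numpy illustration of this accumulate-only arithmetic (not the paper's ASIC datapath):

```python
import numpy as np

def ternary_matvec(w_t, x):
    """y = w_t @ x with w_t in {-1, 0, +1}: each output is a sum of
    selected inputs minus a sum of others, i.e. accumulate-only."""
    y = np.empty(w_t.shape[0])
    for i in range(w_t.shape[0]):
        y[i] = x[w_t[i] == 1].sum() - x[w_t[i] == -1].sum()
    return y

w_t = np.random.choice([-1, 0, 1], size=(4, 8))
x = np.random.randn(8)
assert np.allclose(ternary_matvec(w_t, x), w_t @ x)
```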
A Survey on Methods and Theories of Quantized Neural Networks
Deep neural networks are the state-of-the-art methods for many real-world
tasks, such as computer vision, natural language processing and speech
recognition. For all their popularity, deep neural networks are also criticized
for consuming a lot of memory and draining battery life of devices during
training and inference. This makes it hard to deploy these models on mobile or
embedded devices which have tight resource constraints. Quantization is
recognized as one of the most effective approaches to satisfy the extreme
memory requirements that deep neural network models demand. Instead of adopting
the 32-bit floating-point format to represent weights, quantized representations
store weights using more compact formats such as integers or even binary
numbers. Despite a possible degradation in predictive performance, quantization
provides a potential solution to greatly reduce the model size and the energy
consumption. In this survey, we give a thorough review of different aspects of
quantized neural networks. Current challenges and trends of quantized neural
networks are also discussed.
Comment: 17 pages, 8 figures.
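As a concrete instance of the storage formats such surveys cover, here is a minimal symmetric uniform quantizer that stores weights as 8-bit integers plus one scale; this is one variant among many surveyed:

```python
import numpy as np

def quantize_uniform(w, bits=8):
    """Symmetric uniform quantization to signed integers.

    Stores each weight as an int8 plus a single float scale per tensor;
    dequantize with q * scale. The int8 cast assumes bits <= 8.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(10).astype(np.float32)
q, s = quantize_uniform(w)
print(np.abs(w - q * s).max())  # worst-case quantization error
```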
High-Accuracy Inference in Neuromorphic Circuits using Hardware-Aware Training
Neuromorphic Multiply-And-Accumulate (MAC) circuits utilizing synaptic weight
elements based on SRAM or novel Non-Volatile Memories (NVMs) provide a
promising approach for highly efficient hardware representations of neural
networks. NVM density and robustness requirements suggest that off-line
training is the right choice for "edge" devices, since the requirements for
synapse precision are much less stringent. However, off-line training using
ideal mathematical weights and activations can result in significant loss of
inference accuracy when applied to non-ideal hardware. Non-idealities such as
multi-bit quantization of weights and activations, non-linearity of weights,
finite max/min ratios of NVM elements, and asymmetry of positive and negative
weight components all result in degraded inference accuracy. In this work, it
is demonstrated that non-ideal Multi-Layer Perceptron (MLP) architectures using
low bitwidth weights and activations can be trained with negligible loss of
inference accuracy relative to their Floating Point-trained counterparts using
a proposed off-line, continuously differentiable HW-aware training algorithm.
The proposed algorithm is applicable to a wide range of hardware models, and
uses only standard neural network training methods. The algorithm is
demonstrated on the MNIST and EMNIST datasets, using standard MLPs.
Comment: 12 pages, 18 figures.
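One generic way to obtain a continuously differentiable quantizer of the kind the abstract mentions is to smooth the staircase with tanh steps that sharpen over training. A numpy sketch under that assumption; the paper's actual hardware model and algorithm may differ:

```python
import numpy as np

def soft_quantizer(x, levels, temperature=0.1):
    """Smooth staircase: a sum of shifted tanh steps that approaches a
    hard multi-level quantizer as temperature -> 0, so gradients exist
    everywhere during training."""
    edges = (levels[:-1] + levels[1:]) / 2   # where each step occurs
    heights = np.diff(levels)                # size of each step
    y = np.full_like(x, levels[0], dtype=float)
    for e, h in zip(edges, heights):
        y += 0.5 * h * (1 + np.tanh((x - e) / temperature))
    return y

levels = np.linspace(-1, 1, 5)               # e.g. five synapse levels
print(soft_quantizer(np.array([-0.9, -0.3, 0.0, 0.4, 0.95]), levels))
```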
A Survey of Model Compression and Acceleration for Deep Neural Networks
Deep neural networks (DNNs) have recently achieved great success in many
visual recognition tasks. However, existing deep neural network models are
computationally expensive and memory intensive, hindering their deployment in
devices with low memory resources or in applications with strict latency
requirements. Therefore, a natural thought is to perform model compression and
acceleration in deep networks without significantly decreasing the model
performance. During the past five years, tremendous progress has been made in
this area. In this paper, we review the recent techniques for compacting and
accelerating DNN models. In general, these techniques are divided into four
categories: parameter pruning and quantization, low-rank factorization,
transferred/compact convolutional filters, and knowledge distillation. Methods
of parameter pruning and quantization are described first, and the other
techniques are introduced afterwards. For each category, we also provide insightful
analysis about the performance, related applications, advantages, and
drawbacks. Then we go through some very recent successful methods, for example,
dynamic capacity networks and stochastic depth networks. After that, we survey
the evaluation metrics, the main datasets used for evaluating the model
performance, and recent benchmark efforts. Finally, we conclude this paper and
discuss the remaining challenges and possible directions for future work.
Comment: Published in IEEE Signal Processing Magazine; updated version
including more recent work.
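As a worked example of one of the four categories, low-rank factorization replaces an m x n weight matrix by a rank-r product, cutting storage from m*n to r*(m+n) parameters. A short numpy sketch:

```python
import numpy as np

def low_rank_compress(w, rank):
    """Truncated-SVD factorization W ~= U @ V, the basic low-rank
    compression described in such surveys."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    U = u[:, :rank] * s[:rank]   # fold singular values into U
    V = vt[:rank]
    return U, V

w = np.random.randn(256, 512)
U, V = low_rank_compress(w, rank=16)
print(U.shape, V.shape)                               # (256, 16) (16, 512)
print(np.linalg.norm(w - U @ V) / np.linalg.norm(w))  # relative error
```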
FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs
It is well known that many types of artificial neural networks, including
recurrent networks, can achieve a high classification accuracy even with
low-precision weights and activations. The reduction in precision generally
yields much more efficient hardware implementations in terms of hardware
cost, memory requirements, energy, and achievable throughput. In this paper, we
present the first systematic exploration of this design space as a function of
precision for a Bidirectional Long Short-Term Memory (BiLSTM) neural network.
Specifically, we include an in-depth investigation of precision vs. accuracy
using a fully hardware-aware training flow where, during training, quantization
of all aspects of the network, including weights, inputs, outputs, and in-memory
cell activations, is taken into consideration. In addition, hardware resource
cost, power consumption and throughput scalability are explored as a function
of precision for FPGA-based implementations of BiLSTM, and multiple approaches
to parallelizing the hardware. We provide the first open source HLS library
extension of FINN for parameterizable hardware architectures of LSTM layers on
FPGAs, which offers full precision flexibility and allows for parameterizable
performance scaling with different levels of parallelism within the
architecture. Based on this library, we present an FPGA-based accelerator for
BiLSTM neural network designed for optical character recognition, along with
numerous other experimental proof points for a Zynq UltraScale+ XCZU7EV MPSoC
within the given design space.
Comment: Accepted for publication at the 28th International Conference on Field
Programmable Logic and Applications (FPL), August 2018, Dublin, Ireland.
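In outline, the design-space exploration amounts to sweeping weight and activation bitwidths and measuring accuracy and hardware cost at each grid point. A skeleton of such a sweep, where `evaluate_at_precision` is a hypothetical stand-in for a full hardware-aware training run:

```python
import itertools

def evaluate_at_precision(w_bits, a_bits):
    """Hypothetical placeholder: a real flow would train and evaluate
    the BiLSTM at these bitwidths and report accuracy and cost."""
    return 1.0 - 0.5 / (w_bits * a_bits)  # fabricated accuracy model

# Sweep the (weight, activation) precision grid; in the real study each
# point is a separate training run plus an FPGA resource estimate.
for w_bits, a_bits in itertools.product([1, 2, 4, 8], repeat=2):
    acc = evaluate_at_precision(w_bits, a_bits)
    print(f"W{w_bits}A{a_bits}: accuracy={acc:.3f}")
```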
Deep Local Binary Patterns
Local Binary Pattern (LBP) is a traditional descriptor for texture analysis
that gained attention in the last decade. Possessing several desirable properties,
such as invariance to illumination, translation, and scaling, LBPs achieved
state-of-the-art results in several applications. However, LBPs are not able to
capture high-level features from the image, merely encoding features with low
abstraction levels. In this work, we propose Deep LBP, which borrows ideas from
the deep learning community to improve LBP expressiveness. By using
parametrized data-driven LBP, we enable successive applications of the LBP
operators with increasing abstraction levels. We validate the relevance of the
proposed idea in several datasets from a wide range of applications. Deep LBP
improved the performance of traditional and multiscale LBP in all cases.
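For reference, the classic 3x3 LBP operator thresholds each pixel's eight neighbours against the centre and packs the comparisons into one byte; Deep LBP replaces the fixed comparison with a learned, data-driven one so the operator can be stacked. A numpy sketch of the base operator only:

```python
import numpy as np

def lbp_3x3(img):
    """Classic Local Binary Pattern on a 3x3 neighbourhood."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (neighbour >= centre).astype(np.uint8) << bit
    return out

img = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
print(lbp_3x3(img))  # one 8-bit code per interior pixel
```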
SinReQ: Generalized Sinusoidal Regularization for Low-Bitwidth Deep Quantized Training
Deep quantization of neural networks (below eight bits) offers significant
promise in reducing their compute and storage cost. Albeit alluring, without
special techniques for training and optimization, deep quantization results in
significant accuracy loss. To further mitigate this loss, we propose a novel
sinusoidal regularization, called SinReQ, for deep quantized training. SinReQ
adds a periodic term to the original objective function of the underlying
training algorithm. SinReQ exploits the periodicity, differentiability, and the
desired convexity profile in sinusoidal functions to automatically propel
weights towards values that are inherently closer to quantization levels.
Since this technique does not require invasive changes to the training
procedure, SinReQ can harmoniously enhance quantized training algorithms.
SinReQ offers generality and flexibility as it is not limited to a certain
bitwidth or a uniform assignment of bitwidths across layers. We carry out
experiments using the AlexNet, ResNet-18, ResNet-20, and VGG-11 DNNs on the
CIFAR-10 and SVHN datasets, with three to five bits for quantization, and show the versatility
of SinReQ in enhancing multiple quantized training algorithms, DoReFa [32] and
WRPN [24]. Averaging across all the bit configurations shows that SinReQ closes
the accuracy gap between these two techniques and the full-precision runs by
32.4% and 27.5%, respectively. That is, SinReQ improves the absolute accuracy of
DoReFa and WRPN by 2.8% and 2.1%, respectively.
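A minimal sketch of a sinusoidal regularizer in the spirit of SinReQ, assuming a uniform quantization grid of spacing `step`: the penalty is zero exactly at the grid points and differentiable everywhere, so it simply adds to the training objective. SinReQ's exact form and strength schedule may differ:

```python
import torch

def sinusoidal_penalty(w, step):
    """Zero at integer multiples of `step` (the assumed quantization
    levels), positive in between, nudging weights toward the grid."""
    return torch.sin(torch.pi * w / step).pow(2).sum()

w = torch.randn(100, requires_grad=True)
task_loss = w.pow(2).mean()              # stand-in for the real objective
step = 2.0 ** -(3 - 1)                   # ~3-bit grid over [-1, 1)
total = task_loss + 1e-2 * sinusoidal_penalty(w, step)
total.backward()                         # gradient includes the periodic term
print(w.grad.abs().mean())
```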