LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
Although weight and activation quantization is an effective approach for Deep
Neural Network (DNN) compression and has considerable potential to increase
inference speed by leveraging bit-operations, there is still a noticeable gap in
terms of prediction accuracy between the quantized model and the full-precision
model. To address this gap, we propose to jointly train a quantized,
bit-operation-compatible DNN and its associated quantizers, as opposed to using
fixed, handcrafted quantization schemes such as uniform or logarithmic
quantization. Our method for learning the quantizers applies to both network
weights and activations with arbitrary-bit precision, and our quantizers are
easy to train. The comprehensive experiments on CIFAR-10 and ImageNet datasets
show that our method works consistently well for various network structures
such as AlexNet, VGG-Net, GoogLeNet, ResNet, and DenseNet, surpassing previous
quantization methods in terms of accuracy by an appreciable margin. Code
available at https://github.com/Microsoft/LQ-Nets
Comment: ECCV'18 (European Conference on Computer Vision); Main paper + suppl. material
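To make the contrast with learned quantizers concrete, the following is a minimal sketch of the two fixed, handcrafted schemes the abstract names (uniform and logarithmic quantization). It is not the LQ-Nets method itself; the function names, the symmetric range `max_abs`, and the choice of one sign bit plus magnitude levels are assumptions made for illustration.

```python
import math


def uniform_quantize(x: float, bits: int, max_abs: float = 1.0) -> float:
    """Symmetric uniform quantizer: evenly spaced levels in [-max_abs, max_abs]."""
    levels = 2 ** (bits - 1) - 1              # positive levels, e.g. 3 for 3 bits
    step = max_abs / levels
    clipped = max(-max_abs, min(max_abs, x))  # saturate out-of-range inputs
    return round(clipped / step) * step


def log_quantize(x: float, bits: int, max_abs: float = 1.0) -> float:
    """Logarithmic quantizer: magnitudes snap to max_abs times a power of two."""
    if x == 0.0:
        return 0.0
    n_exponents = 2 ** (bits - 1) - 1         # assumed number of magnitude levels
    exp = round(math.log2(abs(x) / max_abs))  # nearest exponent in the log domain
    exp = max(-(n_exponents - 1), min(0, exp))
    return math.copysign(max_abs * 2.0 ** exp, x)


if __name__ == "__main__":
    for value in (0.05, 0.3, -0.6, 2.0):
        print(value, uniform_quantize(value, 3), log_quantize(value, 3))
```

Both quantizers place their levels without looking at the data; a learned quantizer would instead adjust the level positions during training.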
PACT: Parameterized Clipping Activation for Quantized Neural Networks
Deep learning algorithms achieve high classification accuracy at the expense
of significant computation cost. To address this cost, a number of quantization
schemes have been proposed, but most of these techniques focus on quantizing
weights, which are relatively smaller in size compared to activations. This
paper proposes a novel quantization scheme for activations during training
that enables neural networks to work well with ultra-low-precision weights and
activations without any significant accuracy degradation. This technique,
PArameterized Clipping acTivation (PACT), uses an activation clipping parameter
that is optimized during training to find the right quantization
scale. PACT allows quantizing activations to arbitrary bit precisions, while
achieving much better accuracy relative to published state-of-the-art
quantization schemes. We show, for the first time, that both weights and
activations can be quantized to 4 bits of precision while still achieving
accuracy comparable to full precision networks across a range of popular models
and datasets. We also show that exploiting these reduced-precision
computational units in hardware can enable a super-linear improvement in
inferencing performance due to a significant reduction in the area of
accelerator compute engines coupled with the ability to retain the quantized
model and activation data in on-chip memories
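The clipping-then-quantizing idea described above can be sketched as follows. This is a scalar illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, `alpha` stands for the learnable clipping parameter, and the gradient rule shown is the straight-through estimate commonly associated with this kind of clipped activation.

```python
def pact_forward(x: float, alpha: float, bits: int) -> float:
    """Clip x to [0, alpha], then quantize uniformly to 2**bits - 1 steps."""
    y = min(max(x, 0.0), alpha)          # parameterized clipping
    scale = (2 ** bits - 1) / alpha      # quantization steps per unit of activation
    return round(y * scale) / scale


def pact_grad_alpha(x: float, alpha: float) -> float:
    """Straight-through gradient of the output with respect to alpha.

    Only inputs that hit the clipping ceiling pass gradient to alpha.
    """
    return 1.0 if x >= alpha else 0.0


if __name__ == "__main__":
    for value in (-1.0, 0.7, 5.0):
        print(value, pact_forward(value, alpha=2.0, bits=2),
              pact_grad_alpha(value, alpha=2.0))
```

A smaller `alpha` gives finer resolution inside the clipped range but saturates more activations, which is the trade-off the training procedure is left to resolve.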
A Survey on Methods and Theories of Quantized Neural Networks
Deep neural networks are the state-of-the-art methods for many real-world
tasks, such as computer vision, natural language processing and speech
recognition. For all their popularity, deep neural networks are also criticized
for consuming a lot of memory and draining battery life of devices during
training and inference. This makes it hard to deploy these models on mobile or
embedded devices which have tight resource constraints. Quantization is
recognized as one of the most effective approaches to satisfy the extreme
memory requirements that deep neural network models demand. Instead of adopting
32-bit floating point format to represent weights, quantized representations
store weights using more compact formats such as integers or even binary
numbers. Despite a possible degradation in predictive performance, quantization
provides a potential solution to greatly reduce the model size and the energy
consumption. In this survey, we give a thorough review of different aspects of
quantized neural networks. Current challenges and trends of quantized neural
networks are also discussed.
Comment: 17 pages, 8 figures
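The storage saving from moving away from 32-bit floating point is simple arithmetic, sketched below. The parameter count of 60 million is a hypothetical example, not a figure from the survey, and the calculation ignores any per-layer scale factors or other metadata a real format would store.

```python
def model_size_mb(num_weights: int, bits_per_weight: int) -> float:
    """Weight storage in megabytes (10**6 bytes), ignoring metadata."""
    return num_weights * bits_per_weight / 8 / 1e6


if __name__ == "__main__":
    num_weights = 60_000_000  # hypothetical model size, for illustration only
    for bits in (32, 8, 2, 1):
        print(f"{bits:>2}-bit weights: {model_size_mb(num_weights, bits):.1f} MB")
```

For this hypothetical model, 32-bit weights take 240 MB, 8-bit weights 60 MB, and binary weights 7.5 MB.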
Bridging the Accuracy Gap for 2-bit Quantized Neural Networks (QNN)
Deep learning algorithms achieve high classification accuracy at the expense
of significant computation cost. In order to reduce this cost, several
quantization schemes have gained attention recently with some focusing on
weight quantization, and others focusing on quantizing activations. This paper
proposes novel techniques that target weight and activation quantizations
separately resulting in an overall quantized neural network (QNN). The
activation quantization technique, PArameterized Clipping acTivation (PACT),
uses an activation clipping parameter that is optimized during
training to find the right quantization scale. The weight quantization scheme,
statistics-aware weight binning (SAWB), finds the optimal scaling factor that
minimizes the quantization error based on the statistical characteristics of
the distribution of weights without the need for an exhaustive search. The
combination of PACT and SAWB results in a 2-bit QNN that achieves
state-of-the-art classification accuracy (comparable to full precision
networks) across a range of popular models and datasets.
Comment: arXiv admin note: substantial text overlap with arXiv:1805.0608
Streamlined Deployment for Quantized Neural Networks
Running Deep Neural Network (DNN) models on devices with limited
computational capability is a challenge due to large compute and memory
requirements. Quantized Neural Networks (QNNs) have emerged as a potential
solution to this problem, promising to offer most of the DNN accuracy benefits
with much lower computational cost. However, harvesting these benefits on
existing mobile CPUs is a challenge since operations on highly quantized
datatypes are not natively supported in most instruction set architectures
(ISAs). In this work, we first describe a streamlining flow to convert all QNN
inference operations to integer ones. Afterwards, we provide techniques based
on processing one bit position at a time (bit-serial) to show how QNNs can be
efficiently deployed using common bitwise operations. We demonstrate the
potential of QNNs on mobile CPUs with microbenchmarks and on a quantized
AlexNet, which is 3.5x faster than an optimized 8-bit baseline. Our bit-serial
matrix multiplication library is available on GitHub at https://git.io/vhshn
Comment: Presented at the International Workshop on Highly Efficient Neural
Networks Design (HENND) co-located with CASES'1
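The bit-serial idea, processing one bit position at a time with common bitwise operations, can be illustrated with a dot product of small unsigned integers. This sketch is not the API of the library linked above; the function names are hypothetical, and it assumes unsigned operands that fit in the stated bit widths.

```python
from typing import List


def bit_plane(values: List[int], bit: int) -> int:
    """Pack bit position `bit` of every element into one integer (element k -> bit k)."""
    plane = 0
    for k, v in enumerate(values):
        plane |= ((v >> bit) & 1) << k
    return plane


def bit_serial_dot(a: List[int], b: List[int], a_bits: int, b_bits: int) -> int:
    """Dot product of unsigned vectors using only AND, popcount, and shifts."""
    total = 0
    for i in range(a_bits):
        a_plane = bit_plane(a, i)
        for j in range(b_bits):
            b_plane = bit_plane(b, j)
            # AND marks positions where both bits are set; popcount sums them,
            # and the shift applies the combined weight 2**(i + j).
            total += bin(a_plane & b_plane).count("1") << (i + j)
    return total


if __name__ == "__main__":
    a = [3, 1, 2, 0]
    b = [1, 2, 3, 3]
    print(bit_serial_dot(a, b, a_bits=2, b_bits=2))
    print(sum(x * y for x, y in zip(a, b)))
```

The cost grows with the product of the two bit widths, which is why the approach pays off mainly for highly quantized operands.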
LEMO: Learn to Equalize for MIMO-OFDM Systems with Low-Resolution ADCs
This paper develops a new deep neural network optimized equalization
framework for massive multiple-input multiple-output orthogonal frequency
division multiplexing (MIMO-OFDM) systems that employ low-resolution
analog-to-digital converters (ADCs) at the base station (BS). The use of
low-resolution ADCs could largely reduce hardware complexity and circuit power
consumption. However, it makes the channel state information almost blind to
the BS, hence causing difficulty in solving the equalization problem. In this
paper, we consider a supervised learning architecture, where the goal is to
learn a representative function that can predict the targets (constellation
points) from the inputs (outputs of the low-resolution ADCs) based on the
labeled training data (pilot signals). Specifically, our main contributions are
two-fold: 1) First, we design a new activation function, whose outputs are
close to the constellation points when the parameters are finally optimized, to
help us fully exploit the stochastic gradient descent method for the discrete
optimization problem. 2) Second, an unsupervised loss is designed and then
added to the optimization objective, aiming to enhance the representation
ability (so-called generalization). Lastly, various experimental results
confirm the superiority of the proposed equalizer over some existing ones,
particularly when the statistics of the channel state information are unclear
Deep Neural Network-Based Quantized Signal Reconstruction for DOA Estimation
For a massive multiple-input-multiple-output (MIMO) system using intelligent
reflecting surface (IRS) equipped with radio frequency (RF) chains, the
multi-channel RF chains are expensive compared to passive IRS, especially when
high-resolution and high-speed analog-to-digital converters (ADCs) are used
in each RF channel. In this letter, a direction of arrival (DOA) estimation
problem is investigated with low-cost ADC in IRS, and we propose a deep neural
network (DNN) as a recovery method for the low-resolution sampled signal.
Different from the existing denoising convolutional neural network (DnCNN) for
Gaussian noise, the proposed DNN with fully connected (FC) layers estimates the
quantization noise caused by the ADC. Then, the denoised signal is subjected to
the DOA estimation, and the recovery performance for the quantized signal is
evaluated by DOA estimation. Simulation results show that, under the same
training conditions, the proposed network achieves better reconstruction
performance than state-of-the-art methods. The performance of DOA estimation
using a 1-bit ADC is improved to exceed that using a 2-bit ADC
FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks
Convolutional Neural Networks have rapidly become the most successful machine
learning algorithm, enabling ubiquitous machine vision and intelligent
decisions on even embedded computing-systems. While the underlying arithmetic
is structurally simple, compute and memory requirements are challenging. One of
the promising opportunities is leveraging reduced-precision representations for
inputs, activations and model parameters. The resulting scalability in
performance, power efficiency and storage footprint provides interesting design
compromises in exchange for a small reduction in accuracy. FPGAs are ideal for
exploiting low-precision inference engines leveraging custom precisions to
achieve the required numerical accuracy for a given application. In this
article, we describe the second generation of the FINN framework, an end-to-end
tool which enables design space exploration and automates the creation of fully
customized inference engines on FPGAs. Given a neural network description, the
tool optimizes for given platforms, design targets and a specific precision. We
introduce formalizations of resource cost functions and performance
predictions, and elaborate on the optimization algorithms. Finally, we evaluate
a selection of reduced precision neural networks ranging from CIFAR-10
classifiers to YOLO-based object detection on a range of platforms including
PYNQ and AWS F1, demonstrating unprecedented measured throughput at
50 TOp/s on AWS F1 and 5 TOp/s on embedded devices.
Comment: to be published in ACM TRETS Special Edition on Deep Learning
BitSplit-Net: Multi-bit Deep Neural Network with Bitwise Activation Function
Significant computational cost and memory requirements for deep neural
networks (DNNs) make it difficult to utilize DNNs in resource-constrained
environments. Binary neural network (BNN), which uses binary weights and binary
activations, has been gaining interest for its hardware-friendly
characteristics and minimal resource requirement. However, BNN usually suffers
from accuracy degradation. In this paper, we introduce "BitSplit-Net", a neural
network which maintains the hardware-friendly characteristics of BNN while
improving accuracy by using multi-bit precision. In BitSplit-Net, each bit of
multi-bit activations propagates independently throughout the network before
being merged at the end of the network. Thus, each bit path of the BitSplit-Net
resembles a BNN, and hardware-friendly features of BNN, such as bitwise binary
activation function, are preserved in our scheme. We demonstrate that the
BitSplit version of LeNet-5, VGG-9, AlexNet, and ResNet-18 can be trained to
have similar classification accuracy at a lower computational cost compared to
conventional multi-bit networks with low bit precision (<= 4-bit). We further
evaluate BitSplit-Net on GPU with custom CUDA kernel, showing that BitSplit-Net
can achieve better hardware performance in comparison to conventional multi-bit
networks
High performance ultra-low-precision convolutions on mobile devices
Many applications of mobile deep learning, especially real-time computer
vision workloads, are constrained by computation power. This is particularly
true for workloads running on older consumer phones, where a typical device
might be powered by a single- or dual-core ARMv7 CPU. We provide an open-source
implementation and a comprehensive analysis of (to our knowledge) the
state-of-the-art ultra-low-precision (<4-bit precision) implementation of the core
primitives required for modern deep learning workloads on ARMv7 devices, and
demonstrate speedups of 4x-20x over our additional state-of-the-art float32 and
int8 baselines.
Comment: Presented at NIPS 2017, Machine Learning on the Phone and other
Consumer Devices workshop