MQGrad: Reinforcement Learning of Gradient Quantization in Parameter Server
One of the most significant bottlenecks in training large-scale machine
learning models on a parameter server (PS) is the communication overhead,
because model gradients must be frequently exchanged between the workers and
servers during the training iterations. Gradient quantization has been proposed
as an effective approach to reducing the communication volume. One key issue in
gradient quantization is setting the number of bits for quantizing the
gradients. A small number of bits can significantly reduce the communication
overhead but hurts gradient accuracy, and vice versa. An ideal quantization
method would dynamically balance the communication overhead and model
accuracy by adjusting the number of bits according to the knowledge
learned from the immediate past training iterations. Existing methods, however,
quantize the gradients either with a fixed number of bits or with predefined
heuristic rules. In this paper we propose a novel adaptive quantization method
within the framework of reinforcement learning. The method, referred to as
MQGrad, formalizes the selection of quantization bits as actions in a Markov
decision process (MDP), where the MDP state records the information collected
from the past optimization iterations (e.g., the sequence of the loss function
values). During the training iterations of a machine learning algorithm, MQGrad
continuously updates the MDP state according to the changes of the loss
function. Based on this information, the MDP learns to select the optimal actions
(number of bits) to quantize the gradients. Experimental results based on a
benchmark dataset showed that MQGrad can accelerate the learning of a
large-scale deep neural network while preserving its prediction accuracy.
Comment: 7 pages, 5 figures
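
For concreteness, the kind of uniform k-bit quantizer whose bit-width an
MQGrad-like agent would select each iteration can be sketched as follows (a
minimal NumPy illustration; the function names are ours, and the paper's exact
quantization scheme and RL machinery are not reproduced here):

```python
import numpy as np

def quantize_gradient(grad, num_bits):
    # Uniform k-bit quantization: map each entry to one of 2^k levels
    # spanning [min, max]; the worker pushes the integer codes plus two floats.
    levels = 2 ** num_bits - 1
    g_min, g_max = float(grad.min()), float(grad.max())
    scale = (g_max - g_min) / levels if g_max > g_min else 1.0
    codes = np.round((grad - g_min) / scale).astype(np.int32)
    return codes, g_min, scale

def dequantize_gradient(codes, g_min, scale):
    # Server-side reconstruction of the (lossy) gradient.
    return codes.astype(np.float32) * scale + g_min
```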
On Periodic Functions as Regularizers for Quantization of Neural Networks
Deep learning models have been successfully used in computer vision and many
other fields. We propose an unorthodox algorithm for performing quantization of
the model parameters. In contrast with popular quantization schemes based on
thresholds, we use a novel technique based on periodic functions, such as
continuous trigonometric sine or cosine as well as non-continuous hat
functions. We apply these functions component-wise and add the sum over the
model parameters as a regularizer to the model loss during training. The
frequency and amplitude hyper-parameters of these functions can be adjusted
during training. The regularization pushes the weights into discrete points
that can be encoded as integers. We show that using this technique the
resulting quantized models exhibit the same accuracy as the original ones on
CIFAR-10 and ImageNet.
Comment: 11 pages, 7 figures
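
A minimal PyTorch sketch of the idea, assuming a sine penalty with hypothetical
`step` and `amplitude` hyper-parameters (the paper also considers cosine and
hat functions and adjusts frequency and amplitude during training):

```python
import math
import torch

def periodic_regularizer(params, step=1.0, amplitude=0.1):
    # sin(pi * w / step) vanishes when w is an integer multiple of `step`,
    # so this penalty pushes weights toward integer-encodable points.
    reg = 0.0
    for p in params:
        reg = reg + torch.sin(math.pi * p / step).pow(2).sum()
    return amplitude * reg

# Typical use: loss = task_loss + periodic_regularizer(model.parameters())
```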
A Survey of Model Compression and Acceleration for Deep Neural Networks
Deep neural networks (DNNs) have recently achieved great success in many
visual recognition tasks. However, existing deep neural network models are
computationally expensive and memory intensive, hindering their deployment in
devices with low memory resources or in applications with strict latency
requirements. Therefore, a natural thought is to perform model compression and
acceleration in deep networks without significantly decreasing the model
performance. During the past five years, tremendous progress has been made in
this area. In this paper, we review the recent techniques for compacting and
accelerating DNN models. In general, these techniques are divided into four
categories: parameter pruning and quantization, low-rank factorization,
transferred/compact convolutional filters, and knowledge distillation. Methods
of parameter pruning and quantization are described first; the other techniques
are introduced afterwards. For each category, we also provide insightful
analysis about the performance, related applications, advantages, and
drawbacks. Then we go through some very recent successful methods, for example,
dynamic capacity networks and stochastic depth networks. After that, we survey
the evaluation metrics, the main datasets used for evaluating model
performance, and recent benchmark efforts. Finally, we conclude this paper and
discuss the remaining challenges and possible directions for future work.
Comment: Published in IEEE Signal Processing Magazine; updated version
including more recent work
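
As one concrete instance of the first surveyed category (parameter pruning),
here is a minimal magnitude-pruning sketch; the helper name is illustrative,
and a real pipeline would fine-tune the surviving weights afterwards:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    # Zero out the `sparsity` fraction of weights with the smallest
    # absolute values; the binary mask is kept for later fine-tuning.
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask
```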
BitNet: Bit-Regularized Deep Neural Networks
We present a novel optimization strategy for training neural networks which
we call "BitNet". The parameters of neural networks are usually unconstrained
and have a dynamic range dispersed over all real values. Our key idea is to
limit the expressive power of the network by dynamically controlling the range
and set of values that the parameters can take. We formulate this idea using a
novel end-to-end approach that circumvents the discrete parameter space by
optimizing a relaxed continuous and differentiable upper bound of the typical
classification loss function. The approach can be interpreted as a
regularization inspired by the Minimum Description Length (MDL) principle. For
each layer of the network, our approach optimizes real-valued translation and
scaling factors and arbitrary precision integer-valued parameters (weights). We
empirically compare BitNet to an equivalent unregularized model on the MNIST
and CIFAR-10 datasets. We show that BitNet converges faster to a superior
quality solution. Additionally, the resulting model has significant savings in
memory due to the use of integer-valued parameters.
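
A hedged sketch of the per-layer representation described above, with
translation `t`, scale `s`, and integer codes; BitNet learns these end-to-end
during training, whereas here they are simply fitted to a given weight tensor
for illustration:

```python
import numpy as np

def fit_bitnet_style_layer(weights, num_bits):
    # Represent weights as t + s * w_int: a real-valued translation t,
    # a real-valued scale s, and integer-valued parameters w_int.
    levels = 2 ** num_bits - 1
    t, w_max = float(weights.min()), float(weights.max())
    s = (w_max - t) / levels if w_max > t else 1.0
    w_int = np.clip(np.round((weights - t) / s), 0, levels).astype(np.int64)
    return t, s, w_int  # reconstruct with t + s * w_int
```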
Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools
Deep Learning (DL) has had immense success in the recent past, leading to
state-of-the-art results in various domains such as image recognition and
natural language processing. One of the reasons for this success is the
increasing size of DL models and the availability of vast amounts of training
data. To keep improving the performance of DL, increasing
the scalability of DL systems is necessary. In this survey, we perform a broad
and thorough investigation of the challenges, techniques, and tools for scalable DL
on distributed infrastructures. This incorporates infrastructures for DL,
methods for parallel DL training, multi-tenant resource scheduling and the
management of training and model data. Further, we analyze and compare 11
current open-source DL frameworks and tools and investigate which of the
techniques are commonly implemented in practice. Finally, we highlight future
research trends in DL systems that deserve further study.
Comment: accepted at ACM Computing Surveys, to appear
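
To make one of the surveyed parallel-training patterns concrete, here is a toy
sketch of a synchronous parameter-server update (the function name and the
plain-SGD rule are illustrative assumptions, not taken from the survey):

```python
import numpy as np

def parameter_server_step(params, worker_grads, lr=0.01):
    # The server averages the gradients pushed by all workers, applies
    # one SGD step, and the workers then pull the updated parameters.
    avg_grad = np.mean(worker_grads, axis=0)
    return params - lr * avg_grad
```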
A Survey on Methods and Theories of Quantized Neural Networks
Deep neural networks are the state-of-the-art methods for many real-world
tasks, such as computer vision, natural language processing and speech
recognition. For all their popularity, deep neural networks are also criticized
for consuming a lot of memory and draining the battery life of devices during
training and inference. This makes it hard to deploy these models on mobile or
embedded devices which have tight resource constraints. Quantization is
recognized as one of the most effective approaches to meeting the extreme
memory requirements of deep neural network models. Instead of adopting a
32-bit floating-point format to represent weights, quantized representations
store weights using more compact formats such as integers or even binary
numbers. Despite a possible degradation in predictive performance, quantization
provides a potential solution to greatly reduce the model size and the energy
consumption. In this survey, we give a thorough review of different aspects of
quantized neural networks. Current challenges and trends of quantized neural
networks are also discussed.
Comment: 17 pages, 8 figures
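
As a sketch of the most compact format the survey mentions, binary weights:
each weight is reduced to a sign bit plus one per-tensor scaling factor (here
the mean absolute value, as in XNOR-Net-style schemes; the surveyed methods
differ in detail):

```python
import numpy as np

def binarize_weights(weights):
    # 1 bit per weight (sign) plus a single float scaling factor.
    alpha = float(np.abs(weights).mean())
    signs = np.where(weights >= 0, 1, -1).astype(np.int8)
    return alpha, signs  # reconstruct approximately as alpha * signs
```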
L1-Norm Batch Normalization for Efficient Training of Deep Neural Networks
Batch Normalization (BN) has been proven to be quite effective at
accelerating and improving the training of deep neural networks (DNNs).
However, BN introduces additional computation, consumes more memory, and
generally slows down training by a large margin. Furthermore, the nonlinear
square and root operations in BN also impede low bit-width quantization
techniques, which draw much attention in the deep learning hardware
community. In this work, we propose an
L1-norm BN (L1BN) with only linear operations in both the forward and the
backward propagations during training. L1BN is shown to be approximately
equivalent to the original L2-norm BN (L2BN) up to a multiplicative scaling factor.
Experiments on various convolutional neural networks (CNNs) and generative
adversarial networks (GANs) reveal that L1BN maintains almost the same
accuracy and convergence rate as L2BN but with higher computational
efficiency. On an FPGA platform, the proposed signum and absolute-value
operations in L1BN achieve a 1.5× speedup and save 50% power consumption
compared with the original costly square and root operations, respectively.
This hardware-friendly normalization method not only surpasses L2BN in speed
but also simplifies the hardware design of ASIC accelerators with higher
energy efficiency. Last but not least, L1BN promises fully quantized training
of DNNs, which is crucial for future adaptive terminal devices.
Comment: 8 pages, 4 figures
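
A minimal sketch of the L1BN forward pass, assuming the sqrt(pi/2) factor that
matches the mean absolute deviation to the L2 standard deviation for Gaussian
activations (the paper's exact formulation and backward pass are not
reproduced here):

```python
import math
import numpy as np

def l1_batch_norm(x, gamma, beta, eps=1e-5):
    # Replace the square/root of standard BN with an absolute-value
    # statistic: normalize by the mean absolute deviation, rescaled by
    # sqrt(pi/2) so it approximates the L2 standard deviation.
    mu = x.mean(axis=0)
    mad = np.abs(x - mu).mean(axis=0)
    return gamma * (x - mu) / (math.sqrt(math.pi / 2.0) * mad + eps) + beta
```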
Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing
With the breakthroughs in deep learning, recent years have witnessed a boom
in artificial intelligence (AI) applications and services, spanning from
personal assistants to recommendation systems to video/audio surveillance.
More recently, with the proliferation of mobile computing and
Internet-of-Things (IoT), billions of mobile and IoT devices are connected to
the Internet, generating zillions of bytes of data at the network edge. Driven by
this trend, there is an urgent need to push the AI frontiers to the network
edge so as to fully unleash the potential of the edge big data. To meet this
demand, edge computing, an emerging paradigm that pushes computing tasks and
services from the network core to the network edge, has been widely recognized
as a promising solution. The resulting new interdisciplinary field, edge AI or
edge intelligence, is beginning to receive a tremendous amount of interest.
However, research on edge intelligence is still in its infancy, and a dedicated
venue for exchanging the recent advances of edge intelligence is highly desired
by both the computer system and artificial intelligence communities. To this
end, we conduct a comprehensive survey of the recent research efforts on edge
intelligence. Specifically, we first review the background and motivation for
artificial intelligence running at the network edge. We then provide an
overview of the overarching architectures, frameworks, and emerging key
technologies for deep learning model training and inference at the network
edge. Finally, we discuss future research opportunities on edge intelligence.
We believe that this survey will elicit escalating attention, stimulate
fruitful discussions, and inspire further research ideas on edge intelligence.
Comment: Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, and Junshan Zhang,
"Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge
Computing," Proceedings of the IEEE
Recent Advances in Convolutional Neural Network Acceleration
In recent years, convolutional neural networks (CNNs) have shown great
performance in various fields such as image classification, pattern
recognition, and multimedia compression. Two of their characteristic
properties, local connectivity and weight sharing, reduce the number of
parameters and increase processing speed during training and inference.
However, as data dimensionality grows and CNN architectures become more
complicated, end-to-end or combined CNN approaches become computationally
intensive, which limits their further deployment. It is therefore necessary
and urgent to make CNNs run faster. In this paper, we first summarize
acceleration methods that contribute to, but are not limited to, CNNs by
reviewing a broad variety of research papers. We propose a taxonomy in terms
of three levels, i.e., structure level,
algorithm level, and implementation level, for acceleration methods. We also
analyze the acceleration methods in terms of CNN architecture compression,
algorithm optimization, and hardware-based improvement. Finally, we discuss
different perspectives on these acceleration and optimization methods within
each level. The discussion shows that the methods at each level still leave
ample room for exploration. By incorporating such a wide range of
disciplines, we expect to provide a comprehensive reference for researchers who
are interested in CNN acceleration.
Comment: submitted to Neurocomputing
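
As one structure-level example from this taxonomy, a truncated-SVD sketch of
low-rank factorization: an m x n dense layer becomes two layers of shapes
m x r and r x n, cutting multiply-adds when r << min(m, n) (the helper name
is ours):

```python
import numpy as np

def low_rank_factorize(W, rank):
    # Truncated SVD: keep the top `rank` singular triplets so that
    # W is approximated by A @ B with far fewer parameters.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # shape (m, rank)
    B = Vt[:rank, :]            # shape (rank, n)
    return A, B
```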
Learning based Facial Image Compression with Semantic Fidelity Metric
Surveillance and security scenarios usually require highly efficient facial
image compression schemes for face recognition and identification. However,
both traditional general-purpose image codecs and specialized facial image
compression schemes only heuristically refine the codec according to a face
verification accuracy metric. We propose a Learning based Facial Image Compression (LFIC)
framework with a novel Regionally Adaptive Pooling (RAP) module whose
parameters can be automatically optimized according to gradient feedback from
an integrated hybrid semantic fidelity metric, including a successful
exploration of applying a Generative Adversarial Network (GAN) directly as a
metric in an image compression scheme. The experimental results verify the
framework's efficiency, demonstrating bitrate savings of 71.41%, 48.28%, and
52.67% over JPEG2000, WebP, and a neural-network-based codec, respectively,
under the same face verification accuracy. We also evaluate LFIC's superior
performance gain compared with the latest dedicated facial image codecs.
Visual experiments also offer interesting insight into how LFIC automatically
captures information in critical areas, guided by semantic distortion
metrics, for optimized compression, which is quite different from the
heuristic optimization of traditional image compression algorithms.
Comment: Accepted by Neurocomputing
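
A hypothetical PyTorch reading of an RAP-style module, offered only to
illustrate "pooling with trainable, regionally varying parameters"; the actual
LFIC module and its parameterization likely differ:

```python
import torch
import torch.nn.functional as F

class RegionallyAdaptivePooling(torch.nn.Module):
    # Hypothetical sketch: trainable per-region weights modulate a pooled
    # feature map and receive gradients from the semantic fidelity metric.
    def __init__(self, grid_h, grid_w):
        super().__init__()
        self.region_weights = torch.nn.Parameter(torch.ones(1, 1, grid_h, grid_w))

    def forward(self, x):
        pooled = F.adaptive_avg_pool2d(x, self.region_weights.shape[-2:])
        return pooled * torch.sigmoid(self.region_weights)
```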