Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing
With the breakthroughs in deep learning, recent years have witnessed a booming of artificial intelligence (AI) applications and services, spanning from personal assistants to recommendation systems to video/audio surveillance. More recently, with the proliferation of mobile computing and the Internet of Things (IoT), billions of mobile and IoT devices are connected to the Internet, generating zillions of bytes of data at the network edge. Driven by this trend, there is an urgent need to push the AI frontiers to the network edge so as to fully unleash the potential of the edge big data. To meet this demand, edge computing, an emerging paradigm that pushes computing tasks and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new interdiscipline, edge AI or edge intelligence, is beginning to receive a tremendous amount of interest. However, research on edge intelligence is still in its infancy, and a dedicated venue for exchanging its recent advances is highly desired by both the computer systems and artificial intelligence communities. To this end, we conduct a comprehensive survey of the recent research efforts on edge intelligence. Specifically, we first review the background and motivation for running artificial intelligence at the network edge. We then provide an overview of the overarching architectures, frameworks, and emerging key technologies for deep learning model training and inference at the network edge. Finally, we discuss future research opportunities on edge intelligence. We believe that this survey will attract escalating attention, stimulate fruitful discussions, and inspire further research ideas on edge intelligence.

Comment: Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, and Junshan Zhang, "Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing," Proceedings of the IEEE
Whetstone: A Method for Training Deep Artificial Neural Networks for Binary Communication
This paper presents a new technique for training networks for low-precision communication. Targeting minimal communication between nodes not only enables the use of emerging spiking neuromorphic platforms, but may also streamline processing on conventional hardware. Low-power and embedded neuromorphic processors potentially offer dramatic performance-per-watt improvements over traditional von Neumann processors; however, programming these brain-inspired platforms generally requires platform-specific expertise, which limits their applicability. To date, the majority of artificial neural networks have not operated using discrete spike-like communication.
We present a method for training deep spiking neural networks using an
iterative modification of the backpropagation optimization algorithm. This
method, which we call Whetstone, effectively and reliably configures a network
for a spiking hardware target with little, if any, loss in performance.
Whetstone networks use single time step binary communication and do not require
a rate code or other spike-based coding scheme, thus producing networks
comparable in timing and size to conventional ANNs, albeit with binarized
communication. We demonstrate Whetstone on a number of image classification
networks, describing how the sharpening process interacts with different
training optimizers and changes the distribution of activity within the
network. We further note that Whetstone is compatible with several
non-classification neural network applications, such as autoencoders and
semantic segmentation. Whetstone is widely extendable and currently implemented
using custom activation functions within the Keras wrapper to the popular
TensorFlow machine learning framework.
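To make the sharpening idea concrete, the following is a minimal sketch (not the authors' released code) of how a gradually sharpened activation plus a scheduling callback could be expressed as custom Keras components; the class names, the ramp shape, and the per-epoch schedule are illustrative assumptions.

    import tensorflow as tf

    class SharpenedActivation(tf.keras.layers.Layer):
        """Bounded ramp that is annealed ("sharpened") toward a hard 0/1 step."""

        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            # sharpness in [0, 1]: 0 = wide linear ramp, 1 = binary threshold
            self.sharpness = tf.Variable(0.0, trainable=False, dtype=tf.float32)

        def call(self, x):
            width = 1.0 - self.sharpness + 1e-6  # linear region shrinks as we sharpen
            ramp = tf.clip_by_value((x - 0.5) / width + 0.5, 0.0, 1.0)
            step = tf.cast(x > 0.5, tf.float32)  # single-time-step binary output
            return tf.where(self.sharpness >= 1.0, step, ramp)

    class SharpeningSchedule(tf.keras.callbacks.Callback):
        """Sharpens one not-yet-binary activation layer a little after each epoch."""

        def __init__(self, increment=0.05):
            super().__init__()
            self.increment = increment

        def on_epoch_end(self, epoch, logs=None):
            for layer in self.model.layers:
                if isinstance(layer, SharpenedActivation) and layer.sharpness.numpy() < 1.0:
                    layer.sharpness.assign_add(self.increment)
                    break  # sharpen earlier layers first, bottom-up

A model would interleave Dense/Conv layers with SharpenedActivation and pass SharpeningSchedule() to model.fit; once every activation reaches full sharpness, all inter-layer traffic is 0/1 and can map onto a single-time-step spiking substrate.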
Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-based Approach
Deep Neural Networks (DNNs) are applied in a wide range of use cases. There is an increased demand for deploying DNNs on devices that do not have abundant resources such as memory and computation units. Recently, network compression through a variety of techniques, such as pruning and quantization, has been proposed to reduce the resource requirements. A key parameter to which all existing compression techniques are sensitive is the compression ratio (e.g., pruning sparsity, quantization bitwidth) of each layer. Traditional solutions treat the compression ratio of each layer as a hyper-parameter and tune it using human heuristics. More recent work applies black-box hyper-parameter optimization, but this introduces new hyper-parameters and suffers from efficiency issues. In this paper, we propose a framework to jointly prune and quantize DNNs automatically according to a target model size, without using any hyper-parameters to manually set the compression ratio for each layer. In the experiments, we show that our framework can compress the weight data of ResNet-50 to be 836× smaller without accuracy loss on CIFAR-10, and compress AlexNet to be 205× smaller without accuracy loss on ImageNet classification.
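As a point of reference for what "compressing to a target model size" involves, here is a deliberately simple stand-in, not the paper's constrained-optimization method: global magnitude pruning followed by uniform quantization of the surviving weights, sized to a byte budget. The function name, the fixed bitwidth, and the example budget are illustrative assumptions, and the budget ignores index/mask overhead.

    import numpy as np

    def compress_to_budget(weights, target_bytes, bitwidth=4):
        """Prune globally by magnitude, then uniformly quantize the survivors."""
        flat = np.concatenate([w.ravel() for w in weights.values()])
        # How many weights fit in the budget at `bitwidth` bits per weight?
        keep = min(flat.size, int(target_bytes * 8 // bitwidth))
        threshold = np.sort(np.abs(flat))[-keep] if keep > 0 else np.inf

        compressed = {}
        for name, w in weights.items():
            mask = np.abs(w) >= threshold                       # pruning
            levels = 2 ** bitwidth - 1
            w_max = np.abs(w[mask]).max() if mask.any() else 1.0
            scale = w_max / (levels / 2)
            q = np.round(w / scale).clip(-levels // 2, levels // 2)  # signed grid
            compressed[name] = (q * scale) * mask
        return compressed

    # Example: squeeze two random layers into roughly 1 KB of 4-bit weights.
    layers = {"fc1": np.random.randn(64, 64), "fc2": np.random.randn(64, 10)}
    small = compress_to_budget(layers, target_bytes=1024, bitwidth=4)

The paper replaces the manual choices made here (one global threshold, one bitwidth) with per-layer ratios learned jointly under the size constraint.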
Measuring the Effects of Data Parallelism on Neural Network Training
Recent hardware developments have dramatically increased the scale of data
parallelism available for neural network training. Among the simplest ways to
harness next-generation hardware is to increase the batch size in standard
mini-batch neural network training algorithms. In this work, we aim to
experimentally characterize the effects of increasing the batch size on
training time, as measured by the number of steps necessary to reach a goal
out-of-sample error. We study how this relationship varies with the training
algorithm, model, and data set, and find extremely large variation between
workloads. Along the way, we show that disagreements in the literature on how
batch size affects model quality can largely be explained by differences in
metaparameter tuning and compute budgets at different batch sizes. We find no
evidence that larger batch sizes degrade out-of-sample performance. Finally, we
discuss the implications of our results on efforts to train neural networks
much faster in the future. Our experimental data is publicly available as a
database of 71,638,836 loss measurements taken over the course of training for
168,160 individual models across 35 workloads.
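A toy version of the measurement described above fits in a few lines: train the same model at several batch sizes and count optimization steps until a goal validation accuracy is reached. The model, dataset, and goal below are placeholders, and the learning rate is not retuned per batch size, a simplification the paper shows can distort exactly these comparisons.

    import tensorflow as tf

    def steps_to_goal(batch_size, goal_accuracy=0.97, max_epochs=20):
        (x_tr, y_tr), (x_va, y_va) = tf.keras.datasets.mnist.load_data()
        x_tr, x_va = x_tr / 255.0, x_va / 255.0
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(28, 28)),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(optimizer="sgd",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        steps_per_epoch = len(x_tr) // batch_size
        for epoch in range(max_epochs):
            model.fit(x_tr, y_tr, batch_size=batch_size, epochs=1, verbose=0)
            _, acc = model.evaluate(x_va, y_va, verbose=0)
            if acc >= goal_accuracy:
                return (epoch + 1) * steps_per_epoch   # steps to goal error
        return None  # goal not reached within the step budget

    for bs in (32, 128, 512, 2048):
        print(bs, steps_to_goal(bs))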
TBD: Benchmarking and Analyzing Deep Neural Network Training
The recent popularity of deep neural networks (DNNs) has generated a lot of
research interest in performing DNN-related computation efficiently. However,
the primary focus is usually very narrow and limited to (i) inference -- i.e., how to efficiently execute already trained models -- and (ii) image classification networks as the primary benchmark for evaluation.
Our primary goal in this work is to break this myopic view by (i) proposing a
new benchmark for DNN training, called TBD (TBD is short for Training Benchmark
for DNNs), that uses a representative set of DNN models that cover a wide range
of machine learning applications: image classification, machine translation,
speech recognition, object detection, adversarial networks, reinforcement
learning, and (ii) by performing an extensive performance analysis of training
these different applications on three major deep learning frameworks
(TensorFlow, MXNet, CNTK) across different hardware configurations (single-GPU,
multi-GPU, and multi-machine). TBD currently covers six major application
domains and eight different state-of-the-art models.
We present a new toolchain for performance analysis for these models that
combines the targeted usage of existing performance analysis tools, careful
selection of new and existing metrics and methodologies to analyze the results,
and utilization of domain specific characteristics of DNN training. We also
build a new set of tools for memory profiling in all three major frameworks;
much needed tools that can finally shed some light on precisely how much memory
is consumed by different data structures (weights, activations, gradients,
workspace) in DNN training. By using our tools and methodologies, we make
several important observations and recommendations on where the future research
and optimization of DNN training should be focused.
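The kind of accounting those memory-profiling tools perform can be roughly approximated analytically for a framework-built model. The sketch below tallies byte counts for weights, gradients, and per-layer activations of a Keras model at a given batch size; it ignores optimizer state, workspace buffers, and allocator behavior, which the TBD tools measure properly, so treat it as an estimate only.

    import numpy as np
    import tensorflow as tf

    def memory_breakdown(model, batch_size, bytes_per_value=4):
        """Rough training-memory split: weights, gradients, forward activations."""
        weight_bytes = sum(int(np.prod(w.shape)) for w in model.weights) * bytes_per_value
        grad_bytes = weight_bytes                # one gradient tensor per weight tensor
        activation_values = 0
        for layer in model.layers:
            out_shape = layer.output.shape       # leading batch dim is symbolic
            activation_values += batch_size * int(np.prod(out_shape[1:]))
        return {"weights_MB": weight_bytes / 2**20,
                "gradients_MB": grad_bytes / 2**20,
                "activations_MB": activation_values * bytes_per_value / 2**20}

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(224, 224, 3)),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(128, 3, padding="same", activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1000),
    ])
    print(memory_breakdown(model, batch_size=32))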
A Survey of Model Compression and Acceleration for Deep Neural Networks
Deep neural networks (DNNs) have recently achieved great success in many
visual recognition tasks. However, existing deep neural network models are
computationally expensive and memory intensive, hindering their deployment in
devices with low memory resources or in applications with strict latency
requirements. Therefore, a natural thought is to perform model compression and
acceleration in deep networks without significantly decreasing the model
performance. During the past five years, tremendous progress has been made in
this area. In this paper, we review the recent techniques for compacting and
accelerating DNN models. In general, these techniques are divided into four
categories: parameter pruning and quantization, low-rank factorization,
transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and quantization are described first, and the other techniques are introduced afterwards. For each category, we also provide insightful analysis of the performance, related applications, advantages, and drawbacks. We then go through some very recent successful methods, for example, dynamic capacity networks and stochastic depth networks. After that, we survey the evaluation metrics, the main datasets used for evaluating model performance, and recent benchmark efforts. Finally, we conclude the paper and discuss the remaining challenges and possible directions for future work.

Comment: Published in IEEE Signal Processing Magazine; updated version including more recent work.
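To ground one of the four surveyed categories, knowledge distillation in its most common form trains a small student to match the temperature-softened outputs of a large teacher in addition to the hard labels. The sketch below is a generic version of that loss; the temperature and weighting are typical choices, not values prescribed by the survey.

    import tensorflow as tf

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=4.0, alpha=0.9):
        """Weighted sum of a softened-teacher matching term and the usual CE loss."""
        soft_teacher = tf.nn.softmax(teacher_logits / temperature)
        log_soft_student = tf.nn.log_softmax(student_logits / temperature)
        # Cross-entropy against the softened teacher (equals KL up to a constant);
        # the T^2 factor keeps its gradient scale comparable to the hard-label term.
        kd_term = -tf.reduce_sum(soft_teacher * log_soft_student, axis=-1) * temperature ** 2
        ce_term = tf.keras.losses.sparse_categorical_crossentropy(
            labels, student_logits, from_logits=True)
        return tf.reduce_mean(alpha * kd_term + (1.0 - alpha) * ce_term)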
DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks
The performance of deep neural networks is well-known to be sensitive to the
setting of their hyperparameters. Recent advances in reverse-mode automatic
differentiation allow for optimizing hyperparameters with gradients. The
standard way of computing these gradients involves a forward and backward pass
of computations. However, the backward pass usually requires a prohibitive amount of memory to store all the intermediate variables needed to exactly reverse the forward training procedure. In this work we propose a simple but effective method, DrMAD, to distill the knowledge of the forward pass into a shortcut path through which we approximately reverse the training trajectory. Experiments on several image benchmark datasets show that DrMAD is at least 45 times faster and consumes 100 times less memory than state-of-the-art methods for optimizing hyperparameters, with minimal compromise to its effectiveness. To the best of our knowledge, DrMAD is the first research attempt to make it practical to automatically tune thousands of hyperparameters of deep neural networks. The code can be downloaded from https://github.com/bigaidream-projects/drmad

Comment: International Joint Conference on Artificial Intelligence, IJCAI, 2016
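The memory saving comes from not storing the training trajectory at all: only the initial and final weights are kept, and the intermediate weights needed by the reverse pass are reconstructed by linear interpolation between them. The snippet below shows just that approximation in isolation (the full hypergradient computation around it is omitted); the function name is illustrative.

    import numpy as np

    def approx_weights_at_step(theta_0, theta_T, t, T):
        """DrMAD shortcut: theta_t ~= (1 - t/T) * theta_0 + (t/T) * theta_T."""
        beta = t / T
        return (1.0 - beta) * theta_0 + beta * theta_T

    # Exactly reversing T steps of training would require T stored checkpoints;
    # the shortcut path needs only the two endpoints of the trajectory.
    theta_0 = np.random.randn(1_000_000)                   # weights before training
    theta_T = theta_0 + 0.1 * np.random.randn(1_000_000)   # weights after training
    theta_mid = approx_weights_at_step(theta_0, theta_T, t=500, T=1000)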
On-Device Machine Learning: An Algorithms and Learning Theory Perspective
The predominant paradigm for using machine learning models on a device is to
train a model in the cloud and perform inference using the trained model on the
device. However, with the increasing number of smart devices and improved hardware,
there is interest in performing model training on the device. Given this surge
in interest, a comprehensive survey of the field from a device-agnostic
perspective sets the stage for both understanding the state-of-the-art and for
identifying open challenges and future avenues of research. However, on-device
learning is an expansive field with connections to a large number of related
topics in AI and machine learning (including online learning, model adaptation,
one/few-shot learning, etc.). Hence, covering such a large number of topics in
a single survey is impractical. This survey finds a middle ground by
reformulating the problem of on-device learning as resource constrained
learning where the resources are compute and memory. This reformulation allows
tools, techniques, and algorithms from a wide variety of research areas to be
compared equitably. In addition to summarizing the state-of-the-art, the survey
also identifies a number of challenges and next steps for both the algorithmic
and theoretical aspects of on-device learning.

Comment: Edge Learning, TinyML, Resource Constrained Machine Learning, Deep learning on device, Statistical Learning Theory, 45-page survey
PIMBALL: Binary Neural Networks in Spintronic Memory
Neural networks span a wide range of applications of industrial and
commercial significance. Binary neural networks (BNN) are particularly
effective in trading accuracy for performance, energy efficiency or
hardware/software complexity. Here, we introduce a spintronic, re-configurable
in-memory BNN accelerator, PIMBALL: Processing In Memory BNN AcceL(L)erator,
which allows for massively parallel and energy efficient computation. PIMBALL
is capable of being used as a standard spintronic memory (STT-MRAM) array and a
computational substrate simultaneously. We evaluate PIMBALL using multiple
image classifiers and a genomics kernel. Our simulation results show that
PIMBALL is more energy efficient than alternative CPU, GPU, and FPGA based
implementations while delivering higher throughput.
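The appeal of BNNs for an in-memory substrate like PIMBALL is that a binarized dot product collapses to an XNOR followed by a population count, operations a memory array can evaluate in place. The pure-software illustration below shows the arithmetic identity being exploited; it is not tied to the PIMBALL hardware.

    import numpy as np

    def binary_dot(a, w):
        """Dot product of {-1, +1} vectors encoded as booleans (True -> +1)."""
        xnor = ~np.logical_xor(a, w)          # True where the two signs agree
        popcount = np.count_nonzero(xnor)     # what the memory array would count
        return 2 * popcount - a.size          # agreements minus disagreements

    a = np.random.rand(1024) > 0.5
    w = np.random.rand(1024) > 0.5
    assert binary_dot(a, w) == np.dot(np.where(a, 1, -1), np.where(w, 1, -1))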
Edge Intelligence: The Confluence of Edge Computing and Artificial Intelligence
Along with the rapid developments in communication technologies and the surge
in the use of mobile devices, a brand-new computation paradigm, Edge Computing,
is surging in popularity. Meanwhile, Artificial Intelligence (AI) applications
are thriving with the breakthroughs in deep learning and the many improvements
in hardware architectures. Billions of data bytes, generated at the network
edge, put massive demands on data processing and structural optimization. Thus,
there exists a strong demand to integrate Edge Computing and AI, which gives
birth to Edge Intelligence. In this paper, we divide Edge Intelligence into AI
for edge (Intelligence-enabled Edge Computing) and AI on edge (Artificial
Intelligence on Edge). The former focuses on providing more optimal solutions
to key problems in Edge Computing with the help of popular and effective AI
technologies while the latter studies how to carry out the entire process of
building AI models, i.e., model training and inference, on the edge. This paper
provides insights into this new inter-disciplinary field from a broader
perspective. It discusses the core concepts and the research road-map, which
should provide the necessary background for potential future research
initiatives in Edge Intelligence.

Comment: 13 pages, 3 figures