A Survey of Model Compression and Acceleration for Deep Neural Networks
Deep neural networks (DNNs) have recently achieved great success in many
visual recognition tasks. However, existing deep neural network models are
computationally expensive and memory intensive, hindering their deployment in
devices with low memory resources or in applications with strict latency
requirements. Therefore, a natural thought is to perform model compression and
acceleration in deep networks without significantly decreasing the model
performance. During the past five years, tremendous progress has been made in
this area. In this paper, we review the recent techniques for compacting and
accelerating DNN models. In general, these techniques are divided into four
categories: parameter pruning and quantization, low-rank factorization,
transferred/compact convolutional filters, and knowledge distillation. Methods
of parameter pruning and quantization are described first, and the other
techniques are introduced afterwards. For each category, we also provide insightful
analysis about the performance, related applications, advantages, and
drawbacks. Then we go through some very recent successful methods, for example,
dynamic capacity networks and stochastic depth networks. After that, we survey
the evaluation metrics, the main datasets used for evaluating model
performance, and recent benchmarking efforts. Finally, we conclude this paper and
discuss the remaining challenges and possible directions for future work.
Comment: Published in IEEE Signal Processing Magazine, updated version
including more recent work
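As a concrete illustration of the first category the survey names, parameter pruning and quantization, the sketch below applies magnitude-based weight pruning followed by symmetric uniform quantization to a weight matrix. It is a generic, minimal example rather than any specific method reviewed in the survey; the sparsity ratio and bit width are arbitrary illustrative choices.

    # Generic sketch of the "parameter pruning and quantization" category
    # (illustrative only, not a specific method from the survey).
    import numpy as np

    def prune_by_magnitude(w: np.ndarray, sparsity: float = 0.7) -> np.ndarray:
        """Zero out the given fraction of smallest-magnitude weights."""
        threshold = np.quantile(np.abs(w), sparsity)
        return np.where(np.abs(w) < threshold, 0.0, w)

    def uniform_quantize(w: np.ndarray, bits: int = 8) -> np.ndarray:
        """Symmetric uniform quantization onto a grid of 2**(bits-1) - 1 positive levels."""
        scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
        return np.round(w / scale) * scale

    w = np.random.randn(256, 128).astype(np.float32)
    w_compressed = uniform_quantize(prune_by_magnitude(w))
    print("sparsity:", float((w_compressed == 0).mean()),
          "unique values:", len(np.unique(w_compressed)))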
Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
Deep neural networks have evolved remarkably over the past few years and they
are currently the fundamental tools of many intelligent systems. At the same
time, the computational complexity and resource consumption of these networks
also continue to increase. This will pose a significant challenge to the
deployment of such networks, especially in real-time applications or on
resource-limited devices. Thus, network acceleration has become a hot topic
within the deep learning community. As for the hardware implementation of deep
neural networks, a number of FPGA/ASIC-based accelerators have been proposed
in recent years. In this paper, we provide a comprehensive survey of recent
advances in network acceleration, compression and accelerator design from both
algorithm and hardware points of view. Specifically, we provide a thorough
analysis of each of the following topics: network pruning, low-rank
approximation, network quantization, teacher-student networks, compact network
design and hardware accelerators. Finally, we will introduce and discuss a few
possible future directions.
Comment: 14 pages, 3 figures
Edge Intelligence: The Confluence of Edge Computing and Artificial Intelligence
Along with the rapid developments in communication technologies and the surge
in the use of mobile devices, a brand-new computation paradigm, Edge Computing,
is rapidly gaining popularity. Meanwhile, Artificial Intelligence (AI) applications
are thriving with the breakthroughs in deep learning and the many improvements
in hardware architectures. Billions of data bytes, generated at the network
edge, put massive demands on data processing and structural optimization. Thus,
there exists a strong demand to integrate Edge Computing and AI, which gives
birth to Edge Intelligence. In this paper, we divide Edge Intelligence into AI
for edge (Intelligence-enabled Edge Computing) and AI on edge (Artificial
Intelligence on Edge). The former focuses on providing better solutions
to key problems in Edge Computing with the help of popular and effective AI
technologies, while the latter studies how to carry out the entire process of
building AI models, i.e., model training and inference, on the edge. This paper
provides insights into this new inter-disciplinary field from a broader
perspective. It discusses the core concepts and the research road-map, which
should provide the necessary background for potential future research
initiatives in Edge Intelligence.
Comment: 13 pages, 3 figures
CNN inference acceleration using dictionary of centroids
It is well known that multiplication operations in the convolutional layers of
common CNNs consume a lot of time during the inference stage. In this article we
present a flexible method to decrease both the computational complexity of
convolutional layers in inference and the amount of space required to store them. The
method is based on centroid filter quantization and outperforms approaches
based on tensor decomposition by a large margin. We performed a comparative
analysis of the proposed method and a series of CP tensor decompositions on the
ImageNet benchmark and found that our method provides an almost 2.9 times better
computational gain. Despite its simplicity, our method cannot be applied
directly at the inference stage in modern frameworks, but it could be useful in cases
where the calculation flow can be changed, e.g., for CNN-chip designers.
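The abstract describes centroid filter quantization only at a high level; the sketch below illustrates the general idea of clustering a convolutional layer's weights into a small dictionary of centroids and storing per-weight indices. The function names, the use of k-means, and the 256-entry dictionary size are assumptions for illustration, not the authors' exact procedure.

    # Illustrative sketch of centroid-based weight quantization for a conv layer
    # (assumed details; not the paper's exact algorithm or its inference-time speed-up).
    import numpy as np
    from sklearn.cluster import KMeans

    def quantize_filters_to_centroids(weights: np.ndarray, n_centroids: int = 256):
        """weights: conv kernel of shape (out_ch, in_ch, kH, kW)."""
        flat = weights.reshape(-1, 1)                     # each scalar weight is a sample
        km = KMeans(n_clusters=n_centroids, n_init=4, random_state=0).fit(flat)
        dictionary = km.cluster_centers_.ravel()          # the centroid dictionary
        indices = km.labels_.astype(np.uint8).reshape(weights.shape)  # per-weight index
        return dictionary, indices

    def dequantize(dictionary: np.ndarray, indices: np.ndarray) -> np.ndarray:
        """Reconstruct an approximate weight tensor from dictionary entries."""
        return dictionary[indices]

    # A 64x32x3x3 kernel stored as one-byte indices plus 256 float centroids.
    w = np.random.randn(64, 32, 3, 3).astype(np.float32)
    d, idx = quantize_filters_to_centroids(w)
    print("reconstruction MSE:", float(np.mean((w - dequantize(d, idx)) ** 2)))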
DeepCABAC: Context-adaptive binary arithmetic coding for deep neural network compression
We present DeepCABAC, a novel context-adaptive binary arithmetic coder for
compressing deep neural networks. It quantizes each weight parameter by
minimizing a weighted rate-distortion function, which implicitly takes the
impact of quantization on the accuracy of the network into account.
Subsequently, it compresses the quantized values into a bitstream
representation with minimal redundancies. We show that DeepCABAC is able to
reach very high compression ratios across a wide set of different network
architectures and datasets. For instance, we are able to compress the
VGG16 ImageNet model by a factor of 63.6 with no loss of accuracy, thus being
able to represent the entire network with merely 8.7 MB.
Comment: ICML 2019, Joint Workshop on On-Device Machine Learning and Compact
Deep Neural Network Representations (ODML-CDNNR)
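A rough, hedged illustration of the weight-quantization step described above is sketched below: each weight is assigned to the quantization point that minimizes a weighted rate-distortion cost. The uniform grid of candidate points, the empirical bit-cost estimate, and the lambda weighting are assumptions for illustration; DeepCABAC's actual context-adaptive binary arithmetic coder is not reproduced here.

    # Hedged sketch of rate-distortion-aware scalar quantization (illustrative only).
    import numpy as np

    def rd_quantize(weights, step=0.05, lam=1e-3):
        """Assign each weight to the grid point minimizing distortion + lam * rate."""
        w = np.asarray(weights, dtype=np.float64).ravel()
        # Candidate quantization points: a uniform grid covering the weight range.
        grid = np.arange(np.floor(w.min() / step), np.ceil(w.max() / step) + 1) * step
        # Crude rate estimate: self-information of each grid point from a first
        # pass of nearest-neighbor assignments (stand-in for the coder's model).
        nearest = np.abs(w[:, None] - grid[None, :]).argmin(axis=1)
        counts = np.bincount(nearest, minlength=grid.size) + 1      # Laplace smoothing
        rate = -np.log2(counts / counts.sum())                      # bits per symbol
        # Rate-distortion decision: squared error plus lam times the rate.
        cost = (w[:, None] - grid[None, :]) ** 2 + lam * rate[None, :]
        choice = cost.argmin(axis=1)
        return grid[choice], choice                                 # quantized values, symbols

    w = np.random.randn(10000) * 0.1
    q, symbols = rd_quantize(w)
    print("MSE:", float(np.mean((w - q) ** 2)), "symbols used:", len(np.unique(symbols)))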
Structured Pruning for Efficient ConvNets via Incremental Regularization
Parameter pruning is a promising approach for CNN compression and
acceleration by eliminating redundant model parameters with tolerable
performance degradation. Despite its effectiveness, existing regularization-based
parameter pruning methods usually drive weights towards zero with large and
constant regularization factors, which neglects the fragility of the
expressiveness of CNNs, and thus calls for a more gentle regularization scheme
so that the networks can adapt during pruning. To achieve this, we propose a
novel regularization-based pruning method, named IncReg, to
incrementally assign different regularization factors to different weights
based on their relative importance. Empirical analysis on the CIFAR-10 dataset
verifies the merits of IncReg. Further extensive experiments with popular CNNs
on the CIFAR-10 and ImageNet datasets show that IncReg achieves comparable or even
better results compared with the state of the art. Our source code and trained
models are available here: https://github.com/mingsun-tse/caffe_increg.
Comment: IJCNN 201
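The incremental-regularization idea can be pictured with a short sketch: filters are ranked by an importance proxy (here, the L1 norm), and the L2 regularization factor of the currently least important filters is raised a little at each step, so weights are driven toward zero gradually and the network can adapt during pruning. The importance measure, increment schedule, and ratios below are illustrative assumptions rather than IncReg's exact formulation.

    # Illustrative sketch of incremental, importance-aware regularization for
    # filter pruning (assumed details; not the paper's exact method).
    import numpy as np

    def update_reg_factors(weights, reg, increment=1e-4, prune_ratio=0.5):
        """weights: (out_ch, in_ch, kH, kW); reg: per-filter regularization factors."""
        importance = np.abs(weights).sum(axis=(1, 2, 3))   # L1 norm per output filter
        order = np.argsort(importance)                     # least important first
        n_target = int(prune_ratio * len(order))
        reg = reg.copy()
        # Raise the penalty on the currently least important filters a little at a
        # time instead of jumping to a large constant factor.
        reg[order[:n_target]] += increment
        return reg

    def reg_gradient(weights, reg):
        """Gradient of the per-filter penalty 0.5 * reg_i * ||W_i||^2."""
        return reg[:, None, None, None] * weights

    # Toy loop fragment: the penalty gradient is added to the (omitted) data-loss gradient.
    w = np.random.randn(64, 32, 3, 3).astype(np.float32) * 0.1
    reg = np.zeros(64, dtype=np.float32)
    for step in range(1000):
        reg = update_reg_factors(w, reg)
        w -= 0.01 * reg_gradient(w, reg)
    norms = np.abs(w).sum(axis=(1, 2, 3))
    print("mean filter L1 norm, kept vs. penalized:",
          float(norms[reg == 0].mean()), float(norms[reg > 0].mean()))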
A Survey of FPGA-Based Neural Network Accelerator
Recent research on neural networks has shown significant advantages in
machine learning over traditional algorithms based on handcrafted features and
models. Neural networks are now widely adopted in areas such as image, speech, and
video recognition. But the high computation and storage complexity of neural
network inference poses great difficulty for its application. CPU platforms can
hardly offer enough computation capacity. GPU platforms are the first choice
for neural network processing because of their high computation capacity and
easy-to-use development frameworks.
On the other hand, FPGA-based neural network inference accelerators are
becoming a research topic. With specifically designed hardware, FPGAs are the
next possible solution to surpass GPUs in speed and energy efficiency. Various
FPGA-based accelerator designs have been proposed with software and hardware
optimization techniques to achieve high speed and energy efficiency. In this
paper, we give an overview of previous work on neural network inference
accelerators based on FPGAs and summarize the main techniques used. An
investigation from software to hardware, from the circuit level to the system level, is
carried out to complete the analysis of FPGA-based neural network inference
accelerator design and to serve as a guide for future work.
SCSP: Spectral Clustering Filter Pruning with Soft Self-adaption Manners
Deep Convolutional Neural Networks (CNNs) have achieved significant success in
the computer vision field. However, the high computational cost of deep, complex
models prevents their deployment on edge devices with limited memory and
computational resources. In this paper, we propose a novel filter pruning method for
convolutional neural network compression, namely spectral clustering filter
pruning with soft self-adaption manners (SCSP). We first apply spectral
clustering to the filters layer by layer to explore their intrinsic connections and
keep only the efficient groups. Through these self-adaption manners, the pruning
operations can be done in a few epochs, letting the network gradually choose
meaningful groups. With this strategy, we not only achieve model
compression while keeping considerable performance, but also find a novel angle
from which to interpret the model compression process.
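A minimal sketch of the layer-wise spectral-clustering step described above: flatten each filter of a convolutional layer into a vector, cluster the filters spectrally, and keep one representative per group. The RBF affinity, the L1-norm choice of representative, and the use of scikit-learn's SpectralClustering are illustrative assumptions, not the paper's exact procedure or its soft self-adaption schedule.

    # Illustrative sketch of spectral clustering of conv filters for pruning
    # (assumed details; not SCSP's exact formulation).
    import numpy as np
    from sklearn.cluster import SpectralClustering

    def cluster_and_select_filters(weights: np.ndarray, n_groups: int = 16):
        """weights: (out_ch, in_ch, kH, kW). Returns indices of filters to keep."""
        flat = weights.reshape(weights.shape[0], -1)      # one row per output filter
        labels = SpectralClustering(n_clusters=n_groups, affinity="rbf",
                                    random_state=0).fit_predict(flat)
        keep = []
        for g in range(n_groups):
            members = np.where(labels == g)[0]
            if members.size:
                # Keep the member with the largest L1 norm as the group representative;
                # a soft schedule could instead shrink the remaining members gradually.
                keep.append(members[np.argmax(np.abs(flat[members]).sum(axis=1))])
        return np.array(sorted(keep))

    w = np.random.randn(64, 32, 3, 3).astype(np.float32)
    kept = cluster_and_select_filters(w)
    print("keeping", len(kept), "of", w.shape[0], "filters")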
Recent Advances in Convolutional Neural Network Acceleration
In recent years, convolutional neural networks (CNNs) have shown great
performance in various fields such as image classification, pattern
recognition, and multimedia compression. Two of their key properties, local
connectivity and weight sharing, can reduce the number of parameters and
increase processing speed during training and inference. However, as the
dimensionality of data becomes higher and CNN architectures become more
complicated, the end-to-end or combined use of CNNs is
computationally intensive, which becomes a limitation to their further
deployment. Therefore, it is necessary and urgent to implement CNNs in a
faster way. In this paper, we first summarize the acceleration methods that
contribute to, but are not limited to, CNNs by reviewing a broad variety of research
papers. We propose a taxonomy of acceleration methods in terms of three levels, i.e., the
structure level, the algorithm level, and the implementation level. We also
analyze the acceleration methods in terms of CNN architecture compression,
algorithm optimization, and hardware-based improvement. Finally, we
discuss different perspectives on these acceleration and optimization
methods within each level. The discussion shows that the methods in each level
still leave large room for exploration. By incorporating such a wide range of
disciplines, we expect to provide a comprehensive reference for researchers who
are interested in CNN acceleration.
Comment: submitted to Neurocomputing
Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing
With the breakthroughs in deep learning, recent years have witnessed a
boom in artificial intelligence (AI) applications and services, spanning
from personal assistants to recommendation systems to video/audio surveillance.
More recently, with the proliferation of mobile computing and
Internet-of-Things (IoT), billions of mobile and IoT devices are connected to
the Internet, generating zillions of bytes of data at the network edge. Driven by
this trend, there is an urgent need to push the AI frontiers to the network
edge so as to fully unleash the potential of the edge big data. To meet this
demand, edge computing, an emerging paradigm that pushes computing tasks and
services from the network core to the network edge, has been widely recognized
as a promising solution. The resulting new interdisciplinary field, edge AI or edge
intelligence, is beginning to receive a tremendous amount of interest. However,
research on edge intelligence is still in its infancy, and a dedicated
venue for exchanging the recent advances of edge intelligence is highly desired
by both the computer system and artificial intelligence communities. To this
end, we conduct a comprehensive survey of the recent research efforts on edge
intelligence. Specifically, we first review the background and motivation for
artificial intelligence running at the network edge. We then provide an
overview of the overarching architectures, frameworks, and emerging key
technologies for deep learning model training and inference at the network
edge. Finally, we discuss future research opportunities on edge intelligence.
We believe that this survey will elicit escalating attention, stimulate
fruitful discussions, and inspire further research ideas on edge intelligence.
Comment: Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, and Junshan Zhang,
"Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge
Computing," Proceedings of the IEEE