Recent Advances in Convolutional Neural Network Acceleration
In recent years, convolutional neural networks (CNNs) have shown great
performance in various fields such as image classification, pattern
recognition, and multimedia compression. Two of their defining properties,
local connectivity and weight sharing, reduce the number of parameters and
increase processing speed during training and inference. However, as data
dimensionality grows and CNN architectures become more complicated,
end-to-end or combined CNN pipelines become computationally intensive,
which limits the further deployment of CNNs. It is therefore both necessary
and urgent to make CNNs run faster. In this paper, we first summarize
acceleration methods that apply to, but are not limited to, CNNs by
reviewing a broad variety of research papers. We propose a taxonomy of
acceleration methods with three levels, i.e., the structure level, the
algorithm level, and the implementation level. We also analyze the
acceleration methods in terms of CNN architecture compression, algorithm
optimization, and hardware-based improvement. Finally, we discuss these
acceleration and optimization methods from different perspectives within
each level. The discussion shows that the methods at each level still leave
large room for exploration. By incorporating such a wide range of
disciplines, we aim to provide a comprehensive reference for researchers
interested in CNN acceleration.
Comment: submitted to Neurocomputing
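As a back-of-the-envelope illustration of the local-connectivity and
weight-sharing point above, the following sketch (with layer sizes chosen
here for illustration, not taken from the paper) compares the parameter
count of a 3x3 convolution with that of a fully connected layer mapping
between tensors of the same shape:

    # Illustrative sizes (assumptions for this sketch, not from the paper):
    in_ch, out_ch, k, h, w = 64, 128, 3, 32, 32

    # Convolution: each output connects only to a local k x k window, and
    # the same kernels are shared across all spatial positions.
    conv_params = out_ch * in_ch * k * k + out_ch          # weights + biases

    # Fully connected layer between the same input and output tensors:
    dense_params = (in_ch * h * w) * (out_ch * h * w) + out_ch * h * w

    print(f"conv:  {conv_params:,}")    # 73,856
    print(f"dense: {dense_params:,}")   # 8,590,065,664

The convolution's parameter count is independent of spatial resolution,
which is precisely why the two properties cut parameters so dramatically.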
Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
Deep neural networks have evolved remarkably over the past few years and they
are currently the fundamental tools of many intelligent systems. At the same
time, the computational complexity and resource consumption of these networks
also continue to increase. This poses a significant challenge to the
deployment of such networks, especially in real-time applications or on
resource-limited devices. Thus, network acceleration has become a hot topic
within the deep learning community. On the hardware side, a number of
FPGA- and ASIC-based accelerators for deep neural networks have been
proposed in recent years. In this paper, we provide a comprehensive survey
of recent advances in network acceleration, compression, and accelerator
design from both the algorithm and the hardware points of view.
Specifically, we provide a thorough analysis of each of the following
topics: network pruning, low-rank approximation, network quantization,
teacher-student networks, compact network design, and hardware
accelerators. Finally, we introduce and discuss a few possible future
directions.
Comment: 14 pages, 3 figures
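Of the surveyed techniques, low-rank approximation is the most direct to
sketch in a few lines: replace a weight matrix with the product of two thin
factors obtained from a truncated SVD. The matrix shape and target rank
below are illustrative assumptions, not values from the survey:

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((256, 512)).astype(np.float32)  # a layer's weights

    r = 32                                   # assumed target rank
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]                     # 256 x 32 factor
    B = Vt[:r, :]                            # 32 x 512 factor

    orig_params = W.size                     # 131,072
    lowrank_params = A.size + B.size         # 24,576 (~5.3x fewer)
    err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    print(orig_params, lowrank_params, f"rel. error {err:.3f}")

In practice the rank is chosen per layer to balance the compression ratio
against the approximation error, and the factored model is fine-tuned.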
A Survey of Model Compression and Acceleration for Deep Neural Networks
Deep neural networks (DNNs) have recently achieved great success in many
visual recognition tasks. However, existing deep neural network models are
computationally expensive and memory intensive, hindering their deployment in
devices with low memory resources or in applications with strict latency
requirements. Therefore, a natural thought is to perform model compression and
acceleration in deep networks without significantly decreasing the model
performance. During the past five years, tremendous progress has been made in
this area. In this paper, we review the recent techniques for compacting and
accelerating DNN models. In general, these techniques are divided into four
categories: parameter pruning and quantization, low-rank factorization,
transferred/compact convolutional filters, and knowledge distillation.
Methods of parameter pruning and quantization are described first; the
other techniques are introduced afterwards. For each category, we provide
insightful analysis of the performance, related applications, advantages,
and drawbacks. We then go through some very recent successful methods, for
example, dynamic capacity networks and stochastic depth networks. After
that, we survey the evaluation metrics, the main datasets used for
evaluating model performance, and recent benchmark efforts. Finally, we
conclude the paper and discuss the remaining challenges and possible
directions for future work.
Comment: Published in IEEE Signal Processing Magazine; updated version
including more recent work
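For the knowledge distillation category, a minimal PyTorch-style sketch of
the usual teacher-student objective looks as follows; the temperature T and
mixing weight alpha are illustrative choices, not values prescribed by the
survey:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        # KL term between softened distributions (scaled by T^2, as is standard)
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        hard = F.cross_entropy(student_logits, labels)  # usual supervised term
        return alpha * soft + (1.0 - alpha) * hard

    # Toy usage with random logits and labels:
    s = torch.randn(8, 10, requires_grad=True)
    t = torch.randn(8, 10)
    y = torch.randint(0, 10, (8,))
    distillation_loss(s, t, y).backward()

The softened teacher outputs carry inter-class similarity information that
hard labels alone do not, which is what lets a small student approach the
teacher's accuracy.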
Structured Probabilistic Pruning for Convolutional Neural Network Acceleration
In this paper, we propose a novel progressive parameter pruning method for
Convolutional Neural Network acceleration, named Structured Probabilistic
Pruning (SPP), which effectively prunes weights of convolutional layers in a
probabilistic manner. Unlike existing deterministic pruning approaches, where
unimportant weights are permanently eliminated, SPP introduces a pruning
probability for each weight, and pruning is guided by sampling from the pruning
probabilities. A mechanism is designed to increase and decrease pruning
probabilities based on importance criteria in the training process. Experiments
show that, with 4x speedup, SPP can accelerate AlexNet with only 0.3% loss of
top-5 accuracy and VGG-16 with 0.8% loss of top-5 accuracy in ImageNet
classification. Moreover, SPP can be directly applied to accelerate
multi-branch CNNs, such as ResNet, without specific adaptations. Our
2x-accelerated ResNet-50 suffers only a 0.8% loss of top-5 accuracy on
ImageNet. We further show the effectiveness of SPP on transfer learning
tasks.
Comment: CNN model acceleration, 13 pages, 6 figures, accepted by
Proceedings of the British Machine Vision Conference (BMVC), 2018 oral
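A minimal sketch of the sampling idea is given below; the rank-based
probability update is a simplification standing in for the paper's
importance criterion, so treat the details as assumptions:

    import torch

    def sample_mask(prune_prob):
        # Keep each weight with probability (1 - its pruning probability).
        return (torch.rand_like(prune_prob) >= prune_prob).float()

    def update_probs(weight, prune_prob, step=0.05, target_sparsity=0.75):
        # Nudge probabilities up for small-magnitude weights, down for large
        # ones (a stand-in for the paper's importance-based mechanism).
        k = int(weight.numel() * target_sparsity)
        thresh = weight.abs().flatten().kthvalue(k).values
        small = (weight.abs() <= thresh).float()
        return (prune_prob + step * small - step * (1 - small)).clamp(0.0, 1.0)

    w = torch.randn(64, 64)
    p = torch.full_like(w, 0.5)
    for _ in range(100):                 # interleaved with normal training steps
        p = update_probs(w, p)
        w_pruned = w * sample_mask(p)    # stochastic mask used for this step
    print(f"mean pruning probability: {p.mean().item():.2f}")

Because the mask is resampled rather than fixed, a weight pruned early can
still return if its importance recovers during training, which is the key
difference from deterministic pruning.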
FPGA-based Accelerators of Deep Learning Networks for Learning and Classification: A Review
Due to recent advances in digital technologies, and availability of credible
data, an area of artificial intelligence, deep learning, has emerged, and has
demonstrated its ability and effectiveness in solving complex learning problems
not possible before. In particular, convolutional neural networks (CNNs)
have demonstrated their effectiveness in image detection and recognition
applications. However, they demand intensive computation and memory
bandwidth, so general-purpose CPUs fail to achieve the desired performance
levels. Consequently, hardware accelerators based on application-specific
integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and
graphics processing units (GPUs) have been employed to improve the
throughput of CNNs. FPGAs in particular have recently been adopted for
accelerating deep learning networks owing to their ability to maximize
parallelism and their energy efficiency. In this paper, we review recent
techniques for accelerating deep learning networks on FPGAs. We highlight
the key features employed by the various techniques to improve acceleration
performance. In addition, we provide recommendations for enhancing the
utilization of FPGAs for CNN acceleration. The techniques investigated in
this paper represent the recent trends in FPGA-based accelerators for deep
learning networks; thus, this review is expected to guide future advances
in efficient hardware accelerators and to be useful for deep learning
researchers.
Comment: This article has been accepted for publication in IEEE Access
(December 2018)
A novel channel pruning method for deep neural network compression
In recent years, deep neural networks have achieved great success in the
field of computer vision. However, it is still a big challenge to deploy these
deep models on resource-constrained embedded devices such as mobile robots
and smartphones. Network compression for such platforms is therefore a
reasonable way to reduce memory consumption and computational complexity.
In this paper, a novel channel pruning method based on a genetic algorithm
is proposed to compress very deep Convolutional Neural Networks (CNNs).
First, a pre-trained CNN model is pruned layer by layer according to the
sensitivity of each layer. The pruned model is then fine-tuned within a
knowledge distillation framework. These two steps significantly decrease
model redundancy with little accuracy drop. Channel selection is a
combinatorial optimization problem with an exponentially large solution
space. To accelerate the selection process, the proposed method formulates
it as a search problem, which can be solved efficiently by a genetic
algorithm. Meanwhile, a two-step approximation fitness function is designed
to further improve the efficiency of the genetic process. The proposed
method has been verified with two popular CNN models, VGGNet and ResNet, on
several benchmark datasets. On CIFAR-100 and ImageNet, our approach
outperforms several state-of-the-art methods. On CIFAR-10 and SVHN, the
pruned VGGNet achieves better performance than the original model with 8x
parameter compression and 3x FLOPs reduction.
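A toy sketch of the genetic search over channel masks is given below; the
importance-sum fitness is a stand-in for the paper's two-step approximation
fitness function, and all hyper-parameters here are assumptions:

    import random

    N_CHANNELS, POP, GENS = 64, 20, 30

    def fitness(mask, importance, budget=0.5):
        # Penalize masks that keep more channels than the pruning budget.
        if sum(mask) > int(N_CHANNELS * budget):
            return -1.0
        # Toy proxy: total importance of the kept channels.
        return sum(i for m, i in zip(mask, importance) if m)

    def crossover(a, b):
        cut = random.randrange(1, N_CHANNELS)
        return a[:cut] + b[cut:]

    def mutate(mask, rate=0.02):
        return [1 - m if random.random() < rate else m for m in mask]

    importance = [random.random() for _ in range(N_CHANNELS)]  # proxy scores
    pop = [[random.randint(0, 1) for _ in range(N_CHANNELS)] for _ in range(POP)]
    for _ in range(GENS):
        pop.sort(key=lambda m: fitness(m, importance), reverse=True)
        elite = pop[: POP // 2]
        pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                       for _ in range(POP - len(elite))]
    pop.sort(key=lambda m: fitness(m, importance), reverse=True)
    best = pop[0]   # binary keep/prune mask over the layer's channels

In the paper's setting, each fitness evaluation involves measuring the
masked model's accuracy, which is why a cheap approximation of the fitness
matters so much for the search's efficiency.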
Exploring the Regularity of Sparse Structure in Convolutional Neural Networks
Sparsity helps reduce the computational complexity of deep neural networks by
skipping zeros. Taking advantage of sparsity is listed as a high priority
in next-generation DNN accelerators such as the TPU. The structure of
sparsity, i.e., the granularity of pruning, affects the efficiency of
hardware accelerator design as well as the prediction accuracy.
Coarse-grained pruning creates regular sparsity patterns, which are more
amenable to hardware acceleration but make it more challenging to maintain
the same accuracy. In this paper, we quantitatively measure the trade-off
between sparsity regularity and prediction accuracy, providing insight into
how to maintain accuracy with a more structured sparsity pattern. Our
experimental results show that coarse-grained pruning can achieve a
sparsity ratio similar to that of unstructured pruning without loss of
accuracy. Moreover, owing to the savings on index storage, coarse-grained
pruning obtains a better compression ratio than fine-grained sparsity at
the same accuracy threshold. Based on the recent sparse convolutional
neural network accelerator (SCNN), our experiments further demonstrate that
coarse-grained sparsity saves about 2x the memory references compared to
fine-grained sparsity. Since a memory reference is more than two orders of
magnitude more expensive than an arithmetic operation, the regularity of
the sparse structure leads to more efficient hardware designs.
Comment: submitted to NIPS 2017
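The index-saving effect is easy to see in a small sketch: pruning whole
kernels needs one index per surviving kernel rather than one per surviving
weight. The tensor sizes and equal-sparsity setup below are illustrative
assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((128, 64, 3, 3))    # out_ch, in_ch, kh, kw
    sparsity = 0.75

    # Fine-grained: threshold individual weights by magnitude.
    t = np.quantile(np.abs(W), sparsity)
    fine_kept = int((np.abs(W) > t).sum())      # one index per kept weight

    # Coarse-grained: threshold whole 3x3 kernels by their L1 norm.
    norms = np.abs(W).sum(axis=(2, 3))          # 128 x 64 kernel norms
    tk = np.quantile(norms, sparsity)
    coarse_kept = int((norms > tk).sum())       # one index per kept kernel

    print(fine_kept, coarse_kept)               # ~18,432 vs ~2,048 indices

At the same 75% sparsity, each surviving kernel index covers nine weights,
so the coarse-grained layout stores roughly 9x fewer indices, which is the
source of its better compression ratio at a matched accuracy.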
Ensemble Convolutional Neural Networks for Mode Inference in Smartphone Travel Survey
We develop ensemble Convolutional Neural Networks (CNNs) to classify the
transportation mode of trip data collected as part of a large-scale smartphone
travel survey in Montreal, Canada. Our proposed ensemble library is composed of
a series of CNN models with different hyper-parameter values and CNN
architectures. In our final model, we combine the outputs of the CNN models
using the "average voting", "majority voting", and "optimal weights"
methods. Furthermore, we exploit the ensemble library by deploying a Random
Forest model as a meta-learner. The ensemble method with the Random Forest
meta-learner achieves an accuracy of 91.8%, which surpasses the other three
ensemble combination methods as well as comparable models reported in the
literature. The "majority voting" and "optimal weights" combination methods
yield prediction accuracies of around 89%, while "average voting" achieves
an accuracy of only 85%.
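The three simple combination rules can be sketched directly on per-model
class-probability outputs; the array shapes (7 models, 100 trips, 5 modes)
and the weights below are illustrative assumptions, and the meta-learner
variant would instead feed these outputs to a Random Forest:

    import numpy as np

    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(5), size=(7, 100))  # 7 models, 100 trips, 5 modes

    # Average voting: mean of the predicted probabilities.
    avg_pred = probs.mean(axis=0).argmax(axis=1)

    # Majority voting: most frequent per-model hard prediction.
    votes = probs.argmax(axis=2)                      # 7 x 100
    maj_pred = np.array([np.bincount(v, minlength=5).argmax() for v in votes.T])

    # Optimal weights: weighted average (weights assumed given here, e.g.
    # fitted on a validation set in the paper's setting).
    w = rng.random(7); w /= w.sum()
    wtd_pred = np.tensordot(w, probs, axes=1).argmax(axis=1)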
Deep Learning for Surface Material Classification Using Haptic and Visual Information
When a user scratches a hand-held rigid tool across an object surface, an
acceleration signal can be captured, which carries relevant information about
the surface. More importantly, such a haptic signal is complementary to the
visual appearance of the surface, which suggests combining both modalities
for recognizing the surface material. In this paper, we present a novel
deep learning method for surface material classification based on a Fully
Convolutional Network (FCN), which takes as input the aforementioned
acceleration signal and a corresponding image of the surface texture.
Compared to previous surface material classification solutions, which rely
on carefully designed, hand-crafted, domain-specific features, our method
automatically extracts discriminative features using advanced deep learning
methodologies. Experiments on the TUM surface material database demonstrate
that our method achieves state-of-the-art classification accuracy robustly
and efficiently.
Comment: 8 pages, under review as a paper at Transactions on Multimedia
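A minimal two-branch sketch in the same spirit is shown below; the layer
sizes, signal length, and late concatenation fusion are assumptions for
illustration, not the paper's exact FCN architecture:

    import torch
    import torch.nn as nn

    class HapticVisualNet(nn.Module):
        def __init__(self, n_classes=10):   # class count is an assumption
            super().__init__()
            # Haptic branch: 1D convs over the acceleration signal.
            self.haptic = nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=9, stride=2), nn.ReLU(),
                nn.Conv1d(16, 32, kernel_size=9, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            )
            # Visual branch: 2D convs over the texture image.
            self.visual = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # Late fusion: concatenate both embeddings, then classify.
            self.head = nn.Linear(32 + 32, n_classes)

        def forward(self, accel, image):
            return self.head(torch.cat([self.haptic(accel), self.visual(image)], dim=1))

    net = HapticVisualNet()
    logits = net(torch.randn(4, 1, 1024), torch.randn(4, 3, 64, 64))

The point of the two branches is exactly the complementarity the abstract
describes: each modality contributes features the other cannot provide.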
An Entropy-based Pruning Method for CNN Compression
This paper aims to simultaneously accelerate and compress off-the-shelf CNN
models via a filter pruning strategy. First, the importance of each filter
is evaluated with the proposed entropy-based method. Then, several
unimportant filters are discarded to obtain a smaller CNN model. Finally,
fine-tuning is adopted to recover the generalization ability damaged during
filter pruning. Our method can also reduce the size of intermediate
activations, which dominate the memory footprint during training but have
received little attention in previous compression methods. Experiments on
the ILSVRC-12 benchmark demonstrate the effectiveness of our method.
Compared with previous filter importance criteria, our entropy-based method
obtains better performance. We achieve a 3.3x speed-up and 16.64x
compression on VGG-16, and a 1.54x acceleration and 1.47x compression on
ResNet-50, both with about a 1% top-5 accuracy decrease.
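A minimal sketch of the entropy-based scoring idea: estimate each filter's
importance from the entropy of its spatially pooled activations over a
batch of inputs, then discard the lowest-scoring filters. The pooling and
histogram binning here are implementation assumptions, not the paper's
exact procedure:

    import torch

    def filter_entropy(activations, n_bins=32):
        # activations: (batch, channels, H, W) feature maps from one layer.
        pooled = activations.mean(dim=(2, 3))             # (batch, channels)
        scores = []
        for c in range(pooled.shape[1]):
            hist = torch.histc(pooled[:, c], bins=n_bins)
            p = hist / hist.sum()
            p = p[p > 0]                                  # drop empty bins
            scores.append(-(p * p.log()).sum())           # Shannon entropy
        return torch.stack(scores)

    acts = torch.relu(torch.randn(256, 64, 14, 14))       # toy feature maps
    scores = filter_entropy(acts)
    keep = scores.argsort(descending=True)[:48]           # keep 48 of 64 filters

The intuition is that a filter whose responses barely vary across inputs
(low entropy) carries little discriminative information, so it can be
pruned with minimal accuracy loss before fine-tuning.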