A Survey of Model Compression and Acceleration for Deep Neural Networks
Deep neural networks (DNNs) have recently achieved great success in many
visual recognition tasks. However, existing deep neural network models are
computationally expensive and memory intensive, hindering their deployment in
devices with low memory resources or in applications with strict latency
requirements. Therefore, a natural thought is to perform model compression and
acceleration in deep networks without significantly decreasing the model
performance. During the past five years, tremendous progress has been made in
this area. In this paper, we review the recent techniques for compacting and
accelerating DNN models. In general, these techniques are divided into four
categories: parameter pruning and quantization, low-rank factorization,
transferred/compact convolutional filters, and knowledge distillation. Methods
of parameter pruning and quantization are described first; the other
techniques are introduced after that. For each category, we provide an
analysis of the performance, related applications, advantages, and drawbacks.
We then go through some very recent successful methods, for example, dynamic
capacity networks and stochastic depth networks. After that, we survey the
evaluation metrics, the main datasets used for evaluating model performance,
and recent benchmark efforts. Finally, we conclude the paper and discuss the
remaining challenges and possible directions for future work.
Comment: Published in IEEE Signal Processing Magazine; updated version including more recent work.
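As a concrete illustration of the first category above (parameter pruning and
quantization), here is a minimal PyTorch sketch, not taken from the survey:
magnitude pruning zeroes the smallest weights, and uniform quantization snaps
the rest onto an 8-bit grid. The sparsity level and bit-width are illustrative.

    import torch

    def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
        # Zero out the smallest-magnitude fraction `sparsity` of the weights.
        k = int(sparsity * weight.numel())
        if k == 0:
            return weight
        threshold = weight.abs().flatten().kthvalue(k).values
        return weight * (weight.abs() > threshold)

    def uniform_quantize(weight: torch.Tensor, bits: int = 8) -> torch.Tensor:
        # Symmetric uniform quantization to `bits` bits, simulated in float.
        qmax = 2 ** (bits - 1) - 1
        scale = weight.abs().max().clamp(min=1e-8) / qmax
        return torch.round(weight / scale).clamp(-qmax, qmax) * scale

    w = torch.randn(64, 128)
    w = uniform_quantize(magnitude_prune(w, sparsity=0.9))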
A Survey of FPGA-Based Neural Network Accelerator
Recent research on neural networks has shown significant advantages in
machine learning over traditional algorithms based on handcrafted features
and models. Neural networks are now widely adopted in domains such as image,
speech, and video recognition. However, the high computation and storage
complexity of neural network inference poses great difficulty for deployment.
CPU platforms can hardly offer enough computation capacity. GPU platforms are
the first choice for neural network processing because of their high
computation capacity and easy-to-use development frameworks.
FPGA-based neural network inference accelerators, on the other hand, are
becoming a growing research topic. With specifically designed hardware, FPGAs
are a possible contender to surpass GPUs in speed and energy efficiency.
Various FPGA-based accelerator designs have been proposed, with software and
hardware optimization techniques, to achieve high speed and energy
efficiency. In this paper, we give an overview of previous work on FPGA-based
neural network inference accelerators and summarize the main techniques used.
An investigation from software to hardware, from circuit level to system
level, is carried out to give a complete analysis of FPGA-based neural
network inference accelerator design and to serve as a guide for future work.
Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
Deep neural networks have evolved remarkably over the past few years and they
are currently the fundamental tools of many intelligent systems. At the same
time, the computational complexity and resource consumption of these networks
also continue to increase. This will pose a significant challenge to the
deployment of such networks, especially in real-time applications or on
resource-limited devices. Thus, network acceleration has become a hot topic
within the deep learning community. As for hardware implementation of deep
neural networks, a number of FPGA/ASIC-based accelerators have been proposed
in recent years. In this paper, we provide a comprehensive survey of recent
advances in network acceleration, compression, and accelerator design from
both the algorithm and hardware points of view. Specifically, we provide a
thorough analysis of each of the following topics: network pruning, low-rank
approximation, network quantization, teacher-student networks, compact
network design, and hardware accelerators. Finally, we introduce and discuss
a few possible future directions.
Comment: 14 pages, 3 figures.
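Among the topics this survey lists, the teacher-student approach is the
simplest to sketch. Below is a minimal PyTorch version of the standard
Hinton-style distillation loss, not code from the survey; the temperature T
and mixing weight alpha are illustrative choices.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          T=4.0, alpha=0.9):
        # Blend soft-target KL (at temperature T) with the usual
        # cross-entropy on the ground-truth labels.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)  # rescale gradients, as in Hinton et al.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard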
Recent Advances in Convolutional Neural Network Acceleration
In recent years, convolutional neural networks (CNNs) have shown great
performance in various fields such as image classification, pattern
recognition, and multimedia compression. Two of their characteristic
properties, local connectivity and weight sharing, reduce the number of
parameters and increase processing speed during training and inference.
However, as data dimensionality grows and CNN architectures become more
complicated, end-to-end or combined uses of CNNs become computationally
intensive, which limits their further deployment. It is therefore necessary
and urgent to implement CNNs in a faster way. In this paper, we first
summarize the acceleration methods that apply to, but are not limited to,
CNNs by reviewing a broad variety of research papers. We propose a taxonomy
of acceleration methods in terms of three levels, i.e. structure level,
algorithm level, and implementation level. We also analyze the acceleration
methods in terms of CNN architecture compression, algorithm optimization, and
hardware-based improvement. Finally, we discuss different perspectives on
these acceleration and optimization methods within each level. The discussion
shows that the methods at each level still leave a large space for
exploration. By incorporating such a wide range of disciplines, we expect to
provide a comprehensive reference for researchers interested in CNN
acceleration.
Comment: Submitted to Neurocomputing.
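One well-known implementation-level acceleration covered by such taxonomies
is lowering convolution to matrix multiplication (im2col) so it runs on
highly tuned GEMM kernels. Here is a minimal NumPy sketch under simplifying
assumptions (single image, no padding, stride 1); all names are illustrative.

    import numpy as np

    def im2col(x, kh, kw):
        # Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix.
        C, H, W = x.shape
        out_h, out_w = H - kh + 1, W - kw + 1
        cols = np.empty((C * kh * kw, out_h * out_w))
        row = 0
        for c in range(C):
            for i in range(kh):
                for j in range(kw):
                    cols[row] = x[c, i:i + out_h, j:j + out_w].ravel()
                    row += 1
        return cols

    def conv2d_gemm(x, w):
        # Convolution as a single GEMM; w has shape (filters, C, kh, kw).
        n_f, _, kh, kw = w.shape
        out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
        out = w.reshape(n_f, -1) @ im2col(x, kh, kw)
        return out.reshape(n_f, out_h, out_w)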
FPGA-based Accelerators of Deep Learning Networks for Learning and Classification: A Review
Due to recent advances in digital technologies and the availability of
credible data, deep learning has emerged as an area of artificial
intelligence and demonstrated its effectiveness in solving complex learning
problems that were not possible before. In particular, convolutional neural
networks (CNNs) have demonstrated their effectiveness in image detection and
recognition applications. However, they require intensive computation and
memory bandwidth, which prevents general-purpose CPUs from achieving the
desired performance levels.
Consequently, hardware accelerators that use application specific integrated
circuits (ASICs), field programmable gate arrays (FPGAs), and graphic
processing units (GPUs) have been employed to improve the throughput of CNNs.
More precisely, FPGAs have been recently adopted for accelerating the
implementation of deep learning networks due to their ability to maximize
parallelism as well as due to their energy efficiency. In this paper, we review
recent existing techniques for accelerating deep learning networks on FPGAs. We
highlight the key features employed by the various techniques for improving the
acceleration performance. In addition, we provide recommendations for enhancing
the utilization of FPGAs for CNN acceleration. The techniques investigated in
this paper represent the recent trends in FPGA-based accelerators for deep
learning networks. Thus, this review is expected to direct future advances in
efficient hardware accelerators and to be useful for deep learning
researchers.
Comment: This article has been accepted for publication in IEEE Access (December 2018).
Dynamic Channel Pruning: Feature Boosting and Suppression
Making deep convolutional neural networks more accurate typically comes at
the cost of increased computational and memory resources. In this paper, we
reduce this cost by exploiting the fact that the importance of features
computed by convolutional layers is highly input-dependent, and propose feature
boosting and suppression (FBS), a new method to predictively amplify salient
convolutional channels and skip unimportant ones at run-time. FBS introduces
small auxiliary connections to existing convolutional layers. In contrast to
channel pruning methods which permanently remove channels, it preserves the
full network structures and accelerates convolution by dynamically skipping
unimportant input and output channels. FBS-augmented networks are trained
with conventional stochastic gradient descent, making the method readily
applicable to many state-of-the-art CNNs. We compare FBS to a range of
existing channel pruning and dynamic execution schemes and demonstrate large
improvements on ImageNet classification. Experiments show that FBS provides
substantial savings in compute on VGG-16 and ResNet-18, both with minimal
top-5 accuracy loss.
Comment: 14 pages, 5 figures, 4 tables; published as a conference paper at ICLR 2019.
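The run-time mechanism lends itself to a short sketch. The PyTorch layer
below is our simplified reading of the idea, not the authors' code: a small
auxiliary connection predicts per-channel saliency from a pooled summary of
the input, the top-k output channels are kept and amplified, and the rest are
suppressed. For clarity the suppressed channels are still computed and then
masked here; a real implementation would skip their computation entirely.

    import torch
    import torch.nn as nn

    class FBSConv(nn.Module):
        # Convolution with feature boosting and suppression (simplified).
        def __init__(self, c_in, c_out, k, keep_ratio=0.5):
            super().__init__()
            self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2)
            self.saliency = nn.Linear(c_in, c_out)  # auxiliary connection
            self.keep = max(1, int(keep_ratio * c_out))

        def forward(self, x):
            # Predict per-output-channel importance from a pooled summary.
            s = self.saliency(x.mean(dim=(2, 3))).relu()   # (N, c_out)
            top = torch.topk(s, self.keep, dim=1)
            mask = torch.zeros_like(s).scatter(1, top.indices, top.values)
            # Boost the kept channels, suppress the rest.
            return self.conv(x) * mask.unsqueeze(-1).unsqueeze(-1)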
Extreme Network Compression via Filter Group Approximation
In this paper we propose a novel decomposition method based on filter group
approximation, which can significantly reduce the redundancy of deep
convolutional neural networks (CNNs) while maintaining the majority of feature
representation. Unlike other low-rank decomposition algorithms which operate on
spatial or channel dimension of filters, our proposed method mainly focuses on
exploiting the filter group structure for each layer. For several commonly used
CNN models, including VGG and ResNet, our method can reduce over 80%
floating-point operations (FLOPs) with less accuracy drop than state-of-the-art
methods on various image classification datasets. Besides, experiments
demonstrate that our method is conducive to alleviating degeneracy of the
compressed network, which hurts the convergence and performance of the network.Comment: Accepted by ECCV201
Structured Probabilistic Pruning for Convolutional Neural Network Acceleration
In this paper, we propose a novel progressive parameter pruning method for
Convolutional Neural Network acceleration, named Structured Probabilistic
Pruning (SPP), which effectively prunes weights of convolutional layers in a
probabilistic manner. Unlike existing deterministic pruning approaches, where
unimportant weights are permanently eliminated, SPP introduces a pruning
probability for each weight, and pruning is guided by sampling from the pruning
probabilities. A mechanism is designed to increase and decrease pruning
probabilities based on importance criteria in the training process. Experiments
show that, with 4x speedup, SPP can accelerate AlexNet with only 0.3% loss of
top-5 accuracy and VGG-16 with 0.8% loss of top-5 accuracy in ImageNet
classification. Moreover, SPP can be directly applied to accelerate
multi-branch CNNs, such as ResNet, without specific adaptations. Our
ResNet-50 with a 2x speedup suffers only a 0.8% loss of top-5 accuracy on
ImageNet. We further show the effectiveness of SPP on transfer learning
tasks.
Comment: CNN model acceleration; 13 pages, 6 figures; accepted by the British Machine Vision Conference (BMVC), 2018 (oral).
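A hedged sketch of the probabilistic mechanism as described above
(illustrative, not the authors' code): each weight carries a pruning
probability, a binary mask is resampled from those probabilities at every
step, and the probabilities drift upward for weights that a magnitude
criterion ranks as unimportant and downward otherwise.

    import torch

    class ProbabilisticPruner:
        # Per-weight pruning probabilities updated by a magnitude criterion.
        def __init__(self, weight, target_sparsity=0.75, step=0.01):
            self.p = torch.zeros_like(weight)  # pruning probabilities
            self.target = target_sparsity
            self.step = step

        def update(self, weight):
            # Weights below the target-sparsity magnitude quantile count as
            # unimportant: their pruning probability drifts up, others down.
            thresh = weight.abs().flatten().quantile(self.target)
            unimportant = weight.abs() < thresh
            delta = self.step * (unimportant.float() * 2.0 - 1.0)
            self.p = (self.p + delta).clamp(0.0, 1.0)

        def sample_mask(self):
            # Keep each weight with probability (1 - p); pruning stays
            # stochastic until the probabilities saturate at 0 or 1.
            return (torch.rand_like(self.p) >= self.p).float()

During training, the forward pass would multiply the weights by
sample_mask(), with update() called after each optimizer step.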
A novel channel pruning method for deep neural network compression
In recent years, deep neural networks have achieved great success in the
field of computer vision. However, it is still a big challenge to deploy these
deep models on resource-constrained embedded devices such as mobile robots
and smartphones. Network compression for such platforms is therefore a
reasonable way to reduce memory consumption and computation complexity. In
this paper, a novel channel pruning method based on a genetic algorithm is
proposed to compress very deep convolutional neural networks (CNNs). First, a
pre-trained CNN model is pruned layer by layer according to the sensitivity
of each layer. The pruned model is then fine-tuned within a knowledge
distillation framework. These two improvements significantly decrease the
model redundancy with little accuracy drop. Channel selection is a
combinatorial optimization problem with an exponential solution space. To
accelerate the selection process, the proposed method formulates it as a
search problem that can be solved efficiently by a genetic algorithm.
Meanwhile, a two-step approximate fitness function is designed to further
improve the efficiency of the genetic process. The proposed method has been
verified on benchmark datasets with two popular CNN models: VGGNet and
ResNet. On the CIFAR-100 and ImageNet datasets, our approach outperforms
several state-of-the-art methods. On the CIFAR-10 and SVHN datasets, the
pruned VGGNet achieves better performance than the original model with 8x
parameter compression and a 3x reduction in FLOPs.
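The search formulation is easy to make concrete. Below is a minimal generic
genetic algorithm over binary channel-keep masks; the paper's two-step
approximate fitness is abstracted into a user-supplied fitness callable, and
all hyperparameters are illustrative.

    import random

    def genetic_channel_search(n_channels, fitness, pop_size=20,
                               generations=50, p_mut=0.05):
        # Evolve binary keep-masks; fitness(mask) -> float, higher is better.
        pop = [[random.randint(0, 1) for _ in range(n_channels)]
               for _ in range(pop_size)]
        for _ in range(generations):
            ranked = sorted(pop, key=fitness, reverse=True)
            parents = ranked[:pop_size // 2]           # truncation selection
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, n_channels)  # one-point crossover
                child = [g ^ (random.random() < p_mut)  # bit-flip mutation
                         for g in a[:cut] + b[cut:]]
                children.append(child)
            pop = parents + children
        return max(pop, key=fitness)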
Software-Defined FPGA Accelerator Design for Mobile Deep Learning Applications
Recently, the field of deep learning has received great attention from the
scientific community, and it is used to provide improved solutions to many
computer vision problems. Convolutional neural networks (CNNs) have been
successfully used to attack problems such as object recognition, object
detection, semantic segmentation, and scene understanding. The rapid
development of deep learning has gone hand in hand with the adoption of GPUs
for accelerating its processes, such as network training and inference. Even
though FPGA design long predates the use of GPUs for accelerating
computations, and despite the fact that high-level synthesis (HLS) tools are
becoming more attractive, the adoption of FPGAs for deep learning research
and application development remains limited due to the hardware-design
expertise it requires. This work presents a workflow for accelerating deep
learning mobile applications on small, low-cost, low-power FPGA devices using
HLS tools. The workflow eases the design of an improved version of the
SqueezeJet accelerator, used to speed up mobile-friendly, low-parameter,
ImageNet-class CNNs such as SqueezeNet v1.1 and ZynqNet. Additionally, the
workflow includes the development of an HLS-driven analytical model used for
performance estimation of the accelerator. This model can also be used to
direct the design process and lead to future design improvements and
optimizations.
Comment: Accepted for presentation at the 15th International Symposium on Applied Reconfigurable Computing.
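Analytical models of this kind typically estimate per-layer latency from the
operation count, the accelerator's parallelism, and the memory traffic. The
Python sketch below is a generic back-of-the-envelope model under an overlap
assumption, not the paper's HLS-derived model; every parameter value is
illustrative.

    def conv_layer_latency(c_in, c_out, k, out_h, out_w,
                           pe_macs=512, clock_hz=100e6,
                           mem_bw_bytes=4e9, bytes_per_weight=1):
        # Estimate latency as max(compute time, weight-transfer time),
        # assuming computation and data movement overlap.
        macs = c_in * c_out * k * k * out_h * out_w
        compute_s = macs / (pe_macs * clock_hz)
        transfer_s = (c_in * c_out * k * k * bytes_per_weight) / mem_bw_bytes
        return max(compute_s, transfer_s)

    # Illustrative 3x3 layer in a SqueezeNet-v1.1-sized network:
    print(conv_layer_latency(c_in=64, c_out=64, k=3, out_h=54, out_w=54))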