Computation-Performance Optimization of Convolutional Neural Networks with Redundant Kernel Removal
Deep Convolutional Neural Networks (CNNs) are widely employed in modern
computer vision algorithms, where the input image is convolved iteratively by
many kernels to extract the knowledge behind it. However, as convolutional
layers have grown deeper in recent years, the enormous computational
complexity makes CNNs difficult to deploy on embedded systems with limited
hardware resources. In this paper, we propose two
computation-performance optimization methods to reduce the redundant
convolution kernels of a CNN under performance and architecture constraints,
and apply them to a network for super resolution (SR). Using the PSNR drop
relative to the original network as the performance criterion, our method
achieves the optimal PSNR under a given computation budget. On the other hand,
our method can also minimize the computation required under a given PSNR drop.

Comment: This paper was accepted by the 2018 International Symposium on
Circuits and Systems (ISCAS).
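The abstract does not spell out its pruning criterion; as an illustrative sketch only (the `prune_kernels` function and the L1-norm ranking are assumptions, not the paper's method), one simple way to remove redundant convolution kernels under a computation budget is to rank them by magnitude and keep only the strongest fraction:

```python
import numpy as np

# Hedged sketch: rank each convolution kernel by its L1 norm and keep only
# the strongest fraction allowed by a computation budget (keep_ratio).
def prune_kernels(weights, keep_ratio=0.5):
    """weights: (out_channels, in_channels, kH, kW); returns kept kernels."""
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(keep_ratio * weights.shape[0])))
    kept = np.sort(np.argsort(norms)[-n_keep:])   # indices of strongest kernels
    return weights[kept], kept

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))                 # 8 kernels of a toy layer
pruned, kept = prune_kernels(w, keep_ratio=0.5)
print(pruned.shape)                               # (4, 3, 3, 3)
```

In a real pipeline the kept kernels would be copied into a smaller layer and the network fine-tuned to recover any PSNR lost to pruning.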
FFT-Based Deep Learning Deployment in Embedded Systems
Deep learning has demonstrated its power in many application domains,
especially in image and speech recognition. As the backbone of deep learning,
deep neural networks (DNNs) consist of multiple layers of various types with
hundreds to thousands of neurons. Embedded platforms are now becoming essential
for deep learning deployment due to their portability, versatility, and energy
efficiency. The large model size of DNNs, while providing excellent accuracy,
also burdens the embedded platforms with intensive computation and storage.
Researchers have investigated reducing DNN model size with negligible
accuracy loss. This work proposes a Fast Fourier Transform (FFT)-based DNN
training and inference model suitable for embedded platforms, with reduced
asymptotic complexity of both computation and storage that distinguishes our
approach from existing ones. We develop the training and inference
algorithms based on FFT as the computing kernel and deploy the FFT-based
inference model on embedded platforms, achieving extraordinary processing
speed.

Comment: Design, Automation, and Test in Europe (DATE). For source code,
please contact Mahdi Nazemi at <[email protected]>
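The complexity reduction rests on the convolution theorem: a linear convolution becomes an elementwise product in the frequency domain, lowering the asymptotic cost from O(n^2 k^2) to O(n^2 log n) for an n x n input and k x k kernel. A minimal sketch of that idea (not the paper's implementation), checked against a naive direct convolution:

```python
import numpy as np

# Illustrative sketch of FFT-based 2-D convolution via the convolution theorem.
def fft_conv2d(x, k):
    n0 = x.shape[0] + k.shape[0] - 1              # full-convolution output size
    n1 = x.shape[1] + k.shape[1] - 1
    X = np.fft.rfft2(x, s=(n0, n1))               # zero-padded transforms
    K = np.fft.rfft2(k, s=(n0, n1))
    return np.fft.irfft2(X * K, s=(n0, n1))       # elementwise product -> conv

def direct_conv2d(x, k):
    # Naive O(n^2 k^2) reference implementation for checking the FFT result.
    out = np.zeros((x.shape[0] + k.shape[0] - 1, x.shape[1] + k.shape[1] - 1))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i:i + k.shape[0], j:j + k.shape[1]] += x[i, j] * k
    return out

rng = np.random.default_rng(0)
x, k = rng.normal(size=(8, 8)), rng.normal(size=(3, 3))
print(np.allclose(fft_conv2d(x, k), direct_conv2d(x, k)))  # True
```

The same transform-once, multiply-many structure is what makes the approach attractive for storage as well: kernels can be kept in the frequency domain and reused across inputs.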