Accelerating Training of Deep Neural Networks via Sparse Edge Processing
We propose a reconfigurable hardware architecture for deep neural networks
(DNNs) capable of online training and inference, which uses algorithmically
pre-determined, structured sparsity to significantly lower memory and
computational requirements. This novel architecture introduces the notion of
edge-processing to provide flexibility and combines junction pipelining and
operational parallelization to speed up training. The overall effect is to
reduce network complexity by factors up to 30x and training time by up to 35x
relative to GPUs, while maintaining high fidelity of inference results. This
has the potential to enable extensive parameter searches and development of the
largely unexplored theoretical foundation of DNNs. The architecture
automatically adapts itself to different network sizes given available hardware
resources. As proof of concept, we show results obtained for different bit
widths.

Comment: Presented at the 26th International Conference on Artificial Neural Networks (ICANN) 2017 in Alghero, Italy.
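To make the sparsity idea concrete: the key property is that the connection pattern is fixed by an algorithm before training, rather than learned or pruned afterwards, so the hardware can schedule memory and compute for a known, regular access pattern. The sketch below is illustrative only; it uses a random fixed fan-in pattern, not the paper's specific structured patterns, and the names fixed_fanin_mask, n_out, n_in, and fanin are hypothetical.

```python
import numpy as np

def fixed_fanin_mask(n_out, n_in, fanin, seed=0):
    """Pre-determined structured sparsity: every output neuron connects to
    exactly `fanin` inputs, chosen by a fixed rule before training begins,
    so storage and compute follow a known, regular pattern."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_out, n_in), dtype=bool)
    for i in range(n_out):
        mask[i, rng.choice(n_in, size=fanin, replace=False)] = True
    return mask

# One junction at 1/8 density: weight storage and multiply count drop ~8x.
mask = fixed_fanin_mask(n_out=64, n_in=512, fanin=64)
W = np.where(mask, np.random.default_rng(1).standard_normal((64, 512)), 0.0)
x = np.random.default_rng(2).standard_normal(512)
y = W @ x  # dense reference; sparse hardware would touch only masked entries
```

Because the mask is fixed up front, the density (here 1/8) directly sets the complexity reduction, which is the lever behind the up-to-30x figures quoted above.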
Delta Networks for Optimized Recurrent Network Computation
Many neural networks exhibit stability in their activation patterns over time
in response to inputs from sensors operating under real-world conditions. By
capitalizing on this property of natural signals, we propose a Recurrent Neural
Network (RNN) architecture called a delta network in which each neuron
transmits its value only when the change in its activation exceeds a threshold.
The execution of RNNs as delta networks is attractive because their states must
be stored and fetched at every timestep, unlike in convolutional neural
networks (CNNs). We show that a naive run-time delta network implementation
offers modest reductions in memory accesses and arithmetic operations, but
optimized training techniques confer higher accuracy at higher speedup. With
these optimizations, we demonstrate a 9x reduction in cost with negligible loss
of accuracy for the TIDIGITS audio digit recognition benchmark. Similarly, on
the large Wall Street Journal speech recognition benchmark even existing
networks can be greatly accelerated as delta networks, and a 5.7x improvement
with negligible loss of accuracy can be obtained through training. Finally, on
an end-to-end CNN trained for steering angle prediction in a driving dataset,
the RNN cost can be reduced by a substantial 100x.
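A minimal NumPy sketch of the thresholded-transmission idea (my illustration, not the paper's GRU-based implementation; DeltaLayer, theta, and step are hypothetical names): an input is re-sent only when it has drifted more than a threshold from its last transmitted value, so each timestep touches only the weight columns of inputs that actually changed.

```python
import numpy as np

class DeltaLayer:
    """Thresholded transmission for one weight matrix: an input is re-sent
    only when it has drifted more than `theta` from its last transmitted
    value, so each step touches only the columns of W whose inputs fired."""

    def __init__(self, W, theta=0.1):
        self.W = W                          # (n_out, n_in) weight matrix
        self.theta = theta                  # transmission threshold
        self.x_last = np.zeros(W.shape[1])  # last transmitted input values
        self.acc = np.zeros(W.shape[0])     # running pre-activation state

    def step(self, x):
        delta = x - self.x_last
        fired = np.abs(delta) > self.theta           # changed inputs only
        self.acc += self.W[:, fired] @ delta[fired]  # partial mat-vec
        self.x_last[fired] = x[fired]                # remember what was sent
        return self.acc.copy()  # ~= W @ x, up to accumulated threshold error

# Slowly varying input: few inputs exceed theta per step, so little work is done.
layer = DeltaLayer(np.random.default_rng(0).standard_normal((4, 8)), theta=0.05)
outputs = [layer.step(np.sin(0.01 * t) * np.ones(8)) for t in range(100)]
```

On slowly varying natural signals, most inputs stay below theta on any given step, which is exactly the stability property the abstract exploits to cut memory accesses and computation.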