Accelerating Training of Deep Neural Networks via Sparse Edge Processing
We propose a reconfigurable hardware architecture for deep neural networks
(DNNs) capable of online training and inference, which uses algorithmically
pre-determined, structured sparsity to significantly lower memory and
computational requirements. This novel architecture introduces the notion of
edge-processing to provide flexibility and combines junction pipelining and
operational parallelization to speed up training. The overall effect is to
reduce network complexity by factors up to 30x and training time by up to 35x
relative to GPUs, while maintaining high fidelity of inference results. This
has the potential to enable extensive parameter searches and development of the
largely unexplored theoretical foundation of DNNs. The architecture
automatically adapts itself to different network sizes given available hardware
resources. As proof of concept, we show results obtained for different bit
widths.

Comment: Presented at the 26th International Conference on Artificial Neural Networks (ICANN) 2017 in Alghero, Italy.
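To make the sparsity idea concrete: the key property is that the connection pattern is fixed by an algorithm before training, rather than learned or pruned afterwards, so the hardware can schedule memory and compute for a known, regular access pattern. The sketch below is illustrative only; it uses a random fixed fan-in pattern, not the paper's specific structured patterns, and the names fixed_fanin_mask, n_out, n_in, and fanin are hypothetical.

```python
import numpy as np

def fixed_fanin_mask(n_out, n_in, fanin, seed=0):
    """Pre-determined structured sparsity: every output neuron connects to
    exactly `fanin` inputs, chosen by a fixed rule before training begins,
    so storage and compute follow a known, regular pattern."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_out, n_in), dtype=bool)
    for i in range(n_out):
        mask[i, rng.choice(n_in, size=fanin, replace=False)] = True
    return mask

# One junction at 1/8 density: weight storage and multiply count drop ~8x.
mask = fixed_fanin_mask(n_out=64, n_in=512, fanin=64)
W = np.where(mask, np.random.default_rng(1).standard_normal((64, 512)), 0.0)
x = np.random.default_rng(2).standard_normal(512)
y = W @ x  # dense reference; sparse hardware would touch only masked entries
```

Because the mask is fixed up front, the density (here 1/8) directly sets the complexity reduction, which is the lever behind the up-to-30x figures quoted above.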
Delta Networks for Optimized Recurrent Network Computation
Many neural networks exhibit stability in their activation patterns over time
in response to inputs from sensors operating under real-world conditions. By
capitalizing on this property of natural signals, we propose a Recurrent Neural
Network (RNN) architecture called a delta network in which each neuron
transmits its value only when the change in its activation exceeds a threshold.
The execution of RNNs as delta networks is attractive because their states must
be stored and fetched at every timestep, unlike in convolutional neural
networks (CNNs). We show that a naive run-time delta network implementation
offers modest reductions in memory accesses and arithmetic operations, but
optimized training techniques confer higher accuracy at higher speedup. With
these optimizations, we demonstrate a 9x reduction in cost with negligible loss
of accuracy for the TIDIGITS audio digit recognition benchmark. Similarly, on
the large Wall Street Journal speech recognition benchmark even existing
networks can be greatly accelerated as delta networks, and a 5.7x improvement
with negligible loss of accuracy can be obtained through training. Finally, on
an end-to-end CNN trained for steering angle prediction in a driving dataset,
the RNN cost can be reduced by a substantial 100x.
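A minimal NumPy sketch of the thresholded-transmission idea (my illustration, not the paper's GRU-based implementation; DeltaLayer, theta, and step are hypothetical names): an input is re-sent only when it has drifted more than a threshold from its last transmitted value, so each timestep touches only the weight columns of inputs that actually changed.

```python
import numpy as np

class DeltaLayer:
    """Thresholded transmission for one weight matrix: an input is re-sent
    only when it has drifted more than `theta` from its last transmitted
    value, so each step touches only the columns of W whose inputs fired."""

    def __init__(self, W, theta=0.1):
        self.W = W                          # (n_out, n_in) weight matrix
        self.theta = theta                  # transmission threshold
        self.x_last = np.zeros(W.shape[1])  # last transmitted input values
        self.acc = np.zeros(W.shape[0])     # running pre-activation state

    def step(self, x):
        delta = x - self.x_last
        fired = np.abs(delta) > self.theta           # changed inputs only
        self.acc += self.W[:, fired] @ delta[fired]  # partial mat-vec
        self.x_last[fired] = x[fired]                # remember what was sent
        return self.acc.copy()  # ~= W @ x, up to accumulated threshold error

# Slowly varying input: few inputs exceed theta per step, so little work is done.
layer = DeltaLayer(np.random.default_rng(0).standard_normal((4, 8)), theta=0.05)
outputs = [layer.step(np.sin(0.01 * t) * np.ones(8)) for t in range(100)]
```

On slowly varying natural signals, most inputs stay below theta on any given step, which is exactly the stability property the abstract exploits to cut memory accesses and computation.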