New acceleration technique for the backpropagation algorithm
Artificial neural networks have been studied for many years in the hope of achieving human-like performance in pattern recognition, speech synthesis, and higher-level cognitive processes. In the connectionist model there are several interconnected processing elements, called neurons, each with limited processing capability. Even though the rate of information transmission between these elements is limited, the complex interconnections and the cooperative interactions among them result in vastly increased computing power. A neural network model is specified by an organized topology of interconnected neurons, and such a network must be trained before it can be used for a specific purpose. Backpropagation is one of the most popular methods of training neural networks, and the convergence speed of the standard backpropagation algorithm has seen many improvements in the recent past. Here we present a new technique for accelerating the existing backpropagation algorithm without modifying it. We use a fourth-order interpolation method for the dominant eigenvalues and, based on these, change the slope of the activation function, thereby increasing the speed of convergence of the backpropagation algorithm. Our experiments show a significant improvement in convergence time on problems widely used in benchmarking: a three- to ten-fold decrease in convergence time is achieved, and the decrease grows as the complexity of the problem increases. The technique adjusts the energy state of the system so as to escape from local minima.
An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning
DQN (Deep Q-Network) is a method that performs Q-learning for reinforcement learning using deep neural networks. DQNs require a large buffer and batch processing for experience replay and rely on backpropagation-based iterative optimization, making them difficult to implement on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network-based on-device learning approach that does not rely on backpropagation but instead uses an OS-ELM (Online Sequential Extreme Learning Machine) based training algorithm. In addition, we propose a combination of L2 regularization and spectral normalization for on-device reinforcement learning so that the output values of the neural network fit into a certain range and the reinforcement learning remains stable. The proposed reinforcement learning approach is designed for the PYNQ-Z1 board as a low-cost FPGA platform. Evaluation results using OpenAI Gym demonstrate that the proposed algorithm and its FPGA implementation complete the CartPole-v0 task 29.77x and 89.40x faster, respectively, than a conventional DQN-based approach when the number of hidden-layer nodes is 64.
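The backpropagation-free training that the abstract relies on can be sketched with the standard OS-ELM recursive least-squares update. This is an illustrative NumPy sketch of OS-ELM in general (all names and sizes are assumptions), not the paper's FPGA implementation: a fixed random hidden layer produces features, and only the output weights `beta` are updated in closed form per batch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 32, 2

W = rng.standard_normal((n_in, n_hidden))  # fixed random input weights
b = rng.standard_normal(n_hidden)          # fixed random biases

def hidden(X):
    """Random-feature hidden layer; never trained."""
    return np.tanh(X @ W + b)

# Initialization phase: solve beta on an initial batch in closed form
# (a small ridge term keeps the inverse well-conditioned).
X0, Y0 = rng.standard_normal((64, n_in)), rng.standard_normal((64, n_out))
H0 = hidden(X0)
P = np.linalg.inv(H0.T @ H0 + 1e-3 * np.eye(n_hidden))
beta = P @ H0.T @ Y0

def oselm_update(P, beta, X, Y):
    """Sequential phase: recursive least-squares update for one new batch."""
    H = hidden(X)
    K = np.linalg.inv(np.eye(len(X)) + H @ P @ H.T)
    P = P - P @ H.T @ K @ H @ P
    beta = beta + P @ H.T @ (Y - H @ beta)
    return P, beta

X1, Y1 = rng.standard_normal((16, n_in)), rng.standard_normal((16, n_out))
P, beta = oselm_update(P, beta, X1, Y1)
```

Because each update is a fixed sequence of small matrix products with no iterative gradient loop or replay buffer, this style of training maps naturally onto a small FPGA.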
TinyProp -- Adaptive Sparse Backpropagation for Efficient TinyML On-device Learning
Training deep neural networks using backpropagation is very memory- and computationally intensive. This makes it difficult to run on-device learning or to fine-tune neural networks on tiny, embedded devices such as low-power microcontroller units (MCUs). Sparse backpropagation algorithms try to reduce the computational load of on-device learning by training only a subset of the weights and biases. Existing approaches use a static number of weights to train; a poor choice of this so-called backpropagation ratio either limits the computational gain or leads to severe accuracy losses. In this paper we present TinyProp, the first sparse backpropagation method that dynamically adapts the backpropagation ratio during on-device training for each training step. TinyProp incurs a small computational overhead to sort the elements of the gradient, which does not significantly impact the computational gains. TinyProp works particularly well for fine-tuning trained networks on MCUs, a typical use case for embedded applications. On three typical datasets, MNIST, DCASE2020, and CIFAR10, we are 5 times faster than non-sparse training with an average accuracy loss of 1%. On average, TinyProp is 2.9 times faster than existing static sparse backpropagation algorithms, and the accuracy loss is reduced on average by 6% compared to a typical static setting of the backpropagation ratio.
Comment: 7 pages, AIPE Conference 202
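The idea of a per-step adaptive backpropagation ratio can be sketched as follows. This is a hedged illustration in the spirit of the abstract, not TinyProp itself: the linear loss-based ratio schedule and all parameter names (`k_min`, `k_max`, `loss_max`) are assumptions standing in for the paper's actual adaptation rule.

```python
import numpy as np

def sparse_step(w, grad, loss, loss_max, k_min=0.1, k_max=0.9, lr=0.01):
    """Apply only the top-k gradient entries; k shrinks as the loss shrinks.

    The ratio schedule here is illustrative: high loss -> train more
    weights, low loss -> train fewer, updated at every training step.
    """
    ratio = k_min + (k_max - k_min) * min(loss / loss_max, 1.0)
    k = max(1, int(ratio * grad.size))
    # Sort-based selection of the k largest-magnitude gradient entries;
    # this small sorting overhead is the price of adaptivity.
    idx = np.argpartition(np.abs(grad).ravel(), -k)[-k:]
    update = np.zeros(grad.size)
    update[idx] = grad.ravel()[idx]
    return w - lr * update.reshape(grad.shape)

# With loss near zero the ratio bottoms out and only one entry is updated.
w = np.zeros((3, 3))
grad = np.arange(9.0).reshape(3, 3)
w_new = sparse_step(w, grad, loss=0.0, loss_max=1.0, lr=1.0)
```

In the example, only the single largest-magnitude gradient entry (value 8.0) is applied, so eight of the nine weights are untouched by that step.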