6,898 research outputs found
Dynamics of on-line gradient descent learning for multilayer neural networks
We consider the problem of on-line gradient descent learning for general two-layer neural networks. An analytic solution is presented and used to investigate the role of the learning rate in controlling the evolution and convergence of the learning process.
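A minimal numerical sketch of this setting, under assumptions of my own: a student two-layer soft committee machine trained by on-line gradient descent (one fresh example per step) to imitate a teacher of the same architecture. All parameter values are illustrative, and tanh stands in for the erf activation traditionally used in this literature.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 50, 2                 # input dimension, number of hidden units (illustrative)
eta = 0.5                    # learning rate

B = rng.standard_normal((K, N)) / np.sqrt(N)   # teacher weights
J = 0.01 * rng.standard_normal((K, N))         # student weights, small random init

def g(h):
    return np.tanh(h)        # hidden activation (erf is traditional; tanh is similar)

def output(W, x):
    return g(W @ x).sum()    # soft committee machine: hidden-to-output weights fixed to 1

def gen_error(J, B, n=2000):
    # Monte Carlo estimate of the generalization error on fresh inputs
    X = rng.standard_normal((n, N))
    return np.mean([(output(J, x) - output(B, x)) ** 2 for x in X]) / 2

eg_start = gen_error(J, B)
for t in range(20000):       # on-line learning: each example is seen exactly once
    x = rng.standard_normal(N)
    err = output(J, x) - output(B, x)
    h = J @ x
    # stochastic gradient of the squared error with the customary 1/N step scaling
    J -= (eta / N) * err * (1.0 - g(h) ** 2)[:, None] * x[None, :]
eg_end = gen_error(J, B)
print(f"generalization error: {eg_start:.4f} -> {eg_end:.4f}")
```

The 1/N scaling of the step size matches the large-input-dimension limit analyzed in this line of work, where the evolution of the error becomes deterministic.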
The theory of on-line learning: a statistical physics approach
In this paper we review recent theoretical approaches for analysing the dynamics of on-line learning in multilayer neural networks using methods adopted from statistical physics. The analysis is based on monitoring a set of macroscopic variables from which the generalisation error can be calculated. A closed set of dynamical equations for the macroscopic variables is derived analytically and solved numerically. The theoretical framework is then employed for defining optimal learning parameters and for analysing the incorporation of second-order information into the learning process using natural gradient descent and matrix-momentum based methods. We also briefly explain an extension of the original framework for analysing the case where training examples are sampled with repetition.
On-chip Few-shot Learning with Surrogate Gradient Descent on a Neuromorphic Processor
Recent work suggests that synaptic plasticity dynamics in biological models of neurons and neuromorphic hardware are compatible with gradient-based learning (Neftci et al., 2019). Gradient-based learning requires iterating several times over a dataset, which is both time-consuming and constrains the training samples to be independently and identically distributed. This is incompatible with learning systems that do not have boundaries between training and inference, such as in neuromorphic hardware. One approach to overcoming these constraints is transfer learning, where a portion of the network is pre-trained and mapped into hardware and the remaining portion is trained online. Transfer learning has the advantage that pre-training can be accelerated offline if the task domain is known, and that few samples of each class are sufficient for learning the target task at reasonable accuracy. Here, we demonstrate on-line surrogate-gradient few-shot learning on Intel's Loihi neuromorphic research processor using features pre-trained with spike-based gradient backpropagation-through-time. Our experimental results show that the Loihi chip can learn gestures online using a small number of shots, achieving results comparable to the models simulated on a conventional processor.
On-Line Learning Theory of Soft Committee Machines with Correlated Hidden Units - Steepest Gradient Descent and Natural Gradient Descent -
The permutation symmetry of the hidden units in multilayer perceptrons causes the saddle structure and plateaus of the learning dynamics in gradient learning methods. The correlation of the weight vectors of hidden units in a teacher network is thought to affect this saddle structure, resulting in a prolonged learning time, but the mechanism is still unclear. In this paper, we discuss it for soft committee machines and on-line learning using statistical mechanics. Conventional gradient descent needs more time to break the symmetry as the correlation of the teacher weight vectors rises. In contrast, no plateaus occur with natural gradient descent, regardless of the correlation, in the limit of a low learning rate. Analytical results support these dynamics around the saddle point.

Comment: 7 pages, 6 figures
Analysis of Natural Gradient Descent for Multilayer Neural Networks
Natural gradient descent is a principled method for adapting the parameters
of a statistical model on-line using an underlying Riemannian parameter space
to redefine the direction of steepest descent. The algorithm is examined via
methods of statistical physics which accurately characterize both transient and
asymptotic behavior. A solution of the learning dynamics is obtained for the
case of multilayer neural network training in the limit of large input
dimension. We find that natural gradient learning leads to optimal asymptotic
performance and outperforms gradient descent in the transient, significantly
shortening or even removing plateaus in the transient generalization
performance which typically hamper gradient descent training.Comment: 14 pages including figures. To appear in Physical Review
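A small sketch of the natural gradient step itself, on a model simple enough to check by hand: logistic regression, where the Fisher information matrix can be formed explicitly and the natural gradient step is the ordinary gradient preconditioned by its inverse. The dataset and all values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy logistic-regression problem with noisy labels.
n, d = 500, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
p_true = 1.0 / (1.0 + np.exp(-X @ w_true))
y = (rng.random(n) < p_true).astype(float)

def nll(w):
    # mean negative log-likelihood of the labels under the model
    z = X @ w
    return np.mean(np.log1p(np.exp(-z)) + (1.0 - y) * z)

def grad(w):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / n

def fisher(w):
    # empirical Fisher information: X^T diag(p(1-p)) X / n
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return (X * (p * (1.0 - p))[:, None]).T @ X / n

w = np.zeros(d)
for _ in range(50):
    F = fisher(w) + 1e-6 * np.eye(d)     # small damping keeps F invertible
    w -= np.linalg.solve(F, grad(w))     # natural gradient step, unit step size

final_nll = nll(w)
print(f"final NLL: {final_nll:.4f}")
```

For this model the Fisher matrix coincides with the Hessian of the negative log-likelihood, so the natural gradient step reduces to a Newton step — which is one way to see why it avoids the slow transients that plain gradient descent suffers in ill-conditioned directions.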
- …