6,898 research outputs found

    Dynamics of on-line gradient descent learning for multilayer neural networks

    We consider the problem of on-line gradient descent learning for general two-layer neural networks. An analytic solution is presented and used to investigate the role of the learning rate in controlling the evolution and convergence of the learning process.
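
    For concreteness, the on-line setting described in this abstract can be sketched in a few lines of Python: a student two-layer (soft committee) network is trained against a fixed teacher, one freshly drawn example per step. This is an illustrative sketch under assumed choices (erf activations, dimensions N, K, M, learning rate eta), not the paper's own code.

    import numpy as np
    from scipy.special import erf

    rng = np.random.default_rng(0)
    N, K, M, eta, steps = 100, 3, 3, 0.1, 200_000   # input dim, student units, teacher units, rate, examples

    B = rng.standard_normal((M, N)) / np.sqrt(N)    # fixed teacher weight vectors (rows)
    J = 1e-3 * rng.standard_normal((K, N))          # student weights, small random start

    g  = lambda x: erf(x / np.sqrt(2.0))            # hidden-unit activation
    dg = lambda x: np.sqrt(2.0 / np.pi) * np.exp(-x ** 2 / 2.0)

    for _ in range(steps):
        xi = rng.standard_normal(N)                 # fresh i.i.d. input: the on-line setting
        x, y = J @ xi, B @ xi                       # student / teacher hidden local fields
        delta = g(y).sum() - g(x).sum()             # output error for this single example
        J += (eta / N) * np.outer(delta * dg(x), xi)   # one stochastic gradient step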

    The theory of on-line learning: a statistical physics approach

    In this paper we review recent theoretical approaches for analysing the dynamics of on-line learning in multilayer neural networks using methods adopted from statistical physics. The analysis is based on monitoring a set of macroscopic variables from which the generalisation error can be calculated. A closed set of dynamical equations for the macroscopic variables is derived analytically and solved numerically. The theoretical framework is then employed for defining optimal learning parameters and for analysing the incorporation of second order information into the learning process using natural gradient descent and matrix-momentum based methods. We will also briefly explain an extension of the original framework for analysing the case where training examples are sampled with repetition.
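
    As an illustration of the macroscopic description mentioned above: for erf-type hidden units, the generalisation error is a closed-form function of the overlap matrices Q (student-student), R (student-teacher) and T (teacher-teacher). The sketch below implements that standard expression; it is illustrative and not code from the review itself.

    import numpy as np

    def generalisation_error(Q, R, T):
        """Soft committee machine with g(x) = erf(x / sqrt(2)):
        Q[i, k] = J_i . J_k / N, R[i, n] = J_i . B_n / N, T[n, m] = B_n . B_m / N."""
        def ang(C, a, b):
            return np.arcsin(C / np.sqrt(np.outer(1.0 + a, 1.0 + b)))
        qd, td = np.diag(Q), np.diag(T)
        return (ang(Q, qd, qd).sum() + ang(T, td, td).sum()
                - 2.0 * ang(R, qd, td).sum()) / np.pi

    # e.g. a perfectly trained student (Q = R = T) gives zero generalisation error:
    T = np.eye(2)
    print(generalisation_error(T, T, T))   # -> 0.0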

    On-chip Few-shot Learning with Surrogate Gradient Descent on a Neuromorphic Processor

    Recent work suggests that synaptic plasticity dynamics in biological models of neurons and neuromorphic hardware are compatible with gradient-based learning (Neftci et al., 2019). Gradient-based learning requires iterating several times over a dataset, which is both time-consuming and constrains the training samples to be independently and identically distributed. This is incompatible with learning systems that do not have boundaries between training and inference, such as in neuromorphic hardware. One approach to overcome these constraints is transfer learning, where a portion of the network is pre-trained and mapped into hardware and the remaining portion is trained online. Transfer learning has the advantage that pre-training can be accelerated offline if the task domain is known, and few samples of each class are sufficient for learning the target task at reasonable accuracies. Here, we demonstrate on-line surrogate gradient few-shot learning on Intel's Loihi neuromorphic research processor using features pre-trained with spike-based gradient backpropagation-through-time. Our experimental results show that the Loihi chip can learn gestures online using a small number of shots and achieve results that are comparable to the models simulated on a conventional processor.
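
    The transfer-learning recipe above can be sketched generically: frozen pre-trained features drive a small readout that is adapted on-line from a few shots per class, with a surrogate derivative standing in for the gradient of the non-differentiable spike function. This is a hypothetical sketch, not the Loihi implementation; the fast-sigmoid surrogate and all names (n_feat, n_class, eta) are assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    n_feat, n_class, eta, shots = 256, 11, 0.05, 5

    W = np.zeros((n_class, n_feat))                  # the only weights adapted on-line

    def surrogate_grad(v, beta=10.0):
        # fast-sigmoid surrogate for the derivative of the spike threshold
        return 1.0 / (beta * np.abs(v) + 1.0) ** 2

    def one_shot_update(W, feats, label):
        v = W @ feats                                # readout "membrane" potentials
        err = np.eye(n_class)[label] - (v > 0.0).astype(float)
        return W + eta * np.outer(err * surrogate_grad(v), feats)

    for label in range(n_class):
        for _ in range(shots):                       # a handful of shots per class
            feats = rng.random(n_feat)               # placeholder for frozen, pre-trained SNN features
            W = one_shot_update(W, feats, label)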

    On-Line Learning Theory of Soft Committee Machines with Correlated Hidden Units - Steepest Gradient Descent and Natural Gradient Descent -

    The permutation symmetry of the hidden units in multilayer perceptrons causes the saddle structure and plateaus of the learning dynamics in gradient learning methods. The correlation of the weight vectors of hidden units in a teacher network is thought to affect this saddle structure, resulting in a prolonged learning time, but this mechanism is still unclear. In this paper, we discuss it with regard to soft committee machines and on-line learning using statistical mechanics. Conventional gradient descent needs more time to break the symmetry as the correlation of the teacher weight vectors rises. On the other hand, no plateaus occur with natural gradient descent regardless of the correlation for the limit of a low learning rate. Analytical results support these dynamics around the saddle point.
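
    The "correlation of the teacher weight vectors" can be made concrete with a short sketch that draws teacher vectors whose mutual overlaps are fixed to a chosen value rho; the Cholesky construction and all names below are illustrative assumptions, not the paper's procedure.

    import numpy as np

    rng = np.random.default_rng(2)
    N, M, rho = 1000, 3, 0.5                         # input dim, teacher units, target correlation

    T = (1.0 - rho) * np.eye(M) + rho * np.ones((M, M))   # desired teacher overlap matrix
    L = np.linalg.cholesky(T)
    B = L @ rng.standard_normal((M, N)) / np.sqrt(N)      # teacher vectors with overlaps close to T

    print(np.round(B @ B.T, 2))                      # empirical overlaps, approximately T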

    Analysis of Natural Gradient Descent for Multilayer Neural Networks

    Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods of statistical physics which accurately characterize both transient and asymptotic behavior. A solution of the learning dynamics is obtained for the case of multilayer neural network training in the limit of large input dimension. We find that natural gradient learning leads to optimal asymptotic performance and outperforms gradient descent in the transient, significantly shortening or even removing plateaus in the transient generalization performance which typically hamper gradient descent training.
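
    The update rule being analysed preconditions the gradient by the inverse Fisher information matrix of the model. A minimal sketch, assuming a simple empirical Fisher estimate with a damping term (both simplifications for illustration, not the scheme analysed in the paper):

    import numpy as np

    def natural_gradient_step(w, per_example_grads, eta=0.1, damping=1e-4):
        """w: flat parameter vector of length P; per_example_grads: (B, P) array."""
        grad = per_example_grads.mean(axis=0)                                  # plain gradient direction
        G = per_example_grads.T @ per_example_grads / len(per_example_grads)   # empirical Fisher estimate
        G += damping * np.eye(G.shape[0])                                      # keep the inversion well-posed
        return w - eta * np.linalg.solve(G, grad)                              # w <- w - eta * G^{-1} grad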