An analytic theory of shallow networks dynamics for hinge loss classification
Neural networks have been shown to perform incredibly well in classification
tasks over structured high-dimensional datasets. However, the learning dynamics
of such networks are still poorly understood. In this paper we study in detail
the training dynamics of a simple type of neural network: a single hidden layer
trained to perform a classification task. We show that in a suitable mean-field
limit this case maps to a single-node learning problem with a time-dependent
dataset determined self-consistently from the average nodes population. We
specialize our theory to the prototypical case of a linearly separable dataset
and a linear hinge loss, for which the dynamics can be explicitly solved. This
allows us to address, in a simple setting, several phenomena appearing in modern
networks, such as the slowing down of training dynamics, the crossover between
rich and lazy learning, and overfitting. Finally, we assess the limitations of
mean-field theory by studying the case of a large but finite number of nodes and
of training samples.
Comment: 16 pages, 6 figures
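The setting described above can be illustrated with a minimal numerical sketch: a single hidden ReLU layer, trained on the linear hinge loss over a linearly separable dataset whose labels come from a random teacher direction. All sizes, the ReLU activation, and the fixed second-layer weights are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (sizes are assumptions, not the paper's): a linearly
# separable dataset with labels given by a random teacher direction w_star.
d, n, h = 10, 200, 50                      # input dim, samples, hidden nodes
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = np.sign(X @ w_star)

# One hidden ReLU layer; second-layer weights a are fixed, so only the
# first layer W is trained.
W = rng.standard_normal((h, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=h) / h

def forward(X, W, a):
    return np.maximum(X @ W.T, 0.0) @ a    # ReLU hidden layer, linear readout

def hinge_loss(X, y, W, a):
    return np.mean(np.maximum(0.0, 1.0 - y * forward(X, W, a)))

loss0 = hinge_loss(X, y, W, a)
lr = 0.5
for step in range(2000):
    margin = y * forward(X, W, a)
    active = margin < 1.0                  # samples still inside the hinge
    if not active.any():
        break                              # all margins >= 1: loss is zero
    relu_mask = (X @ W.T > 0.0).astype(float)            # (n, h)
    # gradient of the mean linear hinge loss max(0, 1 - y f) w.r.t. W
    grad = -(a[:, None] * (((y * active)[None, :] * relu_mask.T) @ X)) / n
    W -= lr * grad

print(f"hinge loss: {loss0:.3f} -> {hinge_loss(X, y, W, a):.3f}")
```

Because the loss is linear inside the hinge, only the samples with margin below 1 contribute to the gradient, which is the feature the paper exploits to solve the dynamics explicitly.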
Generalisation dynamics of online learning in over-parameterised neural networks
Deep neural networks achieve stellar generalisation on a variety of problems, despite often being large enough to easily fit all their training data. Here we study the generalisation dynamics of two-layer neural networks in a teacher-student setup, where one network, the student, is trained using stochastic gradient descent (SGD) on data generated by another network, called the teacher. We show how, for this problem, the dynamics of SGD are captured by a set of differential equations. In particular, we demonstrate analytically that the generalisation error of the student increases linearly with the network size, with other relevant parameters held constant. Our results indicate that achieving good generalisation in neural networks depends on the interplay of at least the algorithm, its learning rate, the model architecture, and the data set.
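The teacher-student setup can be sketched in a few lines of online SGD: the student sees a fresh Gaussian input at every step, labelled by a fixed teacher network, and we track its generalisation error. This is a rough sketch only; the hidden-layer sizes, the tanh activation (standing in for the erf-type activations common in such analyses), and the unit second-layer weights are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: an over-parameterised student (4 hidden units)
# learning from a smaller teacher (2 hidden units).
d, m_teacher, m_student = 100, 2, 4

W_t = rng.standard_normal((m_teacher, d)) / np.sqrt(d)   # fixed teacher
W_s = rng.standard_normal((m_student, d)) / np.sqrt(d)   # trainable student

def gen_error(W_s, W_t, n_test=2000):
    """Average squared output difference over fresh Gaussian test inputs."""
    X = rng.standard_normal((n_test, d))
    out_s = np.tanh(X @ W_s.T).sum(axis=1)    # unit second-layer weights
    out_t = np.tanh(X @ W_t.T).sum(axis=1)
    return float(np.mean((out_s - out_t) ** 2) / 2)

err0 = gen_error(W_s, W_t)
lr = 0.1
for step in range(20000):
    x = rng.standard_normal(d)                # fresh sample: online SGD
    pre = W_s @ x
    delta = np.tanh(pre).sum() - np.tanh(W_t @ x).sum()
    # gradient of (1/2) * delta^2 w.r.t. the student's first-layer weights
    W_s -= (lr / d) * delta * ((1.0 - np.tanh(pre) ** 2)[:, None] * x[None, :])

print(f"generalisation error: {err0:.3f} -> {gen_error(W_s, W_t):.3f}")
```

Because every step uses a new sample, the training loss and the generalisation error coincide in expectation, which is what makes the ODE description of the dynamics possible in the high-dimensional limit.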