A Mean Field View of the Landscape of Two-Layers Neural Networks
Multi-layer neural networks are among the most powerful models in machine
learning, yet the fundamental reasons for this success defy mathematical
understanding. Learning a neural network requires optimizing a non-convex, high-dimensional objective (the risk function), a problem that is usually attacked
using stochastic gradient descent (SGD). Does SGD converge to a global optimum
of the risk or only to a local optimum? In the first case, does this happen
because local minima are absent, or because SGD somehow avoids them? In the
second, why do local minima reached by SGD have good generalization properties?
In this paper we consider a simple case, namely two-layer neural networks, and prove that, in a suitable scaling limit, the SGD dynamics is captured by a certain non-linear partial differential equation (PDE) that we call
distributional dynamics (DD). We then consider several specific examples, and
show how DD can be used to prove convergence of SGD to networks with nearly
ideal generalization error. This description allows one to 'average out' some of the complexities of the landscape of neural networks, and can be used to prove a general convergence result for noisy SGD.
Comment: 103 pages
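For orientation, the 'distributional dynamics' referred to above is a nonlinear PDE for the distribution of a single hidden unit's parameters in the large-width limit. A schematic form, in our own notation (paraphrasing rather than quoting the paper), is
\[
\partial_t \rho_t = 2\xi(t)\,\nabla_\theta\!\cdot\!\big(\rho_t\,\nabla_\theta \Psi(\theta;\rho_t)\big),
\qquad
\Psi(\theta;\rho) = V(\theta) + \int U(\theta,\tilde\theta)\,\rho(\mathrm{d}\tilde\theta),
\]
where \rho_t is the limiting parameter distribution after t units of rescaled SGD time, \xi(t) is an effective step-size schedule, and V and U collect the single-neuron and pairwise terms of the risk; noisy SGD contributes an additional diffusion (Laplacian) term on the right-hand side.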
Phase Transitions of Neural Networks
The cooperative behaviour of interacting neurons and synapses is studied
using models and methods from statistical physics. The competition between
training error and entropy may lead to discontinuous properties of the neural
network. This is demonstrated for a few examples: the perceptron, associative memory, learning from examples, generalization, multilayer networks, structure recognition, Bayesian estimation, on-line training, noise estimation, and time series generation.
Comment: Plenary talk for the MINERVA workshop on mesoscopics, fractals and neural networks, Eilat, March 1997
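The 'competition between training error and entropy' mentioned above is the usual free-energy balance of the statistical mechanics of learning (a standard fact of this framework, not specific to the talk): with a Gibbs measure over weights at inverse temperature \beta,
\[
Z(\beta) = \int \mathrm{d}\mu(\mathbf{w})\, e^{-\beta E_{\mathrm{train}}(\mathbf{w})},
\qquad
\beta F = -\ln Z = \beta\,\langle E_{\mathrm{train}}\rangle - S,
\]
a discontinuous (first-order) transition occurs when two branches of the free energy cross, so that order parameters such as the teacher-student overlap, and with them the generalization error, jump as the temperature or the number of training examples is varied.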
A jamming transition from under- to over-parametrization affects loss landscape and generalization
We argue that in fully-connected networks a phase transition delimits the
over- and under-parametrized regimes where fitting can or cannot be achieved.
Under some general conditions, we show that this transition is sharp for the
hinge loss. In the whole over-parametrized regime, poor minima of the loss are
not encountered during training since the number of constraints to satisfy is
too small to hamper minimization. Our findings support a link between this
transition and the generalization properties of the network: as we increase the
number of parameters of a given model, starting from an under-parametrized
network, we observe that the generalization error displays three phases: (i) an initial decay, (ii) an increase up to the transition point, where it displays a cusp, and (iii) a slow decay toward a constant throughout the rest of the over-parametrized regime. We thereby identify the region where the classical phenomenon of over-fitting takes place and the region where the model keeps improving, in line with previous empirical observations for modern neural networks.
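A minimal sketch of the kind of width sweep described above, written here purely for illustration (PyTorch, with a random linear teacher as an assumed data source; not the authors' code or experimental setup): one-hidden-layer networks of increasing width are fit with the hinge loss, and the training and test errors are recorded.

import torch
import torch.nn as nn

torch.manual_seed(0)
d, n_train, n_test = 20, 500, 2000
teacher = torch.randn(d)               # hypothetical linear teacher generating the labels

def make_data(n):
    x = torch.randn(n, d)
    return x, torch.sign(x @ teacher)  # labels in {-1, +1}

x_tr, y_tr = make_data(n_train)
x_te, y_te = make_data(n_test)

def train_width(h, steps=2000, lr=0.05):
    """Fit a one-hidden-layer net of width h by (full-batch) gradient descent on the hinge loss."""
    model = nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, 1))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        margin = y_tr * model(x_tr).squeeze(1)
        torch.clamp(1.0 - margin, min=0.0).mean().backward()  # hinge loss on +/-1 labels
        opt.step()
    with torch.no_grad():
        err = lambda x, y: (torch.sign(model(x).squeeze(1)) != y).float().mean().item()
        return err(x_tr, y_tr), err(x_te, y_te)

for h in (2, 5, 10, 20, 50, 100, 200):
    tr, te = train_width(h)
    print(f"width={h:4d}  train_err={tr:.3f}  test_err={te:.3f}")

In the abstract's terms, the smallest width at which the training error reaches zero plays the role of the transition point, and one looks for a bump or cusp in the test error around it, followed by a slow decrease at larger widths.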