Shaping the learning landscape in neural networks around wide flat minima

Authors: Baldassi Carlo, Pittorino Fabrizio, Zecchina Riccardo
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 01/01/2020
Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic non-convex one- and two-layer neural network models which learn random patterns, and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message-passing algorithms which focus their search on wide flat regions of minimizers. In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessian and their generalization performance on real data. Comment: 37 pages (16 main text), 10 figures (7 main text

arXiv.org e-Print Archive

Archivio istituzionale della Ricerca - Bocconi

Archivio istituzionale della ricerca - Politecnico di Milano

Finite size effects in neural network algorithms

Author: Barber David
Publication venue: The University of Edinburgh
Publication date: 01/01/1996
Edinburgh Research Archive

Nonextensive statistics: Theoretical, experimental and computational evidences and connections

Author: Tsallis C.
Publication date: 01/01/1999
The domain of validity of standard thermodynamics and Boltzmann-Gibbs statistical mechanics is discussed and then formally enlarged in order to hopefully cover a variety of anomalous systems. The generalization concerns nonextensive systems, where nonextensivity is understood in the thermodynamical sense. This generalization was first proposed in 1988 inspired by the probabilistic description of multifractal geometries, and has been intensively studied during this decade. In the present effort, after introducing some historical background, we briefly describe the formalism, and then exhibit the present status in what concerns theoretical, experimental and computational evidences and connections, as well as some perspectives for the future. In addition to these, here and there we point out various (possibly) relevant questions, whose answer would certainly clarify our current understanding of the foundations of statistical mechanics and its thermodynamical implications. Comment: 15 figure

arXiv.org e-Print Archive

CiteSeerX

Neutron-proton effective mass splitting in neutron-rich matter at normal density from analyzing nucleon-nucleus scattering data within an isospin dependent optical model

Authors: Chen Lie-Wen, Fattoyev Farrukh J., Guo Wen-Jun, Li Bao-An, Li Xiao-Hua, Newton William G.
Publication venue: 'Elsevier BV'
Publication date: 20/03/2015
The neutron-proton effective $k$-mass splitting in asymmetric nucleonic matter of isospin asymmetry $\delta$ and normal density is found to be $m^{*}_{n-p}\equiv(m^{*}_{n}-m^{*}_{p})/m=(0.41\pm0.15)\delta$ from analyzing globally 1088 sets of reaction and angular differential cross sections of proton elastic scattering on 130 targets with beam energies from 0.783 MeV to 200 MeV, and 1161 sets of data of neutron elastic scattering on 104 targets with beam energies from 0.05 MeV to 200 MeV within an isospin dependent non-relativistic optical potential model. It sets a useful reference for testing model predictions on the momentum dependence of the nucleon isovector potential necessary for understanding novel structures and reactions of rare isotopes. Comment: Published version, Physics Letters B743 (2015) 40

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Directory of Open Access Journals