Shaping the learning landscape in neural networks around wide flat minima
Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex
high-dimensional loss function, typically by a stochastic gradient descent
(SGD) strategy. The learning process is observed to find good
minimizers without getting stuck in local critical points, and such
minimizers are often satisfactory at avoiding overfitting. How these two
features can be kept under control in nonlinear devices composed of millions of
tunable connections is a profound and far reaching open question. In this paper
we study basic non-convex one- and two-layer neural network models which learn
random patterns, and derive a number of basic geometrical and algorithmic
features which suggest some answers. We first show that the error loss function
presents few extremely wide flat minima (WFM) which coexist with narrower
minima and critical points. We then show that the minimizers of the
cross-entropy loss function overlap with the WFM of the error loss. We also
show examples of learning devices for which WFM do not exist. From the
algorithmic perspective we derive entropy driven greedy and message passing
algorithms which focus their search on wide flat regions of minimizers. In the
case of SGD and cross-entropy loss, we show that a slow reduction of the norm
of the weights along the learning process also leads to WFM. We corroborate the
results by a numerical study of the correlations between the volumes of the
minimizers, their Hessian and their generalization performance on real data.
Comment: 37 pages (16 main text), 10 figures (7 main text
Nonextensive statistics: Theoretical, experimental and computational evidences and connections
The domain of validity of standard thermodynamics and Boltzmann-Gibbs
statistical mechanics is discussed and then formally enlarged in order to
hopefully cover a variety of anomalous systems. The generalization concerns
nonextensive systems, where nonextensivity is understood in the
thermodynamical sense. This generalization was first proposed in 1988 inspired
by the probabilistic description of multifractal geometries, and has been
intensively studied during this decade. In the present effort, after
introducing some historical background, we briefly describe the formalism, and
then exhibit the present status in what concerns theoretical, experimental and
computational evidences and connections, as well as some perspectives for the
future. In addition, here and there we point out various (possibly)
relevant questions whose answers would certainly clarify our current
understanding of the foundations of statistical mechanics and its
thermodynamical implications.
Comment: 15 figure
Neutron-proton effective mass splitting in neutron-rich matter at normal density from analyzing nucleon-nucleus scattering data within an isospin dependent optical model
The neutron-proton effective $k$-mass splitting in asymmetric nucleonic
matter of isospin asymmetry $\delta$ and normal density is found to be
$m^{*}_{n-p} \equiv (m^{*}_{n} - m^{*}_{p})/m = (0.41 \pm 0.15)\delta$ from analyzing
globally 1088 sets of reaction and angular differential cross sections of
proton elastic scattering on 130 targets with beam energies from 0.783 MeV to
200 MeV, and 1161 sets of data of neutron elastic scattering on 104 targets
with beam energies from 0.05 MeV to 200 MeV within an isospin dependent
non-relativistic optical potential model. It sets a useful reference for
testing model predictions on the momentum dependence of the nucleon isovector
potential necessary for understanding novel structures and reactions of rare
isotopes.
Comment: Published version, Physics Letters B743 (2015) 40