A jamming transition from under- to over-parametrization affects loss landscape and generalization
We argue that in fully-connected networks a phase transition delimits the
over- and under-parametrized regimes where fitting can or cannot be achieved.
Under some general conditions, we show that this transition is sharp for the
hinge loss. In the whole over-parametrized regime, poor minima of the loss are
not encountered during training since the number of constraints to satisfy is
too small to hamper minimization. Our findings support a link between this
transition and the generalization properties of the network: as we increase the
number of parameters of a given model, starting from an under-parametrized
network, we observe that the generalization error displays three phases: (i)
initial decay, (ii) increase until the transition point --- where it displays a
cusp --- and (iii) slow decay toward a constant for the rest of the
over-parametrized regime. Thereby we identify the region where the classical
phenomenon of over-fitting takes place, and the region where the model keeps
improving, in line with previous empirical observations for modern neural
networks. Comment: arXiv admin note: text overlap with arXiv:1809.0934
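The notion of "constraints to satisfy" under the hinge loss can be sketched as follows. This is only an illustration, assuming binary labels in {-1, +1} and the standard linear hinge with margin 1; the function name `hinge_constraints` and the margin value are assumptions, not taken from the paper. A training example counts as an unsatisfied constraint when its margin is violated, and fitting is achieved when that count reaches zero.

```python
import numpy as np

def hinge_constraints(outputs, labels, margin=1.0):
    """Return (mean hinge loss, number of unsatisfied constraints).

    Hypothetical helper: `outputs` are network outputs f(x_i),
    `labels` are +1/-1 targets. Example i is "unsatisfied" when
    labels[i] * outputs[i] < margin.
    """
    outputs = np.asarray(outputs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    gaps = margin - labels * outputs      # positive gap = violated margin
    violated = gaps > 0
    loss = np.mean(np.where(violated, gaps, 0.0))
    return float(loss), int(violated.sum())
```

In this picture, a loss of exactly zero means every constraint is satisfied, which is the regime the abstract calls over-parametrized.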
Machine-learning nonstationary noise out of gravitational-wave detectors
Signal extraction out of background noise is a common challenge in high-precision physics experiments, where the measurement output is often a continuous data stream. To improve the signal-to-noise ratio of the detection, witness sensors are often used to independently measure background noises and subtract them from the main signal. If the noise coupling is linear and stationary, optimal techniques already exist and are routinely implemented in many experiments. However, when the noise coupling is nonstationary, linear techniques often fail or are suboptimal. Inspired by the properties of the background noise in gravitational wave detectors, this work develops a novel algorithm to efficiently characterize and remove nonstationary noise couplings, provided there exist witnesses of the noise source and of the modulation. In this work, the algorithm is described in its most general formulation, and its efficiency is demonstrated with examples from the data of the Advanced LIGO gravitational-wave observatory, where we could obtain an improvement of the detector gravitational-wave reach without introducing any bias on the source parameter estimation.
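The difference between stationary and nonstationary witness subtraction can be illustrated with a toy least-squares fit. This is not the algorithm of the paper; it is a minimal sketch that assumes the coupling has the simple form (a + b * m(t)) * w(t), where w is the noise witness and m is a slowly varying modulation witness. The function name `subtract_witnessed_noise` and this coupling model are assumptions made for the example.

```python
import numpy as np

def subtract_witnessed_noise(main, witness, modulation=None):
    """Fit and subtract witnessed noise from `main` by least squares.

    Hypothetical sketch: with `modulation=None` only a constant
    (stationary) coupling to `witness` is fitted; otherwise an extra
    term proportional to modulation * witness is fitted as well.
    Returns (cleaned signal, fitted coefficients).
    """
    main = np.asarray(main, dtype=float)
    witness = np.asarray(witness, dtype=float)
    columns = [witness]
    if modulation is not None:
        columns.append(np.asarray(modulation, dtype=float) * witness)
    basis = np.column_stack(columns)
    coeffs, *_ = np.linalg.lstsq(basis, main, rcond=None)
    return main - basis @ coeffs, coeffs
```

When the true coupling is modulated, the fit with only `witness` leaves the modulated part of the noise in the residual, while passing the modulation witness lets the same linear solver remove it.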
Analysis of Natural Gradient Descent for Multilayer Neural Networks
Natural gradient descent is a principled method for adapting the parameters
of a statistical model on-line using an underlying Riemannian parameter space
to redefine the direction of steepest descent. The algorithm is examined via
methods of statistical physics which accurately characterize both transient and
asymptotic behavior. A solution of the learning dynamics is obtained for the
case of multilayer neural network training in the limit of large input
dimension. We find that natural gradient learning leads to optimal asymptotic
performance and outperforms gradient descent in the transient, significantly
shortening or even removing plateaus in the transient generalization
performance which typically hamper gradient descent training. Comment: 14 pages including figures. To appear in Physical Review
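The redefinition of the steepest-descent direction can be sketched as preconditioning the ordinary gradient with the inverse Fisher information matrix. The example below is a minimal batch illustration under stated assumptions, not the on-line multilayer setting analysed in the paper; the function name `natural_gradient_step` and the `damping` parameter are hypothetical.

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=1.0, damping=0.0):
    """One natural gradient update: theta - lr * F^{-1} grad.

    Hypothetical helper: `fisher` is the Fisher information matrix at
    `theta`; `damping` adds a multiple of the identity so the linear
    solve stays well conditioned when `fisher` is near-singular.
    """
    theta = np.asarray(theta, dtype=float)
    grad = np.asarray(grad, dtype=float)
    fisher = np.asarray(fisher, dtype=float)
    regularized = fisher + damping * np.eye(fisher.shape[0])
    # Solve F x = grad rather than forming the inverse explicitly.
    return theta - lr * np.linalg.solve(regularized, grad)
```

For linear regression with unit-variance Gaussian noise, the Fisher matrix is X^T X / n, so a single step with lr = 1 and no damping lands on the least-squares solution from any starting point, whereas plain gradient descent needs many steps when X^T X is ill-conditioned.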
Neural network parametrization of spectral functions from hadronic tau decays and determination of QCD vacuum condensates
The spectral function is determined from ALEPH and OPAL data
on hadronic tau decays using a neural network parametrization trained to retain
the full experimental information on errors, their correlations and chiral sum
rules: the DMO sum rule, the first and second Weinberg sum rules and the
electromagnetic mass splitting of the pion sum rule. Nonperturbative QCD vacuum
condensates can then be determined from finite energy sum rules. Our method
minimizes all sources of theoretical uncertainty and bias producing an estimate
of the condensates which is independent of the specific finite energy sum rule
used. The central values obtained for both condensates are
negative. Comment: 29 pages, 18 ps figure
- …