17 research outputs found

    An information and field theoretic approach to the grand canonical ensemble

    Full text link
    We present a novel derivation of the constraints required to obtain the underlying principles of statistical mechanics using a maximum entropy framework. We derive the mean value constraints by use of the central limit theorem and the scaling properties of Lagrange multipliers. We then arrive at the same result using a quantum free field theory and the Ward identities. The work provides a principled footing for maximum entropy methods in statistical physics, adding to the body of work aligned with Jaynes's vision of statistical mechanics as a form of inference rather than a physical theory dependent on ergodicity, metric transitivity and equal a priori probabilities. We show that statistical independence, in the macroscopic limit, is the unifying concept that leads to all these derivations. Comment: 7 pages, 3 pages of appendix
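    The abstract's route to the grand canonical ensemble goes through the central limit theorem and the Ward identities; as a point of reference only, the sketch below is the standard constrained maximum-entropy derivation of the same distribution, assuming the usual mean-energy and mean-particle-number constraints (it is not the paper's derivation).

```latex
% Maximise the Gibbs entropy S = -\sum_i p_i \ln p_i subject to
% normalisation, a mean-energy and a mean-particle-number constraint.
\begin{aligned}
\mathcal{L} &= -\sum_i p_i \ln p_i
  - \lambda_0\Big(\sum_i p_i - 1\Big)
  - \beta\Big(\sum_i p_i E_i - \langle E\rangle\Big)
  + \beta\mu\Big(\sum_i p_i N_i - \langle N\rangle\Big), \\
0 = \frac{\partial \mathcal{L}}{\partial p_i}
  &= -\ln p_i - 1 - \lambda_0 - \beta E_i + \beta\mu N_i
  \;\Longrightarrow\;
  p_i = \frac{e^{-\beta(E_i - \mu N_i)}}{\mathcal{Z}},\qquad
  \mathcal{Z} = \sum_i e^{-\beta(E_i - \mu N_i)}.
\end{aligned}
```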

    Explaining the Adaptive Generalisation Gap

    Full text link
    We conjecture that the inherent difference in generalisation between adaptive and non-adaptive gradient methods stems from the increased estimation noise in the flattest directions of the true loss surface. We demonstrate that typical schedules used for adaptive methods (with low numerical stability or damping constants) serve to bias movement towards flat directions relative to sharp ones, effectively amplifying the noise-to-signal ratio and harming generalisation. We further demonstrate that the numerical stability/damping constant used in these methods can be decomposed into a learning rate reduction and a linear shrinkage of the estimated curvature matrix. We then demonstrate significant generalisation improvements by increasing the shrinkage coefficient, closing the generalisation gap entirely in both logistic regression and deep neural network experiments. Finally, we show that other popular modifications to adaptive methods, such as decoupled weight decay and partial adaptivity, can be shown to calibrate parameter updates to make better use of sharper, more reliable directions.
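    The decomposition claimed above can be checked directly: a damped curvature preconditioner of the form (H + δI)⁻¹ factors into a reduced learning rate 1/(1+δ) applied to the inverse of a linearly shrunk curvature estimate. The snippet below is a minimal numerical check of that identity, assuming this damped-preconditioner form; it is an illustration, not the authors' code.

```python
import numpy as np

# Hypothetical illustration: a damped curvature preconditioner
# (H_hat + delta*I)^{-1} equals a reduced learning rate 1/(1+delta)
# applied to the inverse of the linearly shrunk curvature estimate
# (1/(1+delta))*H_hat + (delta/(1+delta))*I.

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H_hat = A @ A.T                      # positive semi-definite stand-in for the estimated curvature
g = rng.standard_normal(5)           # stand-in gradient
alpha, delta = 0.1, 0.3              # learning rate and damping constant

# Damped second-order style update, as used by adaptive methods.
update_damped = alpha * np.linalg.solve(H_hat + delta * np.eye(5), g)

# Equivalent form: learning-rate reduction times inverse of shrunk curvature.
rho = 1.0 / (1.0 + delta)
H_shrunk = rho * H_hat + (1.0 - rho) * np.eye(5)
update_shrunk = (alpha * rho) * np.linalg.solve(H_shrunk, g)

assert np.allclose(update_damped, update_shrunk)
```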

    Appearance of Random Matrix Theory in Deep Learning

    Get PDF
    We investigate the local spectral statistics of the loss surface Hessians of artificial neural networks, where we discover excellent agreement with Gaussian Orthogonal Ensemble statistics across several network architectures and datasets. These results shed new light on the applicability of Random Matrix Theory to modelling neural networks and suggest a previously unrecognised role for it in the study of loss surfaces in deep learning. Inspired by these observations, we propose a novel model for the true loss surfaces of neural networks, consistent with our observations, which allows for Hessian spectral densities with rank degeneracy and outliers, extensively observed in practice, and predicts a growing independence of loss gradients as a function of distance in weight-space. We further investigate the importance of the true loss surface in neural networks and find, in contrast to previous work, that the exponential hardness of locating the global minimum has practical consequences for achieving state-of-the-art performance. Comment: 33 pages, 14 figures
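    As a rough illustration of the kind of local spectral statistic involved, the sketch below samples a GOE matrix and computes the mean adjacent-gap ratio of its eigenvalues, comparing it against independent (Poisson-like) levels. The choice of statistic and the sampling here are assumptions for illustration, not the paper's experimental pipeline.

```python
import numpy as np

# Hypothetical sketch: compare the adjacent-gap ratio of a sampled GOE
# matrix with that of independent levels. The paper reports GOE-like
# local statistics for neural-network Hessian spectra; this only
# illustrates the GOE baseline the comparison would be made against.

rng = np.random.default_rng(0)
n = 2000

# Sample a GOE matrix by symmetrising an i.i.d. Gaussian matrix.
A = rng.standard_normal((n, n))
goe = (A + A.T) / np.sqrt(2 * n)
eigs = np.sort(np.linalg.eigvalsh(goe))

def mean_gap_ratio(levels):
    """Mean of r_i = min(s_i, s_{i+1}) / max(s_i, s_{i+1}) over consecutive gaps s_i."""
    s = np.diff(levels)
    r = np.minimum(s[:-1], s[1:]) / np.maximum(s[:-1], s[1:])
    return r.mean()

print("GOE     :", mean_gap_ratio(eigs))                           # roughly 0.53
print("Poisson :", mean_gap_ratio(np.sort(rng.uniform(size=n))))   # roughly 0.39
```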