
    One-loop effective potential for SO(10) GUT theories in de Sitter space

    Zeta-function regularization is applied to evaluate the one-loop effective potential for SO(10) grand-unified theories in de Sitter cosmologies. When the Higgs scalar field belongs to the 210-dimensional irreducible representation of SO(10), attention is focused on the mass matrix relevant for the SU(3)xSU(2)xU(1) symmetry-breaking direction, so as to agree with the low-energy phenomenology of the particle-physics standard model. The analysis is restricted to those values of the tree-level-potential parameters for which the absolute minima of the classical potential have been evaluated. As shown in the recent literature, such minima turn out to be SO(6)xSO(4)- or SU(3)xSU(2)xSU(2)xU(1)-invariant. Electroweak phenomenology is more naturally derived, however, from the former minima; hence the values of the parameters leading to the alternative set of minima have been discarded. Within this framework, the flat-space limit and the general form of the one-loop effective potential are studied in detail using analytic and numerical methods. It turns out that, as far as the absolute-minimum direction is concerned, the flat-space limit of the one-loop calculation about a de Sitter background does not change the results previously obtained in the literature, where the tree-level potential in flat space-time was studied. Moreover, when curvature effects are no longer negligible in the one-loop potential, it is found that the early universe remains bound to reach only the SO(6)xSO(4) absolute minimum. Comment: 25 pages, plain TeX, plus LaTeX file of the tables appended at the end. Published in Classical and Quantum Gravity, Vol. 11, pp. 2031-2044, August 1994.
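
    For orientation, the generic zeta-function expression for a one-loop effective potential on a fixed Euclidean background (the four-sphere corresponding to de Sitter space) is sketched below; this is only the standard starting point, and the specific fluctuation operator, mass matrix and eigenvalue spectra for the 210 representation are those derived in the paper, not reproduced here.

        V^{(1)}(\varphi) \;=\; V_{\rm tree}(\varphi)
            \;-\; \frac{1}{2\Omega}\Bigl[\,\zeta_{A}'(0) + \zeta_{A}(0)\,\ln\mu^{2}\,\Bigr],
        \qquad
        \zeta_{A}(s) \;\equiv\; \sum_{n} d_{n}\,\lambda_{n}^{-s},
        \qquad
        A \;=\; -\Box + m^{2}(\varphi) + \xi R ,

    where the \lambda_n are the eigenvalues of the fluctuation operator A evaluated at the constant background field, d_n their degeneracies, \Omega the volume of the background four-sphere, and \mu the renormalization scale.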

    Theory of Deep Learning IIb: Optimization Properties of SGD

    In Theory IIb we characterize, with a mix of theory and experiments, the optimization of deep convolutional networks by Stochastic Gradient Descent. The main new result in this paper is theoretical and experimental evidence for the following conjecture about SGD: SGD concentrates in probability -- like the classical Langevin equation -- on large-volume, "flat" minima, selecting flat minimizers which are, with very high probability, also global minimizers.
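
    A minimal one-dimensional sketch of the qualitative claim, not the paper's experimental setup: on a toy loss with a sharp well and a wide well of equal depth, it counts how often noisy gradient dynamics ends in the wide basin, with and without noise. The loss, step size and noise scale below are hypothetical illustrative choices.

        import numpy as np

        rng = np.random.default_rng(0)

        # Toy loss: two Gaussian wells of equal depth, a sharp one at x = -1
        # (width 0.1) and a wide one at x = +2 (width 1.0).
        CENTERS = (-1.0, 2.0)
        WIDTHS = (0.1, 1.0)

        def grad(x):
            # Gradient of loss(x) = -sum_i exp(-(x - c_i)^2 / (2 s_i^2)).
            return sum(np.exp(-(x - c) ** 2 / (2 * s ** 2)) * (x - c) / s ** 2
                       for c, s in zip(CENTERS, WIDTHS))

        def fraction_in_wide_basin(noise_scale, runs=100, steps=10000, lr=0.01):
            wide = 0
            for _ in range(runs):
                x = rng.uniform(-3.0, 4.0)
                for _ in range(steps):
                    # Langevin-like update: gradient step plus Gaussian noise.
                    x -= lr * (grad(x) + noise_scale * rng.normal())
                wide += int(abs(x - CENTERS[1]) < abs(x - CENTERS[0]))
            return wide / runs

        print("ending in wide minimum, no noise:  ", fraction_in_wide_basin(0.0))
        print("ending in wide minimum, with noise:", fraction_in_wide_basin(7.0))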

    The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects

    Understanding the behavior of stochastic gradient descent (SGD) in the context of deep neural networks has attracted much attention recently. Along this line, we study a general form of gradient-based optimization dynamics with unbiased noise, which unifies SGD and standard Langevin dynamics. Through investigating this general optimization dynamics, we analyze the behavior of SGD in escaping from minima and its regularization effects. A novel indicator is derived to characterize the efficiency of escaping from minima through measuring the alignment of the noise covariance and the curvature of the loss function. Based on this indicator, two conditions are established to show which type of noise structure is superior to isotropic noise in terms of escaping efficiency. We further show that the anisotropic noise in SGD satisfies the two conditions, and thus helps escape from sharp and poor minima effectively, towards more stable and flat minima that typically generalize well. We systematically design various experiments to verify the benefits of the anisotropic noise, compared with full gradient descent plus isotropic diffusion (i.e. Langevin dynamics). Comment: ICML 2019 camera-ready.
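
    A rough numerical sketch of the kind of quantity the abstract describes: an alignment score between the gradient-noise covariance and the loss curvature, here taken as Tr(H Sigma) for SGD noise versus isotropic noise of the same total magnitude. The normalization, the toy quadratic loss and the minibatch construction below are illustrative assumptions, not necessarily the paper's exact indicator.

        import numpy as np

        rng = np.random.default_rng(1)
        n, d = 500, 20

        # Toy linear regression with anisotropic inputs (ill-conditioned Hessian).
        X = rng.normal(size=(n, d)) * np.linspace(0.1, 3.0, d)
        w_true = rng.normal(size=d)
        y = X @ w_true + 0.1 * rng.normal(size=n)

        w = rng.normal(size=d)                       # current iterate
        H = X.T @ X / n                              # Hessian of the quadratic loss
        per_sample_grads = X * (X @ w - y)[:, None]  # gradient of each sample's loss

        # SGD noise covariance (up to a 1/batch-size factor) vs. isotropic noise
        # with the same trace (same total injected variance).
        Sigma_sgd = np.cov(per_sample_grads, rowvar=False)
        Sigma_iso = np.trace(Sigma_sgd) / d * np.eye(d)

        # Larger Tr(H Sigma) means the noise injects more energy along sharp
        # directions of the loss, which favors escaping from sharp minima.
        print("Tr(H Sigma_sgd):", np.trace(H @ Sigma_sgd))
        print("Tr(H Sigma_iso):", np.trace(H @ Sigma_iso))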

    Free Energy Landscape Of Simple Liquids Near The Glass Transition

    Properties of the free-energy landscape in the phase space of a dense hard-sphere system, characterized by a discretized free-energy functional of the Ramakrishnan-Yussouff form, are investigated numerically. A considerable number of glassy local minima of the free energy are located, and the distribution of an appropriately defined "overlap" between minima is calculated. The process of transition from the basin of attraction of one minimum to that of another is studied using a new "microcanonical" Monte Carlo procedure, leading to a determination of the effective height of the free-energy barriers that separate different glassy minima. The general appearance of the free-energy landscape resembles that of a putting green: deep minima separated by a fairly flat structure. The growth of the effective free-energy barriers with increasing density is consistent with the Vogel-Fulcher law, and this growth is primarily driven by an entropic mechanism. Comment: 10 pages, 6 postscript figures, uses iopart.cls and iopart10.clo (included). Invited talk at the ICTP Trieste Conference on "Unifying Concepts in Glass Physics", September 1999. To be published in J. Phys.: Condens. Matter.
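
    A small sketch of one way an "overlap" between two glassy minima of a discretized density functional could be quantified: represent each minimum by its density field on a lattice and compute a normalized, mean-subtracted inner product. The definition and the toy configurations below are illustrative placeholders; the paper's precise overlap definition may differ.

        import numpy as np

        rng = np.random.default_rng(2)

        def overlap(rho_a, rho_b):
            # Normalized overlap between two density configurations on the same
            # lattice: 1 for identical configurations, ~0 for unrelated ones
            # (after subtracting the mean density).
            da = rho_a - rho_a.mean()
            db = rho_b - rho_b.mean()
            return float(np.sum(da * db) /
                         np.sqrt(np.sum(da ** 2) * np.sum(db ** 2)))

        # Two hypothetical glassy minima: sparse density peaks on a 32^3 lattice,
        # the second sharing roughly half of its structure with the first.
        rho1 = (rng.random((32, 32, 32)) < 0.05).astype(float)
        mask = rng.random((32, 32, 32)) < 0.5
        rho2 = np.where(mask, rho1, (rng.random((32, 32, 32)) < 0.05).astype(float))

        print("overlap(rho1, rho1):", overlap(rho1, rho1))
        print("overlap(rho1, rho2):", overlap(rho1, rho2))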

    Shaping the learning landscape in neural networks around wide flat minima

    Learning in deep neural networks (DNNs) takes place by minimizing a non-convex, high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic non-convex one- and two-layer neural network models which learn random patterns, and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message-passing algorithms which focus their search on wide flat regions of minimizers. In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessians and their generalization performance on real data. Comment: 37 pages (16 main text), 10 figures (7 main text).
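
    A minimal sketch of a flatness probe in the spirit of wide flat minima: given a weight vector that classifies random patterns with zero error, sample random perturbations of a fixed relative norm and measure how often the training error stays at zero, as a crude local-volume estimate. The perceptron model, data sizes and radii below are hypothetical; the paper's entropy-driven algorithms and volume computations are considerably more refined than this.

        import numpy as np

        rng = np.random.default_rng(3)
        n, d = 200, 400  # random +/-1 patterns, over-parameterized regime

        X = rng.choice([-1.0, 1.0], size=(n, d))
        y = rng.choice([-1.0, 1.0], size=n)

        def train_error(w):
            return float(np.mean(np.sign(X @ w) != y))

        # Fit the random patterns with a plain perceptron rule.
        w = np.zeros(d)
        for _ in range(100):
            for i in rng.permutation(n):
                if y[i] * (X[i] @ w) <= 0:
                    w += y[i] * X[i]

        def local_flatness(w, radius, samples=200):
            # Fraction of random perturbations of given relative norm that keep
            # the training error at zero -- a crude proxy for the local volume
            # of the minimizer (wider, flatter minima score higher).
            hits = 0
            for _ in range(samples):
                delta = rng.normal(size=d)
                delta *= radius * np.linalg.norm(w) / np.linalg.norm(delta)
                hits += int(train_error(w + delta) == 0.0)
            return hits / samples

        print("train error:", train_error(w))
        for r in (0.1, 0.3, 1.0):
            print(f"flatness at relative radius {r}: {local_flatness(w, r):.2f}")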