One-loop effective potential for SO(10) GUT theories in de Sitter space
Zeta-function regularization is applied to evaluate the one-loop effective
potential for SO(10) grand-unified theories in de Sitter cosmologies. When the
Higgs scalar field belongs to the 210-dimensional irreducible representation of
SO(10), attention is focused on the mass matrix relevant for the
SU(3)xSU(2)xU(1) symmetry-breaking direction, to agree with low-energy
phenomenology of the particle-physics standard model. The analysis is
restricted to those values of the tree-level-potential parameters for which the
absolute minima of the classical potential have been evaluated. As shown in the
recent literature, such minima turn out to be SO(6)xSO(4)- or
SU(3)xSU(2)xSU(2)xU(1)-invariant. Electroweak phenomenology is more naturally
derived, however, from the former minima. Hence the values of the parameters
leading to the alternative set of minima have been discarded. Within this
framework, the flat-space limit and the general form of the one-loop effective
potential are studied in detail using analytic and numerical methods. It
turns out that, as far as the absolute-minimum direction is concerned, the
flat-space limit of the one-loop calculation about a de Sitter background does
not change the results previously obtained in the literature, where the
tree-level potential in flat space-time was studied. Moreover, when curvature
effects are no longer negligible in the one-loop potential, it is found that
the early universe remains bound to reach only the SO(6)xSO(4) absolute
minimum.

Comment: 25 pages, plain TeX, plus LaTeX file of the tables appended at the
end. Published in Classical and Quantum Gravity, Vol. 11, pp. 2031-2044,
August 1994.
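For orientation, the zeta-function prescription behind such one-loop calculations takes the following schematic form (a generic textbook expression, not quoted from the paper; signs and normalizations vary with conventions):

```latex
% Schematic one-loop effective potential from spectral zeta regularization.
% Omega: volume of the Euclidean de Sitter four-sphere; mu: renormalization
% scale; lambda_n, d_n: eigenvalues and degeneracies of the fluctuation
% operator built from the Higgs mass matrix on the de Sitter background.
V^{(1)}(\varphi) = V_{\mathrm{tree}}(\varphi)
  - \frac{1}{2\Omega}\left[ \zeta'(0;\varphi)
      + \zeta(0;\varphi)\,\ln\mu^{2} \right],
\qquad
\zeta(s;\varphi) = \sum_{n} d_{n}\,\lambda_{n}(\varphi)^{-s} .
```

The flat-space limit discussed in the abstract corresponds to sending the de Sitter radius to infinity in this eigenvalue spectrum.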
Theory of Deep Learning IIb: Optimization Properties of SGD
In Theory IIb we characterize, with a mix of theory and experiments, the
optimization of deep convolutional networks by Stochastic Gradient Descent
(SGD). The
main new result in this paper is theoretical and experimental evidence for the
following conjecture about SGD: SGD concentrates in probability -- like the
classical Langevin equation -- on large-volume, "flat" minima, selecting flat
minimizers that, with very high probability, are also global minimizers.
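The conjectured parallel between SGD and Langevin dynamics is easy to probe numerically. Below is a minimal, self-contained sketch (not from the paper) contrasting the two update rules on a toy least-squares problem; the learning rate, batch size, and noise scale are illustrative assumptions:

```python
# Minimal sketch contrasting minibatch SGD noise with classical Langevin
# dynamics on a toy least-squares problem. Illustrative only; the paper's
# claims concern deep convolutional networks, not this toy model.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 2
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n)

def full_grad(w):
    return X.T @ (X @ w - y) / n

def minibatch_grad(w, batch=8):
    idx = rng.choice(n, size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / batch

lr, T = 0.05, 2000
sigma = 0.1  # isotropic noise scale for Langevin (assumed, for illustration)
w_sgd = np.zeros(d)
w_lan = np.zeros(d)
for _ in range(T):
    # SGD: noise comes from subsampling and depends on the current iterate.
    w_sgd -= lr * minibatch_grad(w_sgd)
    # Langevin: full gradient plus isotropic Gaussian noise.
    w_lan -= lr * full_grad(w_lan) + np.sqrt(2 * lr) * sigma * rng.normal(size=d)

print("SGD endpoint:     ", w_sgd)
print("Langevin endpoint:", w_lan)
```

The key structural difference the conjecture rests on is visible here: SGD's noise covariance is state- and data-dependent, whereas the Langevin noise is fixed and isotropic.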
The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Understanding the behavior of stochastic gradient descent (SGD) in the
context of deep neural networks has attracted considerable attention recently. Along
this line, we study a general form of gradient based optimization dynamics with
unbiased noise, which unifies SGD and standard Langevin dynamics. Through
investigating this general optimization dynamics, we analyze the behavior of
SGD on escaping from minima and its regularization effects. A novel indicator
is derived to characterize the efficiency of escaping from minima through
measuring the alignment of noise covariance and the curvature of loss function.
Based on this indicator, two conditions are established to show which type of
noise structure is superior to isotropic noise in terms of escaping efficiency.
We further show that the anisotropic noise in SGD satisfies the two conditions,
and thus helps to escape from sharp and poor minima effectively, towards more
stable and flat minima that typically generalize well. We systematically design
various experiments to verify the benefits of the anisotropic noise, compared
with full gradient descent plus isotropic diffusion (i.e., Langevin dynamics).

Comment: ICML 2019 camera ready.
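As a concrete illustration of such an alignment indicator, the sketch below estimates the minibatch-noise covariance empirically and compares Tr(H C) for SGD noise against isotropic noise of equal magnitude on a toy quadratic loss. Tr(H C) is an assumed stand-in for the paper's indicator, which has its own derived form:

```python
# Minimal sketch of an alignment measure between minibatch-noise covariance C
# and loss curvature H, in the spirit of the indicator described above.
# Tr(H @ C) is an assumed stand-in, evaluated on a toy least-squares problem.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 2
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n)

w = np.zeros(d)
H = X.T @ X / n  # Hessian of the quadratic loss (constant here)

def noise_cov(w, batch=8, samples=500):
    """Empirical covariance of minibatch gradients around the full gradient."""
    g_full = X.T @ (X @ w - y) / n
    devs = []
    for _ in range(samples):
        idx = rng.choice(n, size=batch, replace=False)
        g = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        devs.append(g - g_full)
    D = np.array(devs)
    return D.T @ D / samples

C_sgd = noise_cov(w)
C_iso = np.eye(d) * np.trace(C_sgd) / d  # isotropic noise, equal magnitude

print("alignment, SGD noise:      ", np.trace(H @ C_sgd))
print("alignment, isotropic noise:", np.trace(H @ C_iso))
```

When the noise covariance concentrates along high-curvature directions, the alignment term is larger for SGD noise than for isotropic noise of the same total magnitude, which is the mechanism behind the escaping-efficiency claim.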
Free Energy Landscape Of Simple Liquids Near The Glass Transition
Properties of the free energy landscape in phase space of a dense hard sphere
system characterized by a discretized free energy functional of the
Ramakrishnan-Yussouff form are investigated numerically. A considerable number
of glassy local minima of the free energy are located and the distribution of
an appropriately defined "overlap" between minima is calculated. The process
of transition from the basin of attraction of one minimum to that of another
is studied using a new "microcanonical" Monte Carlo procedure, leading to a
determination of the effective height of free energy barriers that separate
different glassy minima. The general appearance of the free energy landscape
resembles that of a putting green: deep minima separated by a fairly flat
structure. The growth of the effective free-energy barriers with increasing
density is consistent with the Vogel-Fulcher law, and this growth is primarily
driven by an entropic mechanism.

Comment: 10 pages, 6 PostScript figures, uses iopart.cls and iopart10.clo
(included). Invited talk at the ICTP Trieste Conference on "Unifying Concepts
in Glass Physics", September 1999. To be published in J. Phys.: Condens. Matter.
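A ceiling-constrained random walk conveys the flavor of such a "microcanonical" barrier measurement. The sketch below is an illustration only, not the paper's procedure (which acts on the discretized Ramakrishnan-Yussouff functional of a hard-sphere system); it scans for the smallest free-energy ceiling that lets a walker leave one basin of a 1D double well:

```python
# Minimal sketch of a ceiling-constrained ("microcanonical"-style) random
# walk used to estimate an effective barrier between two minima.
# A 1D double well stands in for the high-dimensional free-energy landscape.
import numpy as np

rng = np.random.default_rng(2)

def F(x):                      # toy free-energy landscape: double well
    return (x**2 - 1.0)**2     # minima at x = -1, +1; barrier F(0) = 1.0

def escapes(ceiling, x0=-1.0, steps=20000, dx=0.05):
    """Walk from the basin at x0, rejecting moves whose free energy exceeds
    the ceiling. Returns True if the walker reaches the other basin."""
    x = x0
    for _ in range(steps):
        xp = x + dx * rng.normal()
        if F(xp) <= ceiling:   # microcanonical-style constraint
            x = xp
        if x > 0.9:
            return True
    return False

# Effective barrier ~ smallest ceiling that permits escape (coarse scan).
for ceiling in np.linspace(0.1, 1.5, 15):
    if escapes(ceiling):
        print(f"escape first observed at ceiling ~ {ceiling:.2f}"
              f" (true barrier height is F(0) = 1.0)")
        break
```

The estimated ceiling approximates the barrier top; in the paper, the growth of this effective barrier height with density is what is matched against the Vogel-Fulcher law.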
Shaping the learning landscape in neural networks around wide flat minima
Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex
high-dimensional loss function, typically by a stochastic gradient descent
(SGD) strategy. The learning process is observed to find good minimizers
without getting stuck in local critical points, and such minimizers often
avoid overfitting. How these two
features can be kept under control in nonlinear devices composed of millions of
tunable connections is a profound and far-reaching open question. In this paper
we study basic non-convex one- and two-layer neural network models which learn
random patterns, and derive a number of basic geometrical and algorithmic
features which suggest some answers. We first show that the error loss function
presents few extremely wide flat minima (WFM) which coexist with narrower
minima and critical points. We then show that the minimizers of the
cross-entropy loss function overlap with the WFM of the error loss. We also
show examples of learning devices for which WFM do not exist. From the
algorithmic perspective we derive entropy driven greedy and message passing
algorithms which focus their search on wide flat regions of minimizers. In the
case of SGD and cross-entropy loss, we show that a slow reduction of the norm
of the weights along the learning process also leads to WFM. We corroborate the
results by a numerical study of the correlations between the volumes of the
minimizers, their Hessians, and their generalization performance on real data.

Comment: 37 pages (16 main text), 10 figures (7 main text).
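Flatness in the WFM sense can be probed by asking how large a random weight perturbation a minimizer tolerates before the loss degrades. The sketch below is a toy illustration with 1D losses standing in for network weights (the paper works with one- and two-layer networks trained on random patterns); it compares a wide and a narrow minimum:

```python
# Minimal sketch of a perturbation-based flatness probe: estimate the
# fraction of random weight perturbations of radius r that keep the loss
# small, a proxy for the local volume of the minimizer. Wide flat minima
# tolerate larger r than sharp ones.
import numpy as np

rng = np.random.default_rng(3)

sharp = lambda w: 50.0 * w**2        # narrow minimum at w = 0
flat  = lambda w: 0.5  * w**2        # wide minimum at w = 0

def robustness(loss, r, samples=10000, tol=0.1):
    """Fraction of perturbations of radius r keeping loss <= tol."""
    w = r * rng.uniform(-1, 1, size=samples)
    return np.mean(loss(w) <= tol)

for r in (0.05, 0.2, 0.5):
    print(f"r={r:4.2f}  sharp: {robustness(sharp, r):.2f}"
          f"  flat: {robustness(flat, r):.2f}")
```

The wide minimum retains low loss over a much larger perturbation radius, which is the volume-based notion of flatness that the entropy-driven algorithms in the abstract are designed to target.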