Depth Separation with Multilayer Mean-Field Networks
Depth separation -- why a deeper network is more powerful than a shallower
one -- has been a major problem in deep learning theory. Previous results often
focus on representation power. For example, arXiv:1904.06984 constructed a
function that is easy to approximate with a 3-layer network but cannot be
approximated by any 2-layer network. In this paper, we show that this
separation is in fact algorithmic: one can efficiently learn the function
constructed by arXiv:1904.06984 using an overparameterized network with
polynomially many neurons. Our result relies on a new way of extending the
mean-field limit to multilayer networks, and on a decomposition of the loss
that factors out the error introduced by discretizing infinite-width
mean-field networks.
Comment: ICLR 202
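As a rough illustration of the setup described above, here is a minimal sketch, not the paper's construction or proof, of a width-m multilayer network in a mean-field-style parameterization, where each layer reads an average over the previous layer's m neurons so that growing m approaches the infinite-width mean-field limit; the target function, architecture, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the paper's construction): a three-layer network in a
# mean-field-style parameterization. Dividing by m makes each layer an
# empirical mean over neurons, the finite-width analogue of the mean-field
# integral; the toy target and hyperparameters are illustrative only.
import torch
import torch.nn as nn

class MeanFieldMLP(nn.Module):
    def __init__(self, d_in, m=1024):
        super().__init__()
        self.fc1 = nn.Linear(d_in, m)
        self.fc2 = nn.Linear(m, m)
        self.fc3 = nn.Linear(m, 1)
        self.m = m

    def forward(self, x):
        h1 = torch.relu(self.fc1(x))
        # average (rather than sum) over the m neurons of the previous layer
        h2 = torch.relu(self.fc2(h1) / self.m)
        return self.fc3(h2) / self.m

# Toy target standing in for the hard-to-represent function; illustrative only.
d = 16
x = torch.randn(2048, d)
y = torch.sin(x.norm(dim=1, keepdim=True))

model = MeanFieldMLP(d)
# the 1/m scaling shrinks gradients, so the step size is taken larger here
opt = torch.optim.SGD(model.parameters(), lr=0.5)
for step in range(200):
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
print(float(loss))
```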
Plateau in Monotonic Linear Interpolation -- A "Biased" View of Loss Landscape for Deep Networks
Monotonic linear interpolation (MLI) -- the observation that, along the line
connecting a random initialization with the minimizer it converges to, the
loss and accuracy change monotonically -- is a phenomenon commonly observed
in the training of neural networks. Such a phenomenon may seem to suggest
that optimizing neural networks is easy. In this paper, we show that the MLI
property is not necessarily related to the hardness of the optimization
problem, and that empirical observations of MLI for deep neural networks
depend heavily on the biases. In particular, we show that linearly
interpolating the weights and the biases influences the final output in very
different ways, and that when different classes have different last-layer
biases in a deep network, there is a long plateau in both the loss and the
accuracy along the interpolation (which existing theories of MLI cannot
explain). Using a simple model, we also show how the last-layer biases for
different classes can differ even on a perfectly balanced dataset.
Empirically, we demonstrate that similar intuitions hold on practical
networks and realistic datasets.
Comment: ICLR 202
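As a concrete illustration of the MLI diagnostic discussed above, here is a minimal sketch, not the paper's experiments, that evaluates the loss along the straight line between the initial and trained parameters, with an optional variant that keeps the biases fixed at their final values to probe their influence; the toy model, data, and this particular ablation are illustrative assumptions.

```python
# Minimal sketch of the MLI diagnostic: loss along the line between the
# initial and the trained parameters. Toy model and data are illustrative.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 20)
y = (X[:, 0] > 0).long()  # toy binary labels

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
theta_init = copy.deepcopy(model.state_dict())

loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(500):
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()
theta_final = copy.deepcopy(model.state_dict())

def loss_at(alpha, interpolate_biases=True):
    """Loss at theta(alpha) = (1 - alpha) * theta_init + alpha * theta_final.
    Optionally keep the biases at their final values to isolate their effect."""
    probe = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
    blended = {}
    for k in theta_final:
        if not interpolate_biases and k.endswith("bias"):
            blended[k] = theta_final[k]
        else:
            blended[k] = (1 - alpha) * theta_init[k] + alpha * theta_final[k]
    probe.load_state_dict(blended)
    with torch.no_grad():
        return loss_fn(probe(X), y).item()

for a in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(a, loss_at(a), loss_at(a, interpolate_biases=False))
```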
Morphology Study for GeV Emission of the Nearby Supernova Remnant G332.5-5.6
The spatial template is important for studying a nearby supernova remnant
(SNR). For SNR G332.5-5.6, we report that a Gaussian disk with a radius of
about 1.06 degrees is a potentially good spatial model in the gamma-ray band.
Employing this new Gaussian disk, the GeV light curve shows significant
variability at a level of about 7 sigma. The gamma-ray observations of this
SNR can be explained well by either a leptonic model or a hadronic model, in
which a flat spectrum is required for the ejected electrons/protons.
Comment: 13 pages, 7 figures, accepted by RA
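For illustration only, the following sketch builds a radially symmetric Gaussian-disk spatial template on a small sky grid, treating the quoted 1.06 degrees as the Gaussian width parameter; the center coordinates, pixel scale, and grid size are hypothetical, and a real Fermi-LAT morphology study would construct the template with the mission analysis tools and convolve it with the instrument PSF.

```python
# Illustrative sketch (not the paper's analysis): a normalized Gaussian-disk
# intensity map on a flat-sky grid, with the 1.06-degree extent used as the
# Gaussian width parameter. Center coordinates below are hypothetical.
import numpy as np

def gaussian_disk_template(ra0, dec0, sigma_deg, npix=201, pix_deg=0.05):
    """Return an npix x npix intensity map centered on (ra0, dec0), in degrees."""
    half = (npix - 1) / 2 * pix_deg
    offsets = np.linspace(-half, half, npix)
    # flat-sky approximation: shrink RA offsets by cos(dec) near the center
    dx, dy = np.meshgrid(offsets * np.cos(np.radians(dec0)), offsets)
    r2 = dx**2 + dy**2
    template = np.exp(-r2 / (2.0 * sigma_deg**2))
    return template / template.sum()  # normalize so the map sums to 1

tmpl = gaussian_disk_template(ra0=244.0, dec0=-54.0, sigma_deg=1.06)
print(tmpl.shape, tmpl.max())
```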