Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks
The optimization of multilayer neural networks typically leads to a solution
with zero training error, yet the landscape can exhibit spurious local minima
and the minima can be disconnected. In this paper, we shed light on this
phenomenon: we show that the combination of stochastic gradient descent (SGD)
and over-parameterization makes the landscape of multilayer neural networks
approximately connected and thus more favorable to optimization. More
specifically, we prove that SGD solutions are connected via a piecewise linear
path, and the increase in loss along this path vanishes as the number of
neurons grows large. This result is a consequence of the fact that the
parameters found by SGD are increasingly dropout stable as the network becomes
wider. We show that, if we remove part of the neurons (and suitably rescale the
remaining ones), the change in loss is independent of the total number of
neurons, and it depends only on how many neurons are left. Our results exhibit
a mild dependence on the input dimension: they are dimension-free for two-layer
networks and depend linearly on the dimension for multilayer networks. We
validate our theoretical findings with numerical experiments for different
architectures and classification tasks.
Comment: Proceedings of the 37th International Conference on Machine Learning (ICML).
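The dropout-stability claim above can be probed numerically. Below is a minimal numpy sketch under loud assumptions: random (untrained) weights stand in for actual SGD solutions, and the widths, data dimensions, and the rescale-by-2 convention for the surviving neurons are all illustrative choices, not the paper's construction.

```python
import numpy as np

def relu_net(X, W, a, scale):
    # f(x) = scale * sum_i a_i * relu(w_i . x)
    return np.maximum(X @ W.T, 0.0) @ a * scale

def dropout_gap(N, d=10, n=200, trials=20, seed=0):
    """Average |loss(sub-network) - loss(full network)| when half the
    neurons are dropped and the survivors are rescaled by 2."""
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(trials):
        X = rng.standard_normal((n, d))
        y = rng.standard_normal(n)
        W = rng.standard_normal((N, d))   # hypothetical weights, not SGD output
        a = rng.standard_normal(N)
        full = np.mean((relu_net(X, W, a, 1 / N) - y) ** 2)
        keep = N // 2  # drop half the neurons, rescale the rest by 2
        sub = np.mean((relu_net(X, W[:keep], a[:keep], 2 / N) - y) ** 2)
        gaps.append(abs(full - sub))
    return float(np.mean(gaps))

print(dropout_gap(10), dropout_gap(1000))  # the gap shrinks as width grows
```

Even with random weights, the mean-field scaling makes the loss change under dropout vanish as the width grows, which is the qualitative behavior the abstract describes for trained networks.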
Impurity induced bound states and proximity effect in a bilayer exciton condensate
The effect of impurities which induce local interlayer tunneling in bilayer
exciton condensates is discussed. We show that a localized single fermion bound
state emerges inside the gap for any strength of impurity scattering and
calculate the dependence of the impurity state energy and wave function on the
potential strength. We show that such an impurity induced single fermion state
enhances the interlayer coherence around it, and is similar to the
superconducting proximity effect. As a direct consequence of these single
impurity states, we predict that a finite concentration of such impurities will
increase the critical temperature for exciton condensation.
Comment: 4 pages, 2 figures
Cryptic species of Anopheles messeae sensu lato (Diptera: Culicidae), their identification, features and nomenclature
The paper describes the change in perspective on the composition of the A. messeae taxonomic unit. Initially, based on the disequilibrium of natural populations, the species was differentiated into the A and B forms using chromosomal inversions as markers. The positive assortative mating, as well as the ecological features and geographical distribution of these forms, made it possible to give them the status of species in statu nascendi. Later, we additionally investigated the EcoRI restriction fragments of the genomic DNA and the ITS2 nucleotide sequences in the A and B forms of A. messeae. Unambiguous differences between the species in the former marker and semi-quantitative differences in the latter, together with the absence of hybrids in the populations studied, led us to conclude that A. messeae s.l. comprises two homosequential cryptic species with parallel chromosomal polymorphisms. Unequivocal parallels between A. lewisi Ludlow, 1920 and A. messeae B with regard to their features, as well as the identity of A. daciae Linton et al., 2004 to A. messeae A in its ITS2 sequence and to A. messeae Fall. in diagnostic chromosomal inversions, allowed us to consider A. lewisi Ludlow, 1920 and A. messeae B as two names for the same biological species, and A. messeae Fall., 1926, A. messeae A, and A. daciae Linton et al., 2004 as three names for the other. Both are members of the Palaearctic group of the Maculipennis complex under the names Anopheles (Ano.) lewisi Ludlow, 1920 and Anopheles (Ano.) messeae Falleroni, 1926, respectively.
Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks
Understanding the properties of neural networks trained via stochastic
gradient descent (SGD) is at the heart of the theory of deep learning. In this
work, we take a mean-field view, and consider a two-layer ReLU network trained
via SGD for a univariate regularized regression problem. Our main result is
that SGD is biased towards a simple solution: at convergence, the ReLU network
implements a piecewise linear map of the inputs, and the number of "knot"
points - i.e., points where the tangent of the ReLU network estimator changes -
between two consecutive training inputs is at most three. In particular, as the
number of neurons of the network grows, the SGD dynamics is captured by the
solution of a gradient flow and, at convergence, the distribution of the
weights approaches the unique minimizer of a related free energy, which has a
Gibbs form. Our key technical contribution consists in the analysis of the
estimator resulting from this minimizer: we show that its second derivative
vanishes everywhere, except at some specific locations which represent the
"knot" points. We also provide empirical evidence that knots at locations
distinct from the data points might occur, as predicted by our theory.
Comment: Accepted to the Journal of Machine Learning Research (JMLR).
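The "knot" points of a trained network, the locations where the slope of the piecewise linear estimator changes, can be detected numerically by scanning second differences on a fine grid. The sketch below uses a small hand-built two-layer ReLU network rather than an SGD-trained one; the weights, grid, and tolerance are illustrative assumptions.

```python
import numpy as np

def count_knots(w, b, a, xs, tol=1e-6):
    """Count slope changes of f(x) = sum_i a_i * relu(w_i*x + b_i)
    by scanning second differences of f on the grid xs."""
    f = np.maximum(np.outer(xs, w) + b, 0.0) @ a
    slopes = np.diff(f) / np.diff(xs)
    change = np.abs(np.diff(slopes)) > tol
    # A kink falling inside a grid cell can flag two adjacent cells,
    # so count each run of consecutive flagged cells as a single knot.
    return int(np.sum(np.diff(np.r_[0, change.astype(int)]) == 1))

# Hypothetical hand-built network with 3 distinct kinks, at x = -1, 0, 2
# (a ReLU neuron relu(w*x + b) has its kink at x = -b/w).
w = np.array([1.0, 1.0, 1.0])
b = np.array([1.0, 0.0, -2.0])
a = np.array([1.0, -0.5, 2.0])
xs = np.linspace(-3, 3, 6001)
print(count_knots(w, b, a, xs))  # -> 3
```

The same scan, applied to a wide network trained as in the abstract, would let one check empirically that at most three knots appear between consecutive training inputs.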
The role of canopy gaps in maintaining biodiversity of plants and soil macrofauna in the forests of the northwestern Caucasus
The research was carried out in the coniferous-deciduous forests of the northwestern Caucasus, growing in similar climatic and soil-orographic conditions. Three types of forests of different ages were studied: aspen-hornbeam (50-70 years), beech-fir-hornbeam (80-110 years) and fir-beech forests (over 450 years). The studies were performed in Krasnodar Krai (upper reaches of the Pshekha river, State Nature Reserve Chernogor'e) and the Republic of Adygea (upper reaches of the Belaya river, the Caucasian State Biosphere Reserve) in the summer seasons of 2016 and 2019. The research involved geobotanical, population-ontogenetic, and soil-zoological methods. It has been established that in the canopy gaps of all forest types the species density of plants is almost twice as high as in under-crown areas, or even higher, owing to better light conditions and high soil moisture, since the tree stand does not intercept precipitation there. Regeneration of tree cenopopulations in all forest types is much more effective in canopy gaps than in under-crown areas. The undergrowth density of different tree species is 10 or more times higher in gaps than in under-crown areas. The maximum number of ecological-coenotic groups of plants is observed in the canopy gaps of all forest types. All major trophic groups of macrofauna inhabit both canopy gaps and under-crown areas, but their biomass in gaps significantly exceeds that in under-crown areas. Because soil moisture is an essential factor for the activity of moisture-loving saprophages, their biomass is on average twice as high in gaps as in under-crown areas of all forest types. Only canopy gaps support a high biomass of anecic earthworms; these are important ecosystem engineers, which contribute substantially to plant litter processing and the formation of soil porosity.
Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth
Autoencoders are a prominent model in many empirical branches of machine
learning and lossy data compression. However, basic theoretical questions
remain unanswered even in a shallow two-layer setting. In particular, to what
degree does a shallow autoencoder capture the structure of the underlying data
distribution? For the prototypical case of the 1-bit compression of sparse
Gaussian data, we prove that gradient descent converges to a solution that
completely disregards the sparse structure of the input. Namely, the
performance of the algorithm is the same as if it was compressing a Gaussian
source - with no sparsity. For general data distributions, we give evidence of
a phase transition phenomenon in the shape of the gradient descent minimizer,
as a function of the data sparsity: below the critical sparsity level, the
minimizer is a rotation taken uniformly at random (just like in the compression
of non-sparse data); above the critical sparsity, the minimizer is the identity
(up to a permutation). Finally, by exploiting a connection with approximate
message passing algorithms, we show how to improve upon Gaussian performance
for the compression of sparse data: adding a denoising function to a shallow
architecture already reduces the loss provably, and a suitable multi-layer
decoder leads to a further improvement. We validate our findings on image
datasets, such as CIFAR-10 and MNIST.
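The abstract's core contrast, a linear 1-bit decoder that ignores sparsity versus a decoder augmented with a denoiser, can be illustrated with a small numpy sketch. This is not the paper's construction: the random orthogonal encoder, the soft-threshold denoiser, and all dimensions and sparsity levels below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, p = 64, 5000, 0.1  # dimension, samples, sparsity level (assumptions)

# Sparse Gaussian source: each coordinate is N(0,1) w.p. p, else 0.
X = rng.standard_normal((n, d)) * (rng.random((n, d)) < p)

# 1-bit autoencoder with a random orthogonal W: encode z = sign(W x),
# decode xhat = c * W^T z with the MSE-optimal scalar c.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
Z = np.sign(X @ Q.T)
lin = Z @ Q
c = np.sum(lin * X) / np.sum(lin * lin)
mse_linear = np.mean((c * lin - X) ** 2)

# Hypothetical denoiser on top of the linear decode: soft-thresholding
# exploits the sparsity that the linear decoder ignores.
def soft(t, lam):
    return np.sign(t) * np.maximum(np.abs(t) - lam, 0.0)

best = min(np.mean((soft(c * lin, lam) - X) ** 2)
           for lam in np.linspace(0.0, 1.0, 21))
print(mse_linear, best)  # thresholding never hurts here (lam=0 is included)
```

Sweeping the threshold and keeping the best value mirrors, in a crude form, the abstract's point that adding a denoising function to a shallow architecture already reduces the loss on sparse data.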