    Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks

The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization. More specifically, we prove that SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large. This result is a consequence of the fact that the parameters found by SGD are increasingly dropout stable as the network becomes wider. We show that, if we remove part of the neurons (and suitably rescale the remaining ones), the change in loss is independent of the total number of neurons, and it depends only on how many neurons are left. Our results exhibit a mild dependence on the input dimension: they are dimension-free for two-layer networks and depend linearly on the dimension for multilayer networks. We validate our theoretical findings with numerical experiments for different architectures and classification tasks. Comment: Proceedings of the 37th International Conference on Machine Learning (ICML 2020).
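    The dropout-stability property at the heart of this result is easy to probe empirically. Below is a minimal sketch (PyTorch; the architecture, width, and data are illustrative assumptions, not the paper's setup): train a wide two-layer ReLU network with plain SGD, remove half of the hidden neurons, rescale the surviving outer weights by a factor of two, and compare the losses. As the width grows, the gap between the two losses should shrink.

```python
# Minimal dropout-stability check for a two-layer ReLU network.
# Hypothetical setup, not the paper's code: drop half the hidden
# neurons, rescale the survivors by 2, and compare training losses.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, width = 256, 10, 4096          # samples, input dim, hidden width
X = torch.randn(n, d)
y = torch.randn(n, 1)

model = nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 1))
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for _ in range(2000):                # plain full-batch SGD
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

with torch.no_grad():
    full_loss = loss_fn(model(X), y).item()
    keep = torch.randperm(width)[: width // 2]   # neurons we keep
    dropped = nn.Sequential(nn.Linear(d, width // 2), nn.ReLU(),
                            nn.Linear(width // 2, 1))
    dropped[0].weight.copy_(model[0].weight[keep])
    dropped[0].bias.copy_(model[0].bias[keep])
    dropped[2].weight.copy_(2.0 * model[2].weight[:, keep])  # rescale
    dropped[2].bias.copy_(model[2].bias)
    drop_loss = loss_fn(dropped(X), y).item()

print(f"full loss {full_loss:.4f}   dropped-and-rescaled loss {drop_loss:.4f}")
```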

    Impurity induced bound states and proximity effect in a bilayer exciton condensate

The effect of impurities that induce local interlayer tunneling in bilayer exciton condensates is discussed. We show that a localized single-fermion bound state emerges inside the gap for any strength of impurity scattering, and we calculate the dependence of the impurity state energy and wave function on the potential strength. We show that such an impurity-induced single-fermion state enhances the interlayer coherence around it, in a manner similar to the superconducting proximity effect. As a direct consequence of these single-impurity states, we predict that a finite concentration of such impurities will increase the critical temperature for exciton condensation. Comment: 4 pages, 2 figures.

Cryptic species of Anopheles messeae sensu lato (Diptera: Culicidae), their identification, features and nomenclature

The paper describes the change in perspective on the composition of the A. messeae taxonomic unit. Initially, based on the disequilibrium observed in natural populations, the species was differentiated into forms A and B using chromosomal inversions as markers. The positive assortative mating, as well as the ecological features and geographical distribution of these forms, made it possible to give them the status of species in statu nascendi. Later, we additionally investigated the EcoRI restriction fragments of the genomic DNA and the ITS2 nucleotide sequences in A. messeae species A and B. Unambiguous differences between the species in the former marker and semi-quantitative differences in the latter, along with the absence of hybrids in the populations studied, led us to conclude that A. messeae s.l. comprises two homosequential cryptic species with parallel chromosomal polymorphisms. Unequivocal parallels between A. lewisi Ludlow, 1920 and A. messeae B in regard to their features, as well as the identity of A. daciae Linton et al., 2004 with A. messeae A in its ITS2 sequence, and with A. messeae Fall. in diagnostic chromosomal inversions, allowed us to consider A. lewisi Ludlow, 1920 and A. messeae B as two names of the same biological species, and A. messeae Fall., 1926, A. messeae A, and A. daciae Linton et al., 2004 as three names of the other. Both are members of the Palaearctic group of the Maculipennis complex under the names Anopheles (Ano.) lewisi Ludlow, 1920 and Anopheles (Ano.) messeae Falleroni, 1926, respectively.

    Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks

Understanding the properties of neural networks trained via stochastic gradient descent (SGD) is at the heart of the theory of deep learning. In this work, we take a mean-field view and consider a two-layer ReLU network trained via SGD for a univariate regularized regression problem. Our main result is that SGD is biased towards a simple solution: at convergence, the ReLU network implements a piecewise linear map of the inputs, and the number of "knot" points (i.e., points where the tangent of the ReLU network estimator changes) between two consecutive training inputs is at most three. In particular, as the number of neurons of the network grows, the SGD dynamics are captured by the solution of a gradient flow and, at convergence, the distribution of the weights approaches the unique minimizer of a related free energy, which has a Gibbs form. Our key technical contribution consists in the analysis of the estimator resulting from this minimizer: we show that its second derivative vanishes everywhere, except at some specific locations which represent the "knot" points. We also provide empirical evidence that knots at locations distinct from the data points might occur, as predicted by our theory. Comment: Accepted to the Journal of Machine Learning Research (JMLR).
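    Since each hidden neuron relu(w_i * x + b_i) of a univariate two-layer ReLU network can bend the output only at x = -b_i / w_i, the knot locations can be read directly off the first-layer parameters. The sketch below is hypothetical code, not the authors': the network here is left untrained for brevity, whereas the at-most-three-knots bound concerns the network at SGD convergence. It counts the knots falling between consecutive training inputs.

```python
# Count the "knot" points of a univariate two-layer ReLU network.
# A neuron relu(w*x + b) with nonzero outer weight can bend the
# output at x = -b/w; the network is piecewise linear between knots.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 512), nn.ReLU(), nn.Linear(512, 1))
x_train = torch.linspace(-1.0, 1.0, 8)      # hypothetical training inputs

with torch.no_grad():
    w = net[0].weight.squeeze(1)            # inner weights, shape (512,)
    b = net[0].bias                         # inner biases, shape (512,)
    a = net[2].weight.squeeze(0)            # outer weights, shape (512,)
    active = (w != 0) & (a != 0)            # neurons that can create a knot
    knots = (-b[active] / w[active]).sort().values

# Knots between consecutive training inputs; the paper bounds this
# count by three for the solution reached by SGD in the wide limit.
for lo, hi in zip(x_train[:-1], x_train[1:]):
    k = ((knots > lo) & (knots < hi)).sum().item()
    print(f"({lo.item():+.2f}, {hi.item():+.2f}): {k} knots")
```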

    The role of canopy gaps in maintaining biodiversity of plants and soil macrofauna in the forests of the northwestern Caucasus

The research was carried out in the coniferous-deciduous forests of the northwestern Caucasus, growing in similar climatic and soil-orographic conditions. Three types of forests of different ages were studied: aspen-hornbeam (50-70 years), beech-fir-hornbeam (80-110 years) and fir-beech (over 450 years). The studies were performed on the territory of Krasnodar Krai (upper reaches of the Pshekha river, State Nature Reserve Chernogor'e) and the Republic of Adygea (upper reaches of the Belaya river, the Caucasian State Biosphere Reserve) in the summer seasons of 2016 and 2019. The research involved geobotanical, population-ontogenetic, and soil-zoological methods. It has been established that plant species density in the canopy gaps of all forest types is almost twice as high as in under-crown areas, or even higher, owing to better light conditions and higher soil moisture, since the tree stand does not intercept precipitation there. Regeneration of tree cenopopulations in all forest types is much more effective in canopy gaps than in under-crown areas: the undergrowth density of different tree species is 10 or more times higher in gaps. The maximum number of ecological-coenotic groups of plants is observed in the canopy gaps of all forest types. All major trophic groups of macrofauna inhabit both canopy gaps and under-crown areas, but their biomass in gaps significantly exceeds that in under-crown areas. Because soil moisture supply is an essential factor for the activity of moisture-loving saprophages, their biomass is on average twice as high in gaps as in under-crown areas in all forest types. Only canopy gaps support a high biomass of anecic earthworms, which are important ecosystem engineers that contribute substantially to plant litter processing and the formation of soil porosity.

    Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth

Autoencoders are a prominent model in many empirical branches of machine learning and lossy data compression. However, basic theoretical questions remain unanswered even in a shallow two-layer setting. In particular, to what degree does a shallow autoencoder capture the structure of the underlying data distribution? For the prototypical case of the 1-bit compression of sparse Gaussian data, we prove that gradient descent converges to a solution that completely disregards the sparse structure of the input. Namely, the performance of the algorithm is the same as if it were compressing a Gaussian source with no sparsity. For general data distributions, we give evidence of a phase transition phenomenon in the shape of the gradient descent minimizer, as a function of the data sparsity: below the critical sparsity level, the minimizer is a rotation taken uniformly at random (just like in the compression of non-sparse data); above the critical sparsity, the minimizer is the identity (up to a permutation). Finally, by exploiting a connection with approximate message passing algorithms, we show how to improve upon Gaussian performance for the compression of sparse data: adding a denoising function to a shallow architecture already reduces the loss provably, and a suitable multi-layer decoder leads to a further improvement. We validate our findings on image datasets, such as CIFAR-10 and MNIST.
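    The benefit of a denoising nonlinearity in the decoder can be illustrated in a few lines. The sketch below is an assumed setup, not the paper's experiment: it uses a fixed random encoder (rather than one learned by gradient descent) to produce 1-bit codes sign(Ax) of sparse Gaussian data, then compares a purely linear decoder with one that adds a soft-threshold denoiser. Since the threshold grid includes zero, the denoised loss can only improve on the linear one.

```python
# 1-bit compression of sparse Gaussian data (illustrative, NumPy):
# encode z = sign(A @ x), decode linearly vs. with a soft threshold.
import numpy as np

rng = np.random.default_rng(0)
d, m, n, p = 200, 100, 2000, 0.1   # input dim, code bits, samples, sparsity

# Sparse Gaussian data: each entry is nonzero with probability p.
X = rng.normal(size=(n, d)) * (rng.random((n, d)) < p)
A = rng.normal(size=(m, d)) / np.sqrt(d)    # fixed random encoder
Z = np.sign(X @ A.T)                        # 1-bit codes, shape (n, m)

B = Z @ A                                   # raw linear decode, shape (n, d)
c = np.sum(B * X) / np.sum(B * B)           # best scalar rescaling
mse_linear = np.mean((c * B - X) ** 2)

def soft(u, tau):                           # soft-threshold denoiser
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

# Grid-search the threshold; tau = 0 recovers the linear decoder,
# so exploiting sparsity this way can only lower the loss.
taus = np.linspace(0.0, 2.0, 41)
mse_denoised = min(np.mean((soft(c * B, t) - X) ** 2) for t in taus)

print(f"linear decoder MSE   {mse_linear:.4f}")
print(f"denoised decoder MSE {mse_denoised:.4f}")
```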