
    Bayesian neural networks increasingly sparsify their units with depth

    We investigate deep Bayesian neural networks with Gaussian priors on the weights and ReLU-like nonlinearities, shedding light on novel sparsity-inducing mechanisms at the level of the units of the network, both pre- and post-nonlinearity. The main thrust of the paper is to establish that the units' prior distribution becomes increasingly heavy-tailed with depth. We show that first-layer units are Gaussian, second-layer units are sub-Exponential, and we introduce sub-Weibull distributions to characterize the units of deeper layers. Bayesian neural networks with Gaussian priors are well known to induce the weight decay penalty on the weights. In contrast, our result indicates a more elaborate regularization scheme at the level of the units, ranging from convex penalties for the first two layers (weight decay for the first, Lasso for the second) to non-convex penalties for deeper layers. Thus, although weight decay does not allow the weights to be set exactly to zero, sparse solutions tend to be selected for the units from the second layer onward. This result provides new theoretical insight on deep Bayesian neural networks, underpinning their natural shrinkage properties and practical potential.
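
    The heavy-tail claim above can be checked empirically by sampling from the prior. Below is a minimal sketch (not from the paper; the width, depth, number of prior draws, and the kurtosis diagnostic are arbitrary illustrative choices) that draws i.i.d. Gaussian weights, pushes a fixed input through ReLU layers, and reports the empirical kurtosis of the post-activation units at each depth; heavier tails show up as growing kurtosis.

        import numpy as np

        rng = np.random.default_rng(0)
        n_draws, width, depth = 2000, 100, 5
        x = np.ones(width)                            # fixed input of unit scale

        units = np.zeros((n_draws, depth, width))     # post-activation units per layer
        for i in range(n_draws):
            h = x
            for d in range(depth):
                # i.i.d. Gaussian prior on the weights, variance scaled by fan-in
                W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
                h = np.maximum(W @ h, 0.0)            # ReLU nonlinearity
                units[i, d] = h

        for d in range(depth):
            u = units[:, d].ravel()
            kurt = ((u - u.mean()) ** 4).mean() / u.var() ** 2
            print(f"layer {d + 1}: empirical kurtosis ~ {kurt:.2f}")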

    Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

    The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well as, and sometimes even better than, the original dense networks. Sparsity promises to reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever-growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation and the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.
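
    As a deliberately simple instance of the pruning idea surveyed above, the sketch below performs one-shot magnitude pruning with numpy: the smallest-magnitude weights of a layer are set to zero until a target sparsity is reached. The function name and the 90% sparsity target are illustrative choices, not taken from the survey.

        import numpy as np

        def magnitude_prune(weights, sparsity):
            """Return a copy of `weights` with the smallest-magnitude entries zeroed."""
            k = int(sparsity * weights.size)
            if k == 0:
                return weights.copy()
            # threshold = k-th smallest absolute value
            threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
            pruned = weights.copy()
            pruned[np.abs(pruned) <= threshold] = 0.0
            return pruned

        rng = np.random.default_rng(0)
        W = rng.normal(size=(256, 256))
        W_sparse = magnitude_prune(W, sparsity=0.9)
        print(f"non-zero weights kept: {np.count_nonzero(W_sparse) / W.size:.1%}")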

    Bayesian neural networks become heavier-tailed with depth

    We investigate deep Bayesian neural networks with Gaussian priors on the weights and ReLU-like nonlinearities, shedding light on novel distribution properties at the level of the neural network units. The main thrust of the paper is to establish that the prior distribution induced on the units before and after activation becomes increasingly heavy-tailed with depth. We show that first-layer units are Gaussian, second-layer units are sub-Exponential, and we introduce sub-Weibull distributions to characterize the units of deeper layers. This result provides new theoretical insight on deep Bayesian neural networks, underpinning their practical potential. The workshop paper is based on the original paper Vladimirova et al. (2018).

    Bayesian neural network priors at the level of units

    We investigate deep Bayesian neural networks with Gaussian priors on the weights and ReLU-like nonlinearities, shedding light on novel sparsity-inducing mechanisms at the level of the units of the network. Bayesian neural networks with Gaussian priors are well known to induce the weight decay penalty on the weights. In contrast, our result indicates a more elaborate regularization scheme at the level of the units, ranging from convex penalties for the first two layers (L2 regularization for the first, Lasso for the second) to non-convex penalties for deeper layers. Thus, although weight decay does not allow the weights to be set exactly to zero, sparse solutions tend to be selected for the units from the second layer onward. This result provides new theoretical insight on deep Bayesian neural networks, underpinning their natural shrinkage properties and practical potential.
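
    The contrast between the two convex penalties mentioned above can be made concrete through their proximal operators: the L2 (weight decay) penalty only shrinks a value, while the L1 (Lasso) penalty soft-thresholds it and therefore produces exact zeros. The sketch below uses arbitrary illustrative values and is not the paper's construction.

        import numpy as np

        def prox_l2(u, lam):
            # argmin_v 0.5*(v-u)^2 + 0.5*lam*v^2  -> pure shrinkage, never exactly zero
            return u / (1.0 + lam)

        def prox_l1(u, lam):
            # argmin_v 0.5*(v-u)^2 + lam*|v|      -> soft thresholding, exact zeros
            return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

        u = np.array([-1.5, -0.3, 0.05, 0.4, 2.0])
        print("L2:", prox_l2(u, lam=0.5))   # all entries shrunk but non-zero
        print("L1:", prox_l1(u, lam=0.5))   # small entries mapped exactly to zero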

    Sparse Neural Network Training with In-Time Over-Parameterization


    Neural networks and quantum many-body physics: exploring reciprocal benefits.

    One of the main reasons why the physics of quantum many-body systems is hard lies in the curse of dimensionality: the number of states of such systems increases exponentially with the number of degrees of freedom involved. As a result, computations for realistic systems become intractable, and even numerical methods are limited to comparably small system sizes. Many efforts in modern physics research are therefore concerned with finding efficient representations of quantum states and clever approximation schemes that would allow physicists to characterize physical systems of interest. Meanwhile, Deep Learning (DL) has solved many non-scientific problems that had been inaccessible to conventional methods for a similar reason. The concept underlying DL is to extract knowledge from data by identifying patterns and regularities. The remarkable success of DL has excited many physicists about the prospect of leveraging its power to solve intractable problems in physics. At the same time, DL turned out to be an interesting complex many-body problem in itself. In contrast to its widespread empirical applications, the theoretical foundation of DL is strongly underdeveloped. In particular, as long as its decision-making process and the interpretability of its results remain opaque, DL cannot claim the status of a scientific tool. In this thesis, I explore the interface between DL and quantum many-body physics, and investigate DL both as a tool and as a subject of study. The first project presented here is a theory-based study of a fundamental open question about the role of width and the number of parameters in deep neural networks. In this work, we consider a DL setup for the image recognition task on standard benchmarking datasets. We combine controlled experiments with a theoretical analysis, including analytical calculations for a toy model. The other three works focus on the application of Restricted Boltzmann Machines (RBMs) as generative models for the task of wavefunction reconstruction from measurement data on a quantum many-body system. First, we implement this approach as a software package, making it available as a tool for experimentalists. Following the idea that physics problems can be used to characterize DL tools, we then use our extensive knowledge of this setup to conduct a systematic study of how the RBM complexity scales with the complexity of the physical system. Finally, in a follow-up study we focus on the effects of parameter pruning techniques on the RBM and its scaling behavior.
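
    For readers unfamiliar with the model class used in the last three works, the sketch below is a generic binary Restricted Boltzmann Machine in textbook form (not the thesis software): its energy function and one step of block Gibbs sampling, the building blocks of RBM-based state reconstruction. Sizes and initialization are arbitrary.

        import numpy as np

        rng = np.random.default_rng(0)
        n_visible, n_hidden = 8, 4
        W = 0.1 * rng.normal(size=(n_visible, n_hidden))   # coupling weights
        b = np.zeros(n_visible)                            # visible biases
        c = np.zeros(n_hidden)                             # hidden biases

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def energy(v, h):
            # E(v, h) = -b.v - c.h - v^T W h
            return -(v @ b + h @ c + v @ W @ h)

        def gibbs_step(v):
            # sample hidden units given visible, then visible given hidden
            p_h = sigmoid(c + v @ W)
            h = (rng.random(n_hidden) < p_h).astype(float)
            p_v = sigmoid(b + W @ h)
            v_new = (rng.random(n_visible) < p_v).astype(float)
            return v_new, h

        v = rng.integers(0, 2, size=n_visible).astype(float)
        for _ in range(100):
            v, h = gibbs_step(v)
        print("sample:", v, "energy:", energy(v, h))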