249 research outputs found

    Deep Neural Networks with Trainable Activations and Controlled Lipschitz Constant

    Full text link
    We introduce a variational framework to learn the activation functions of deep neural networks. Our aim is to increase the capacity of the network while controlling an upper-bound of the actual Lipschitz constant of the input-output relation. To that end, we first establish a global bound for the Lipschitz constant of neural networks. Based on the obtained bound, we then formulate a variational problem for learning activation functions. Our variational problem is infinite-dimensional and is not computationally tractable. However, we prove that there always exists a solution that has continuous and piecewise-linear (linear-spline) activations. This reduces the original problem to a finite-dimensional minimization where an l1 penalty on the parameters of the activations favors the learning of sparse nonlinearities. We numerically compare our scheme with standard ReLU network and its variations, PReLU and LeakyReLU and we empirically demonstrate the practical aspects of our framework

    Generalization Error in Deep Learning

    Get PDF
    Deep learning models have lately shown great performance in various fields such as computer vision, speech recognition, speech translation, and natural language processing. However, alongside their state-of-the-art performance, it is still generally unclear what is the source of their generalization ability. Thus, an important question is what makes deep neural networks able to generalize well from the training set to new data. In this article, we provide an overview of the existing theory and bounds for the characterization of the generalization error of deep neural networks, combining both classical and more recent theoretical and empirical results

    Connections Between Numerical Algorithms for PDEs and Neural Networks

    Get PDF
    We investigate numerous structural connections between numerical algorithms for partial differential equations (PDEs) and neural architectures. Our goal is to transfer the rich set of mathematical foundations from the world of PDEs to neural networks. Besides structural insights, we provide concrete examples and experimental evaluations of the resulting architectures. Using the example of generalised nonlinear diffusion in 1D, we consider explicit schemes, acceleration strategies thereof, implicit schemes, and multigrid approaches. We connect these concepts to residual networks, recurrent neural networks, and U-net architectures. Our findings inspire a symmetric residual network design with provable stability guarantees and justify the effectiveness of skip connections in neural networks from a numerical perspective. Moreover, we present U-net architectures that implement multigrid techniques for learning efficient solutions of partial differential equation models, and motivate uncommon design choices such as trainable nonmonotone activation functions. Experimental evaluations show that the proposed architectures save half of the trainable parameters and can thus outperform standard ones with the same model complexity. Our considerations serve as a basis for explaining the success of popular neural architectures and provide a blueprint for developing new mathematically well-founded neural building blocks

    Lifted Regression/Reconstruction Networks

    Get PDF
    In this work we propose lifted regression/reconstruction networks(LRRNs), which combine lifted neural networks with a guaranteed Lipschitz continuity property for the output layer. Lifted neural networks explicitly optimize an energy model to infer the unit activations and therefore—in contrast to standard feed-forward neural networks—allow bidirectional feedback between layers. So far lifted neural networks have been modelled around standard feed-forward architectures. We propose to take further advantage of the feedback property by letting the layers simultaneously perform regression and reconstruction. The resulting lifted network architecture allows to control the desired amount of Lipschitz continuity, which is an important feature to obtain adversarially robust regression and classification methods. We analyse and numerically demonstrate applications for unsupervised and supervised learnin

    Lifted Regression/Reconstruction Networks

    Get PDF
    In this work we propose lifted regression/reconstruction networks (LRRNs), which combine lifted neural networks with a guaranteed Lipschitz continuity property for the output layer. Lifted neural networks explicitly optimize an energy model to infer the unit activations and therefore---in contrast to standard feed-forward neural networks---allow bidirectional feedback between layers. So far lifted neural networks have been modelled around standard feed-forward architectures. We propose to take further advantage of the feedback property by letting the layers simultaneously perform regression and reconstruction. The resulting lifted network architecture allows to control the desired amount of Lipschitz continuity, which is an important feature to obtain adversarially robust regression and classification methods. We analyse and numerically demonstrate applications for unsupervised and supervised learning.Comment: 12 pages, 8 figure

    Global Convergence of SGD For Logistic Loss on Two Layer Neural Nets

    Full text link
    In this note, we demonstrate a first-of-its-kind provable convergence of SGD to the global minima of appropriately regularized logistic empirical risk of depth 22 nets -- for arbitrary data and with any number of gates with adequately smooth and bounded activations like sigmoid and tanh. We also prove an exponentially fast convergence rate for continuous time SGD that also applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of Frobenius norm regularized logistic loss functions on constant-sized neural nets which are "Villani functions" and thus be able to build on recent progress with analyzing SGD on such objectives.Comment: 18 Pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:2210.1145

    NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations

    Get PDF
    This paper introduces Non-Autonomous Input-Output Stable Network (NAIS-Net), a very deep architecture where each stacked processing block is derived from a time-invariant non-autonomous dynamical system. Non-autonomy is implemented by skip connections from the block input to each of the unrolled processing stages and allows stability to be enforced so that blocks can be unrolled adaptively to a pattern-dependent processing depth. NAIS-Net induces non-trivial, Lipschitz input-output maps, even for an infinite unroll length. We prove that the network is globally asymptotically stable so that for every initial condition there is exactly one input-dependent equilibrium assuming tanh units, and multiple stable equilibria for ReL units. An efficient implementation that enforces the stability under derived conditions for both fully-connected and convolutional layers is also presented. Experimental results show how NAIS-Net exhibits stability in practice, yielding a significant reduction in generalization gap compared to ResNets.Comment: NIPS 201
    • …
    corecore