
    The universal approximation power of finite-width deep ReLU networks

    We show that finite-width deep ReLU neural networks yield rate-distortion optimal approximation (Bölcskei et al., 2018) of polynomials, windowed sinusoidal functions, one-dimensional oscillatory textures, and the Weierstrass function, a fractal function which is continuous but nowhere differentiable. Together with their recently established universal approximation property of affine function systems (Bölcskei et al., 2018), this shows that deep neural networks approximate vastly different signal structures generated by the affine group, the Weyl-Heisenberg group, or through warping, and even certain fractals, all with approximation error decaying exponentially in the number of neurons. We also prove that, in the approximation of sufficiently smooth functions, finite-width deep networks require strictly smaller connectivity than finite-depth wide networks.
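
    Note: the Weierstrass function referred to above is classically written as $W(x) = \sum_k a^k \cos(b^k \pi x)$ with $0 < a < 1$ and $ab \ge 1$. The short numpy sketch below only evaluates a truncated series as a target signal; the parameter choices are illustrative and the paper's approximation construction is not reproduced.

        import numpy as np

        def weierstrass(x, a=0.5, b=7.0, terms=20):
            # Truncated Weierstrass series W(x) = sum_k a^k cos(b^k * pi * x);
            # the infinite series is continuous everywhere and nowhere differentiable.
            k = np.arange(terms, dtype=float)
            return np.sum((a ** k) * np.cos((b ** k) * np.pi * x[:, None]), axis=1)

        x = np.linspace(0.0, 1.0, 1001)
        print(weierstrass(x)[:5])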

    ResNet with one-neuron hidden layers is a Universal Approximator

    We demonstrate that a very deep ResNet with stacked modules that have one neuron per hidden layer and ReLU activation functions can uniformly approximate any Lebesgue-integrable function in $d$ dimensions, i.e., $\ell_1(\mathbb{R}^d)$. Because of the identity mapping inherent to ResNets, our network has alternating layers of dimension one and $d$. This stands in sharp contrast to fully connected networks, which are not universal approximators if their width is the input dimension $d$ [Lu et al., 2017; Hanin and Sellke, 2017]. Hence, our result implies an increase in representational power for narrow deep networks through the ResNet architecture.
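
    Note: as a structural sketch only, the residual module described above can be read as x ← x + v · relu(w · x + b), with a single hidden neuron per block. The numpy snippet below uses random placeholder weights and makes no claim about the paper's actual construction or its approximation guarantee.

        import numpy as np

        rng = np.random.default_rng(0)
        d, num_blocks = 3, 5  # input dimension and number of stacked residual blocks (illustrative)

        def one_neuron_resnet(x, blocks):
            # Each block: x <- x + v * relu(w @ x + b), with a single hidden neuron.
            for w, b, v in blocks:
                h = max(0.0, float(w @ x + b))   # scalar hidden activation (width 1)
                x = x + v * h                    # identity shortcut keeps dimension d
            return x

        blocks = [(rng.normal(size=d), rng.normal(), rng.normal(size=d)) for _ in range(num_blocks)]
        x = rng.normal(size=d)
        print(one_neuron_resnet(x, blocks))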

    The Expressive Power of Neural Networks: A View from the Width

    The expressive power of neural networks is important for understanding deep learning. Most existing works consider this problem from the viewpoint of the depth of a network. In this paper, we study how width affects the expressiveness of neural networks. Classical results state that depth-bounded (e.g., depth-$2$) networks with suitable activation functions are universal approximators. We show a universal approximation theorem for width-bounded ReLU networks: width-$(n+4)$ ReLU networks, where $n$ is the input dimension, are universal approximators. Moreover, except for a set of measure zero, functions cannot be approximated by width-$n$ ReLU networks, which exhibits a phase transition. Several recent works demonstrate the benefits of depth by proving the depth efficiency of neural networks: there are classes of deep networks which cannot be realized by any shallow network whose size is no more than an exponential bound. Here we pose the dual question on the width efficiency of ReLU networks: are there wide networks that cannot be realized by narrow networks whose size is not substantially larger? We show that there exist classes of wide networks which cannot be realized by any narrow network whose depth is no more than a polynomial bound. On the other hand, we demonstrate by extensive experiments that narrow networks whose size exceeds the polynomial bound by a constant factor can approximate wide and shallow networks with high accuracy. Our results provide more comprehensive evidence that depth is more effective than width for the expressiveness of ReLU networks. Comment: accepted by NIPS 2017 (with some typos fixed).
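
    Note: the theorem above is an existence statement; the numpy snippet below only sketches what a width-$(n+4)$ ReLU network looks like structurally (random weights, forward pass only), not the approximating network built in the proof.

        import numpy as np

        rng = np.random.default_rng(0)
        n, depth = 2, 6                 # input dimension and number of hidden layers (illustrative)
        width = n + 4                   # the width bound from the theorem

        def relu(z):
            return np.maximum(z, 0.0)

        # All hidden layers have width n+4; the output layer is scalar.
        Ws = [rng.normal(size=(width, n))] + [rng.normal(size=(width, width)) for _ in range(depth - 1)]
        bs = [rng.normal(size=width) for _ in range(depth)]
        w_out, b_out = rng.normal(size=width), rng.normal()

        def forward(x):
            h = x
            for W, b in zip(Ws, bs):
                h = relu(W @ h + b)
            return w_out @ h + b_out

        print(forward(rng.normal(size=n)))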

    Deep Semi-Random Features for Nonlinear Function Approximation

    We propose semi-random features for nonlinear function approximation. The flexibility of semi-random features lies between the fully adjustable units in deep learning and the random features used in kernel methods. For one-hidden-layer models with semi-random features, we prove, with no unrealistic assumptions, that the model classes contain an arbitrarily good function as the width increases (universality), and that despite non-convexity we can find such a good function (optimization theory) that generalizes to unseen new data (generalization bound). For deep models, with no unrealistic assumptions, we prove a universal approximation ability, a lower bound on approximation error, a partial optimization guarantee, and a generalization bound. Depending on the problem, the generalization bound of deep semi-random features can be exponentially better than the known bounds for deep ReLU nets; our generalization error bound can be independent of the depth, the number of trainable weights, and the input dimensionality. In experiments, we show that semi-random features can match the performance of neural networks using slightly more units, and that they outperform random features using significantly fewer units. Moreover, we introduce a new implicit ensemble method based on semi-random features. Comment: AAAI 2018, extended version.
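
    Note: in the simplest form we are aware of, a semi-random unit gates a trainable linear response by a step nonlinearity applied to a fixed random projection, roughly step(r · x) · (w · x); the paper's exact definitions should be treated as authoritative. A numpy sketch of one such layer follows, with placeholder values for the trainable part.

        import numpy as np

        rng = np.random.default_rng(0)
        d, m = 4, 16   # input dimension and number of semi-random units (illustrative)

        # Random, fixed gating directions; only W below would be trained.
        R = rng.normal(size=(m, d))
        W = rng.normal(size=(m, d))     # adjustable part (placeholder values here)

        def semi_random_layer(x, R, W):
            # Each unit: step(r_k @ x) * (w_k @ x) -- the random weights pass through
            # the nonlinearity, the trainable weights enter linearly.
            gate = (R @ x > 0).astype(float)
            return gate * (W @ x)

        x = rng.normal(size=d)
        print(semi_random_layer(x, R, W))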

    ReLU Deep Neural Networks and Linear Finite Elements

    In this paper, we investigate the relationship between deep neural networks (DNNs) with the rectified linear unit (ReLU) as the activation function and continuous piecewise linear (CPWL) functions, especially CPWL functions arising from the simplicial linear finite element method (FEM). We first consider the special case of FEM. By exploring the DNN representation of its nodal basis functions, we present a ReLU DNN representation of CPWL functions in FEM. We theoretically establish that at least $2$ hidden layers are needed in a ReLU DNN to represent any linear finite element function on $\Omega \subseteq \mathbb{R}^d$ when $d \ge 2$. Consequently, for $d = 2, 3$, which are often encountered in scientific and engineering computing, two hidden layers are necessary and sufficient for any CPWL function to be represented by a ReLU DNN. We then give a detailed account of how a general CPWL function on $\mathbb{R}^d$ can be represented by a ReLU DNN with at most $\lceil\log_2(d+1)\rceil$ hidden layers, together with an estimate of the number of neurons needed in such a representation. Furthermore, using the relationship between DNNs and FEM, we theoretically argue that a special class of DNN models with low bit-width are still expected to have adequate representation power in applications. Finally, as a proof of concept, we present numerical results for using ReLU DNNs to solve a two-point boundary value problem, demonstrating the potential of applying DNNs to the numerical solution of partial differential equations.
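
    Note: in one dimension the FEM-ReLU connection is easy to make explicit. The numpy snippet below writes a nodal hat basis function exactly as a one-hidden-layer, width-3 ReLU network; the lower bound of at least 2 hidden layers quoted above concerns $d \ge 2$.

        import numpy as np

        def relu(z):
            return np.maximum(z, 0.0)

        def hat(x, a, m, b):
            # 1-D FEM nodal basis function with support [a, b] and peak 1 at m,
            # written exactly as a width-3, one-hidden-layer ReLU network.
            s1, s2 = 1.0 / (m - a), 1.0 / (b - m)
            return s1 * relu(x - a) - (s1 + s2) * relu(x - m) + s2 * relu(x - b)

        x = np.linspace(-0.5, 1.5, 9)
        print(hat(x, 0.0, 0.5, 1.0))   # 0 outside [0, 1], peak 1 at x = 0.5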

    Exponential Convergence of the Deep Neural Network Approximation for Analytic Functions

    We prove that for analytic functions in low dimension, the convergence rate of the deep neural network approximation is exponential.

    Function approximation by deep networks

    We show that deep networks are better than shallow networks at approximating functions that can be expressed as a composition of functions described by a directed acyclic graph, because the deep networks can be designed to have the same compositional structure, while a shallow network cannot exploit this knowledge. Thus, the blessing of compositionality mitigates the curse of dimensionality. On the other hand, a theorem called good propagation of errors allows one to 'lift' theorems about shallow networks to those about deep networks with an appropriate choice of norms, smoothness, etc. We illustrate this in three contexts, where each channel in the deep network calculates a spherical polynomial, a non-smooth ReLU network, or another zonal function network closely related to the ReLU network. Comment: To appear in Communications in Pure and Applied Mathematics.
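
    Note: a compositional target such as $f(x_1, \dots, x_4) = h_3(h_1(x_1, x_2), h_2(x_3, x_4))$ can be matched by a deep network whose sub-networks mirror the binary-tree DAG, while a shallow network must treat all inputs at once. The numpy sketch below illustrates only this structural point, with random placeholder sub-networks.

        import numpy as np

        rng = np.random.default_rng(0)

        def relu(z):
            return np.maximum(z, 0.0)

        def small_net(inputs, width=8):
            # A small 2-input, 1-output ReLU sub-network; fresh random weights per call,
            # since only the wiring (the DAG structure) matters for this illustration.
            W1, b1 = rng.normal(size=(width, 2)), rng.normal(size=width)
            w2, b2 = rng.normal(size=width), rng.normal()
            return w2 @ relu(W1 @ np.asarray(inputs, dtype=float) + b1) + b2

        # Deep network mirroring the binary-tree DAG of f(x) = h3(h1(x1, x2), h2(x3, x4)):
        x = rng.normal(size=4)
        out = small_net([small_net(x[:2]), small_net(x[2:])])
        print(out)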

    Slim, Sparse, and Shortcut Networks

    Over recent years, deep learning has become the mainstream data-driven approach for solving real-world problems in many important areas. Among successful network architectures, shortcut connections, which feed the outputs of earlier layers as additional inputs to later layers, are well established and have produced excellent results. Despite this extraordinary power, important questions remain about the underlying mechanism and functionality of shortcuts. For example, why are shortcuts powerful? How should the shortcut topology be tuned to optimize the efficiency and capacity of the network model? Along this direction, we first demonstrate a topology of shortcut connections that can make a one-neuron-wide deep network approximate any univariate function. Then, we present a novel width-bounded universal approximator, in contrast to depth-bounded universal approximators. Next, we demonstrate a family of theoretically equivalent networks, corroborated by the corresponding statistical-significance experiments and by their graph spectral characterization, thereby associating the representation ability of neural networks with their graph spectral properties. Furthermore, we shed light on the effect of concatenation shortcuts on the margin-based multi-class generalization bound of deep networks. Encouraged by the positive results from the bound analysis, we instantiate a slim, sparse, and shortcut network (S3-Net), and the experimental results demonstrate that the S3-Net can achieve better learning performance than densely connected networks and other state-of-the-art models on several well-known benchmarks.
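
    Note: one simple shortcut topology (not necessarily the one proposed in the paper) that lets a chain of single-neuron layers reproduce continuous piecewise-linear univariate functions is to shortcut the input into every layer and shortcut every layer's output to the end, yielding sums of shifted ReLUs. A numpy sketch follows.

        import numpy as np

        def relu(z):
            return np.maximum(z, 0.0)

        def narrow_shortcut_net(x, knots, coeffs, bias=0.0):
            # One neuron per layer: layer k sees the original input x through a shortcut
            # and produces h_k = relu(x - t_k); the output collects every h_k through
            # shortcuts to the last layer. On an interval, any continuous piecewise-linear
            # function can be written this way (place a knot at the left endpoint for the
            # initial slope). Illustrative topology only, not the paper's construction.
            hs = [relu(x - t) for t in knots]
            return bias + sum(c * h for c, h in zip(coeffs, hs))

        x = np.linspace(0, 1, 5)
        # Example: a triangle bump supported on [0.25, 0.75] built from three knots.
        print(narrow_shortcut_net(x, knots=[0.25, 0.5, 0.75], coeffs=[4.0, -8.0, 4.0]))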

    Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations

    This article concerns the expressive power of depth in neural nets with ReLU activations and bounded width. We are particularly interested in the following questions: what is the minimal width $w_{\text{min}}(d)$ so that ReLU nets of width $w_{\text{min}}(d)$ (and arbitrary depth) can approximate any continuous function on the unit cube $[0,1]^d$ arbitrarily well? For ReLU nets near this minimal width, what can one say about the depth necessary to approximate a given function? Our approach is based on the observation that, due to the convexity of the ReLU activation, ReLU nets are particularly well suited to representing convex functions. In particular, we prove that ReLU nets with width $d+1$ can approximate any continuous convex function of $d$ variables arbitrarily well. These results then give quantitative depth estimates for the rate of approximation of any continuous scalar function on the $d$-dimensional cube $[0,1]^d$ by ReLU nets with width $d+3$. Comment: v3. Theorem 3 removed. Comments welcome.
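
    Note: the link between ReLU nets and convex functions rests on the identity max(s, t) = s + relu(t - s), which lets a network fold in one affine piece at a time while carrying the input forward; this is one intuition behind the width-$(d+1)$ statement, since continuous convex functions are uniform limits of maxima of affine functions. The numpy sketch below computes such a maximum this way.

        import numpy as np

        def relu(z):
            return np.maximum(z, 0.0)

        def max_of_affine(x, W, c):
            # Computes max_i (w_i @ x + c_i) one affine piece at a time using
            # max(s, t) = s + relu(t - s). A ReLU net can do the same while carrying
            # x forward in d coordinates plus one coordinate for the running maximum.
            running = W[0] @ x + c[0]
            for w_i, c_i in zip(W[1:], c[1:]):
                t = w_i @ x + c_i
                running = running + relu(t - running)   # = max(running, t)
            return running

        rng = np.random.default_rng(0)
        d, pieces = 3, 6
        W, c = rng.normal(size=(pieces, d)), rng.normal(size=pieces)
        x = rng.normal(size=d)
        print(max_of_affine(x, W, c), np.max(W @ x + c))   # the two printed values agree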

    Neural Networks Should Be Wide Enough to Learn Disconnected Decision Regions

    In the recent literature the important role of depth in deep learning has been emphasized. In this paper we argue that sufficient width of a feedforward network is equally important, by answering the simple question under which conditions the decision regions of a neural network are connected. It turns out that for a class of activation functions including leaky ReLU, neural networks having a pyramidal structure, that is, no layer has more hidden units than the input dimension, necessarily produce connected decision regions. This implies that a sufficiently wide hidden layer is necessary to guarantee that the network can produce disconnected decision regions. We discuss the implications of this result for the construction of neural networks, in particular the relation to the problem of adversarial manipulation of classifiers. Comment: Accepted at ICML 2018. Added discussion for non-pyramidal networks and the ReLU activation function.
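
    Note: the converse direction is easy to illustrate with an explicit construction of ours (not taken from the paper): a one-hidden-layer ReLU net on $\mathbb{R}^2$ whose hidden layer is wider than the input dimension can have a disconnected decision region, e.g. two separate boxes. A numpy sketch follows; the theorem above rules this out when every layer is no wider than the input.

        import numpy as np

        def relu(z):
            return np.maximum(z, 0.0)

        def tent(u):
            # Piecewise-linear bump: 0 outside [-1, 1], peak 1 at 0 (three ReLU units).
            return relu(u + 1) - 2 * relu(u) + relu(u - 1)

        def score(x1, x2):
            # One hidden layer of width 9 (3 ReLUs per tent) on a 2-D input.
            return tent(x1 + 2) + tent(x1 - 2) + tent(x2)

        # The decision region {score > 1.5} consists of two separate boxes around
        # (-2, 0) and (2, 0), i.e. it is disconnected -- impossible for networks
        # whose layers are all no wider than the input, per the theorem above.
        for p in [(-2, 0), (2, 0), (0, 0), (-2, 2)]:
            print(p, score(*p) > 1.5)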