The universal approximation power of finite-width deep ReLU networks
We show that finite-width deep ReLU neural networks yield rate-distortion
optimal approximation (Bölcskei et al., 2018) of polynomials, windowed
sinusoidal functions, one-dimensional oscillatory textures, and the Weierstrass
function, a fractal function which is continuous but nowhere differentiable.
Together with their recently established universal approximation property of
affine function systems (Bölcskei et al., 2018), this shows that deep neural
networks approximate vastly different signal structures generated by the affine
group, the Weyl-Heisenberg group, or through warping, and even certain
fractals, all with approximation error decaying exponentially in the number of
neurons. We also prove that, in the approximation of sufficiently smooth
functions, finite-width deep networks require strictly smaller connectivity than
finite-depth wide networks.
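The Weierstrass function mentioned above has the classical series form W(x) = sum_n a^n cos(b^n pi x) with 0 < a < 1 and ab >= 1 (Hardy's condition for nowhere differentiability). The snippet below is a minimal NumPy sketch of a truncated series that could serve as such an approximation target; the parameter choices a = 0.5, b = 3 and the truncation depth are illustrative, not taken from the abstract above.

```python
import numpy as np

def weierstrass(x, a=0.5, b=3, n_terms=30):
    """Truncated Weierstrass series W(x) = sum_n a^n * cos(b^n * pi * x).

    Illustrative parameters: with 0 < a < 1 and a*b >= 1 the full series is
    continuous but nowhere differentiable (Hardy's condition).
    """
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for n in range(n_terms):
        total += a**n * np.cos(b**n * np.pi * x)
    return total

# Sample the target on a grid, e.g. as training data for a ReLU network.
xs = np.linspace(0.0, 1.0, 1001)
ys = weierstrass(xs)
```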
ResNet with one-neuron hidden layers is a Universal Approximator
We demonstrate that a very deep ResNet with stacked modules with one neuron
per hidden layer and ReLU activation functions can uniformly approximate any
Lebesgue integrable function in $d$ dimensions, i.e. any function in $L^1(\mathbb{R}^d)$.
Because of the identity mapping inherent to ResNets, our network has
alternating layers of dimension one and $d$. This stands in sharp contrast to
fully connected networks, which are not universal approximators if their width
is the input dimension [Lu et al., 2017; Hanin and Sellke, 2017]. Hence, our
result implies an increase in representational power for narrow deep networks
due to the ResNet architecture.
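As an illustration of the architecture described above, the sketch below stacks residual blocks whose hidden layer has a single ReLU neuron: each block maps the $d$-dimensional state down to one unit and back up to $d$ dimensions, and the identity shortcut carries the full $d$-dimensional signal through. All weights here are random placeholders; this is a structural sketch, not the construction used in the paper's proof.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def one_neuron_res_block(x, v, b, u, c):
    """One residual block: d -> 1 (ReLU) -> d, added to the identity path."""
    h = relu(x @ v + b)            # scalar hidden activation (width 1)
    return x + np.outer(h, u) + c  # project back to d dimensions, add shortcut

rng = np.random.default_rng(0)
d, depth = 3, 50
x = rng.normal(size=(8, d))        # a batch of 8 inputs in R^d

for _ in range(depth):
    v = rng.normal(size=(d,))      # d -> 1 weights
    b = rng.normal()               # hidden bias
    u = rng.normal(size=(d,))      # 1 -> d weights
    c = rng.normal(size=(d,))      # output bias
    x = one_neuron_res_block(x, v, b, u, c)

print(x.shape)  # (8, 3): the state stays d-dimensional thanks to the shortcut
```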
The Expressive Power of Neural Networks: A View from the Width
The expressive power of neural networks is important for understanding deep
learning. Most existing works consider this problem from the view of the depth
of a network. In this paper, we study how width affects the expressiveness of
neural networks. Classical results state that depth-bounded (e.g. depth-$2$)
networks with suitable activation functions are universal approximators. We
show a universal approximation theorem for width-bounded ReLU networks:
width-$(n+4)$ ReLU networks, where $n$ is the input dimension, are universal
approximators. Moreover, except for a measure zero set, all functions cannot be
approximated by width-$n$ ReLU networks, which exhibits a phase transition.
Several recent works demonstrate the benefits of depth by proving the
depth-efficiency of neural networks. That is, there are classes of deep
networks which cannot be realized by any shallow network whose size is no more
than an exponential bound. Here we pose the dual question on the
width-efficiency of ReLU networks: Are there wide networks that cannot be
realized by narrow networks whose size is not substantially larger? We show
that there exist classes of wide networks which cannot be realized by any
narrow network whose depth is no more than a polynomial bound. On the other
hand, we demonstrate by extensive experiments that narrow networks whose size
exceeds the polynomial bound by a constant factor can approximate wide and
shallow networks with high accuracy. Our results provide more comprehensive
evidence that depth is more effective than width for the expressiveness of ReLU
networks. Comment: Accepted by NIPS 2017, with some typos fixed.
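To make the width-bounded regime concrete, the sketch below builds a ReLU network whose every hidden layer has width $n+4$ for input dimension $n$, with depth as the free parameter. The weights are random placeholders, so this only illustrates the shape of the networks the theorem talks about, not the approximating construction itself.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def width_bounded_relu_net(x, depth, rng):
    """Forward pass of a ReLU net whose hidden layers all have width n + 4."""
    n = x.shape[1]
    width = n + 4                      # fixed width, arbitrary depth
    h = x
    in_dim = n
    for _ in range(depth):
        W = rng.normal(size=(in_dim, width))
        b = rng.normal(size=(width,))
        h = relu(h @ W + b)
        in_dim = width
    w_out = rng.normal(size=(in_dim, 1))
    return h @ w_out                   # scalar output

rng = np.random.default_rng(1)
x = rng.uniform(size=(16, 3))          # 16 points in the unit cube, n = 3
y = width_bounded_relu_net(x, depth=20, rng=rng)
print(y.shape)                         # (16, 1)
```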
Deep Semi-Random Features for Nonlinear Function Approximation
We propose semi-random features for nonlinear function approximation. The
flexibility of semi-random features lies between the fully adjustable units in
deep learning and the random features used in kernel methods. For one-hidden-layer
models with semi-random features, we prove with no unrealistic
assumptions that the model classes contain an arbitrarily good function as the
width increases (universality), and despite non-convexity, we can find such a
good function (optimization theory) that generalizes to unseen new data
(generalization bound). For deep models, with no unrealistic assumptions, we
prove universal approximation ability, a lower bound on approximation error, a
partial optimization guarantee, and a generalization bound. Depending on the
problems, the generalization bound of deep semi-random features can be
exponentially better than the known bounds of deep ReLU nets; our
generalization error bound can be independent of the depth, the number of
trainable weights as well as the input dimensionality. In experiments, we show
that semi-random features can match the performance of neural networks by using
slightly more units, and that they outperform random features by using significantly
fewer units. Moreover, we introduce a new implicit ensemble method by using
semi-random features. Comment: AAAI 2018, extended version.
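As a rough illustration of the idea, the sketch below implements one plausible form of a semi-random unit, assuming a linear variant in which a random, untrained direction gates the unit through a step nonlinearity while the trainable weight enters linearly. The exact parameterization in the paper may differ, so treat this as an assumption-laden sketch rather than the authors' definition.

```python
import numpy as np

def semi_random_features(x, R, W, B):
    """Hypothetical linear semi-random units: step(x @ R + B) * (x @ W).

    R and B are sampled once and kept fixed (the 'random' part);
    W is the adjustable (trainable) part.
    """
    gate = (x @ R + B > 0).astype(float)   # random, non-trainable gating
    return gate * (x @ W)                  # linear in the trainable weights

rng = np.random.default_rng(2)
d, m = 5, 64                               # input dim, number of units
R = rng.normal(size=(d, m))                # fixed random directions
B = rng.normal(size=(m,))                  # fixed random biases
W = rng.normal(size=(d, m)) * 0.01         # trainable weights (to be fitted)

x = rng.normal(size=(10, d))
phi = semi_random_features(x, R, W, B)     # (10, 64) feature matrix
```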
ReLU Deep Neural Networks and Linear Finite Elements
In this paper, we investigate the relationship between deep neural networks
(DNN) with rectified linear unit (ReLU) function as the activation function and
continuous piecewise linear (CPWL) functions, especially CPWL functions from
the simplicial linear finite element method (FEM). We first consider the
special case of FEM. By exploring the DNN representation of its nodal basis
functions, we present a ReLU DNN representation of CPWL in FEM. We
theoretically establish that at least two hidden layers are needed in a ReLU
DNN to represent any linear finite element function in $\Omega \subseteq \mathbb{R}^d$
when $d \ge 2$. Consequently, for $d = 2, 3$, which are often
encountered in scientific and engineering computing, the minimal number of two
hidden layers is necessary and sufficient for any CPWL function to be
represented by a ReLU DNN. Then we include a detailed account of how a general
CPWL function in $\mathbb{R}^d$ can be represented by a ReLU DNN with at most
$\lceil \log_2(d+1) \rceil$ hidden layers, and we also give an estimate of the
number of neurons in the DNN that are needed in such a representation. Furthermore,
using the relationship between DNNs and FEM, we theoretically argue that a
special class of DNN models with low bit-width is still expected to have
adequate representation power in applications. Finally, as a proof of concept,
we present some numerical results for using ReLU DNNs to solve a two-point
boundary value problem, demonstrating the potential of applying DNNs to the
numerical solution of partial differential equations.
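A concrete special case of the CPWL-to-ReLU correspondence: the one-dimensional FEM nodal (hat) basis function with nodes at 0, 1, 2 is reproduced exactly by a single hidden layer of three ReLU units, since hat(x) = ReLU(x) - 2*ReLU(x-1) + ReLU(x-2). The snippet below simply verifies this identity numerically; the node locations are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def hat(x):
    """1D FEM nodal basis function: 0 outside [0, 2], peak value 1 at x = 1."""
    return np.clip(1.0 - np.abs(x - 1.0), 0.0, None)

def hat_as_relu_net(x):
    """The same hat function written as a one-hidden-layer ReLU network."""
    return relu(x) - 2.0 * relu(x - 1.0) + relu(x - 2.0)

xs = np.linspace(-1.0, 3.0, 401)
assert np.allclose(hat(xs), hat_as_relu_net(xs))  # exact representation
```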
Exponential Convergence of the Deep Neural Network Approximation for Analytic Functions
We prove that for analytic functions in low dimension, the convergence rate
of the deep neural network approximation is exponential.
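The paper's construction is not reproduced here, but the exponential regime it refers to can be illustrated with the classical fact that polynomial approximations of an analytic function converge exponentially fast in the degree; network constructions for analytic targets commonly build on approximations of this kind. The snippet below (an illustration, not the paper's method) shows the effect for f(x) = exp(x) with Chebyshev interpolation.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Exponential convergence for an analytic target: Chebyshev interpolants of
# f(x) = exp(x) on [-1, 1] have sup-norm error decaying exponentially in degree.
f = np.exp
xs = np.linspace(-1.0, 1.0, 2001)

for degree in (2, 4, 8, 16):
    nodes = np.cos((2 * np.arange(degree + 1) + 1) * np.pi / (2 * (degree + 1)))
    coeffs = C.chebfit(nodes, f(nodes), degree)   # interpolate at Chebyshev nodes
    err = np.max(np.abs(C.chebval(xs, coeffs) - f(xs)))
    print(degree, err)                            # error drops roughly geometrically
```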
Function approximation by deep networks
We show that deep networks are better than shallow networks at approximating
functions that can be expressed as a composition of functions described by a
directed acyclic graph, because the deep networks can be designed to have the
same compositional structure, while a shallow network cannot exploit this
knowledge. Thus, the blessing of compositionality mitigates the curse of
dimensionality. On the other hand, a theorem called good propagation of errors
allows one to 'lift' theorems about shallow networks to those about deep networks
with an appropriate choice of norms, smoothness, etc. We illustrate this in
three contexts where each channel in the deep network calculates a spherical
polynomial, a non-smooth ReLU network, or another zonal function network
related closely to the ReLU network. Comment: To appear in Communications in Pure and Applied Mathematics.
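To illustrate the compositional structure the argument relies on, consider a target such as f(x1, x2, x3, x4) = h(g1(x1, x2), g2(x3, x4)), whose evaluation graph is a small binary tree; a deep network can allocate one sub-network per node of this graph, whereas a shallow network sees only the flat four-dimensional input. The sketch below, with hypothetical constituent functions, just makes this structure explicit.

```python
import numpy as np

# Hypothetical constituent functions, each depending on only two variables.
g1 = lambda x1, x2: np.tanh(x1 + 2.0 * x2)
g2 = lambda x3, x4: np.sin(x3) * x4
h  = lambda u, v: u * v + 0.5 * v

def compositional_target(x):
    """f(x) = h(g1(x1, x2), g2(x3, x4)): a depth-2 DAG of bivariate functions."""
    x1, x2, x3, x4 = x
    return h(g1(x1, x2), g2(x3, x4))

# A deep network mirroring this DAG would use one small sub-network per node
# (two for g1, g2 and one for h), each with only two inputs, so the
# approximation problem is effectively two-dimensional at every node.
print(compositional_target(np.array([0.1, 0.2, 0.3, 0.4])))
```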
Slim, Sparse, and Shortcut Networks
In recent years, deep learning has become the mainstream data-driven
approach for solving many real-world problems in many important areas. Among the
successful network architectures, shortcut connections, which take the outputs of
earlier layers as additional inputs to later layers, are well established and
have produced excellent results. Despite their extraordinary power, there remain
important questions about the underlying mechanism and associated functionalities
of shortcuts. For example, why are shortcuts powerful? How can the shortcut
topology be tuned to optimize the efficiency and capacity of the network
model? Along this direction, here we first demonstrate a topology of shortcut
connections that can make a one-neuron-wide deep network approximate any
univariate function. Then, we present a novel width-bounded universal
approximator in contrast to depth-bounded universal approximators. Next we
demonstrate a family of theoretically equivalent networks, corroborated by the
corresponding statistical significance experiments and by their graph spectral
characterization, thereby associating the representation ability of neural
networks with their graph spectral properties. Furthermore, we shed light on the
effect of concatenation shortcuts on the margin-based multi-class
generalization bound of deep networks. Encouraged by the positive results from
the bounds analysis, we instantiate a slim, sparse, and shortcut network
(S3-Net) and the experimental results demonstrate that the S3-Net can achieve
better learning performance than the densely connected networks and other
state-of-the-art models on some well-known benchmarks.
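The sketch below illustrates the kind of concatenation shortcut described above, in which the outputs of earlier layers are fed as additional inputs to later layers; the layer sizes and weights are arbitrary placeholders, not the S3-Net configuration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def shortcut_forward(x, widths, rng):
    """Each layer sees the concatenation of the input and all earlier outputs."""
    carried = [x]                                  # features visible to the next layer
    for w in widths:
        inp = np.concatenate(carried, axis=1)      # concatenation shortcut
        W = rng.normal(size=(inp.shape[1], w))
        b = rng.normal(size=(w,))
        carried.append(relu(inp @ W + b))
    return carried[-1]

rng = np.random.default_rng(3)
x = rng.normal(size=(4, 8))                        # batch of 4, input dim 8
out = shortcut_forward(x, widths=[16, 16, 16], rng=rng)
print(out.shape)                                   # (4, 16)
```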
Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations
This article concerns the expressive power of depth in neural nets with ReLU
activations and bounded width. We are particularly interested in the following
questions: what is the minimal width $w_{\min}(d)$ so that ReLU nets of
width $w_{\min}(d)$ (and arbitrary depth) can approximate any continuous
function on the unit cube $[0,1]^d$ arbitrarily well? For ReLU nets near this
minimal width, what can one say about the depth necessary to approximate a
given function? Our approach in this paper is based on the observation that,
due to the convexity of the ReLU activation, ReLU nets are particularly
well-suited for representing convex functions. In particular, we prove that
ReLU nets with width $d+1$ can approximate any continuous convex function of $d$
variables arbitrarily well. These results then give quantitative depth
estimates for the rate of approximation of any continuous scalar function on
the $d$-dimensional cube by ReLU nets with width $d+3$. Comment: v3. Theorem 3 removed. Comments welcome. 9 pages.
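The link between ReLU nets and convex functions rests on the fact that a maximum of affine functions is both convex and something a small ReLU net computes exactly, via the identity max(a, b) = a + ReLU(b - a). The snippet below checks this identity and uses it to evaluate a convex piecewise-linear function max_i (w_i . x + b_i) with illustrative coefficients.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def max2(a, b):
    """max(a, b) written with a single ReLU: a + relu(b - a)."""
    return a + relu(b - a)

# A convex piecewise-linear function as a max of affine pieces (illustrative coefficients).
W = np.array([[1.0, -2.0], [0.5, 0.5], [-1.0, 1.0]])   # three affine pieces in d = 2
b = np.array([0.0, 1.0, -0.5])

def convex_pwl(x):
    affine = x @ W.T + b            # values of each affine piece
    m = affine[:, 0]
    for i in range(1, affine.shape[1]):
        m = max2(m, affine[:, i])   # fold the max pairwise through ReLUs
    return m

x = np.random.default_rng(4).normal(size=(5, 2))
assert np.allclose(convex_pwl(x), (x @ W.T + b).max(axis=1))
```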
Neural Networks Should Be Wide Enough to Learn Disconnected Decision Regions
In the recent literature the important role of depth in deep learning has
been emphasized. In this paper we argue that sufficient width of a feedforward
network is equally important by answering the simple question under which
conditions the decision regions of a neural network are connected. It turns out
that for a class of activation functions including leaky ReLU, neural networks
having a pyramidal structure, that is, no layer has more hidden units than the
input dimension, necessarily produce connected decision regions. This implies
that a sufficiently wide hidden layer is necessary to guarantee that the
network can produce disconnected decision regions. We discuss the implications
of this result for the construction of neural networks, in particular the
relation to the problem of adversarial manipulation of classifiers. Comment: Accepted at ICML 2018. Added discussion for non-pyramidal networks
and the ReLU activation function.
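A minimal one-dimensional illustration of why width matters for disconnected regions: with input dimension 1, a hidden layer of width 2 can carve out the disconnected positive region {x < -1} union {x > 1}, whereas a pyramidal network on a 1D input has all layers of width 1, computes a monotone map of x (for strictly monotone activations such as leaky ReLU), and so its positive region is a single interval. The classifier below is hand-built for illustration, not taken from the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def score(x):
    """Width-2 ReLU layer on a 1D input: positive iff |x| > 1."""
    h = relu(np.stack([x - 1.0, -x - 1.0], axis=-1))  # two hidden units
    return h.sum(axis=-1)

xs = np.linspace(-3.0, 3.0, 13)
region = score(xs) > 0
print(dict(zip(xs.round(1), region)))
# Positive only for x < -1 or x > 1: two disconnected components, which a
# network whose layers all have width 1 (and a monotone activation) cannot
# produce, since it is monotone in x and its positive region is one interval.
```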