The universal approximation power of finite-width deep ReLU networks
We show that finite-width deep ReLU neural networks yield rate-distortion
optimal approximation (Bölcskei et al., 2018) of polynomials, windowed
sinusoidal functions, one-dimensional oscillatory textures, and the Weierstrass
function, a fractal function which is continuous but nowhere differentiable.
Together with their recently established universal approximation property of
affine function systems (Bölcskei et al., 2018), this shows that deep neural
networks approximate vastly different signal structures generated by the affine
group, the Weyl-Heisenberg group, or through warping, and even certain
fractals, all with approximation error decaying exponentially in the number of
neurons. We also prove that, in the approximation of sufficiently smooth
functions, finite-width deep networks require strictly smaller connectivity than
finite-depth wide networks.
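The Weierstrass function mentioned above has the classical series form W(x) = sum_n a^n cos(b^n pi x) with 0 < a < 1 and ab >= 1 (Hardy's condition for nowhere differentiability). The snippet below is a minimal NumPy sketch of a truncated series that could serve as such an approximation target; the parameter choices a = 0.5, b = 3 and the truncation depth are illustrative, not taken from the abstract above.

```python
import numpy as np

def weierstrass(x, a=0.5, b=3, n_terms=30):
    """Truncated Weierstrass series W(x) = sum_n a^n * cos(b^n * pi * x).

    Illustrative parameters: with 0 < a < 1 and a*b >= 1 the full series is
    continuous but nowhere differentiable (Hardy's condition).
    """
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for n in range(n_terms):
        total += a**n * np.cos(b**n * np.pi * x)
    return total

# Sample the target on a grid, e.g. as training data for a ReLU network.
xs = np.linspace(0.0, 1.0, 1001)
ys = weierstrass(xs)
```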
ResNet with one-neuron hidden layers is a Universal Approximator
We demonstrate that a very deep ResNet with stacked modules with one neuron
per hidden layer and ReLU activation functions can uniformly approximate any
Lebesgue integrable function in $d$ dimensions, i.e. any function in $L^1(\mathbb{R}^d)$.
Because of the identity mapping inherent to ResNets, our network has
alternating layers of dimension one and $d$. This stands in sharp contrast to
fully connected networks, which are not universal approximators if their width
is the input dimension [Lu et al., 2017; Hanin and Sellke, 2017]. Hence, our
result implies an increase in representational power for narrow deep networks
due to the ResNet architecture.
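As an illustration of the architecture described above, the sketch below stacks residual blocks whose hidden layer has a single ReLU neuron: each block maps the $d$-dimensional state down to one unit and back up to $d$ dimensions, and the identity shortcut carries the full $d$-dimensional signal through. All weights here are random placeholders; this is a structural sketch, not the construction used in the paper's proof.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def one_neuron_res_block(x, v, b, u, c):
    """One residual block: d -> 1 (ReLU) -> d, added to the identity path."""
    h = relu(x @ v + b)            # scalar hidden activation (width 1)
    return x + np.outer(h, u) + c  # project back to d dimensions, add shortcut

rng = np.random.default_rng(0)
d, depth = 3, 50
x = rng.normal(size=(8, d))        # a batch of 8 inputs in R^d

for _ in range(depth):
    v = rng.normal(size=(d,))      # d -> 1 weights
    b = rng.normal()               # hidden bias
    u = rng.normal(size=(d,))      # 1 -> d weights
    c = rng.normal(size=(d,))      # output bias
    x = one_neuron_res_block(x, v, b, u, c)

print(x.shape)  # (8, 3): the state stays d-dimensional thanks to the shortcut
```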
The Expressive Power of Neural Networks: A View from the Width
The expressive power of neural networks is important for understanding deep
learning. Most existing works consider this problem from the view of the depth
of a network. In this paper, we study how width affects the expressiveness of
neural networks. Classical results state that depth-bounded (e.g. depth-$2$)
networks with suitable activation functions are universal approximators. We
show a universal approximation theorem for width-bounded ReLU networks:
width-$(n+4)$ ReLU networks, where $n$ is the input dimension, are universal
approximators. Moreover, except for a measure zero set, all functions cannot be
approximated by width-$n$ ReLU networks, which exhibits a phase transition.
Several recent works demonstrate the benefits of depth by proving the
depth-efficiency of neural networks. That is, there are classes of deep
networks which cannot be realized by any shallow network whose size is no more
than an exponential bound. Here we pose the dual question on the
width-efficiency of ReLU networks: Are there wide networks that cannot be
realized by narrow networks whose size is not substantially larger? We show
that there exist classes of wide networks which cannot be realized by any
narrow network whose depth is no more than a polynomial bound. On the other
hand, we demonstrate by extensive experiments that narrow networks whose size
exceeds the polynomial bound by a constant factor can approximate wide and
shallow networks with high accuracy. Our results provide more comprehensive
evidence that depth is more effective than width for the expressiveness of ReLU
networks. Comment: Accepted by NIPS 2017, with some typos fixed.
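To make the width-bounded regime concrete, the sketch below builds a ReLU network whose every hidden layer has width $n+4$ for input dimension $n$, with depth as the free parameter. The weights are random placeholders, so this only illustrates the shape of the networks the theorem talks about, not the approximating construction itself.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def width_bounded_relu_net(x, depth, rng):
    """Forward pass of a ReLU net whose hidden layers all have width n + 4."""
    n = x.shape[1]
    width = n + 4                      # fixed width, arbitrary depth
    h = x
    in_dim = n
    for _ in range(depth):
        W = rng.normal(size=(in_dim, width))
        b = rng.normal(size=(width,))
        h = relu(h @ W + b)
        in_dim = width
    w_out = rng.normal(size=(in_dim, 1))
    return h @ w_out                   # scalar output

rng = np.random.default_rng(1)
x = rng.uniform(size=(16, 3))          # 16 points in the unit cube, n = 3
y = width_bounded_relu_net(x, depth=20, rng=rng)
print(y.shape)                         # (16, 1)
```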
Deep Semi-Random Features for Nonlinear Function Approximation
We propose semi-random features for nonlinear function approximation. The
flexibility of semi-random features lies between the fully adjustable units in
deep learning and the random features used in kernel methods. For one-hidden-layer
models with semi-random features, we prove with no unrealistic
assumptions that the model classes contain an arbitrarily good function as the
width increases (universality), and despite non-convexity, we can find such a
good function (optimization theory) that generalizes to unseen new data
(generalization bound). For deep models, with no unrealistic assumptions, we
prove universal approximation ability, a lower bound on approximation error, a
partial optimization guarantee, and a generalization bound. Depending on the
problems, the generalization bound of deep semi-random features can be
exponentially better than the known bounds of deep ReLU nets; our
generalization error bound can be independent of the depth, the number of
trainable weights as well as the input dimensionality. In experiments, we show
that semi-random features can match the performance of neural networks by using
slightly more units, and that they outperform random features by using significantly
fewer units. Moreover, we introduce a new implicit ensemble method by using
semi-random features. Comment: AAAI 2018, extended version.
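As a rough illustration of the idea, the sketch below implements one plausible form of a semi-random unit, assuming a linear variant in which a random, untrained direction gates the unit through a step nonlinearity while the trainable weight enters linearly. The exact parameterization in the paper may differ, so treat this as an assumption-laden sketch rather than the authors' definition.

```python
import numpy as np

def semi_random_features(x, R, W, B):
    """Hypothetical linear semi-random units: step(x @ R + B) * (x @ W).

    R and B are sampled once and kept fixed (the 'random' part);
    W is the adjustable (trainable) part.
    """
    gate = (x @ R + B > 0).astype(float)   # random, non-trainable gating
    return gate * (x @ W)                  # linear in the trainable weights

rng = np.random.default_rng(2)
d, m = 5, 64                               # input dim, number of units
R = rng.normal(size=(d, m))                # fixed random directions
B = rng.normal(size=(m,))                  # fixed random biases
W = rng.normal(size=(d, m)) * 0.01         # trainable weights (to be fitted)

x = rng.normal(size=(10, d))
phi = semi_random_features(x, R, W, B)     # (10, 64) feature matrix
```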
ReLU Deep Neural Networks and Linear Finite Elements
In this paper, we investigate the relationship between deep neural networks
(DNN) with rectified linear unit (ReLU) function as the activation function and
continuous piecewise linear (CPWL) functions, especially CPWL functions from
the simplicial linear finite element method (FEM). We first consider the
special case of FEM. By exploring the DNN representation of its nodal basis
functions, we present a ReLU DNN representation of CPWL in FEM. We
theoretically establish that at least two hidden layers are needed in a ReLU
DNN to represent any linear finite element function in $\Omega \subseteq \mathbb{R}^d$
when $d \ge 2$. Consequently, for $d = 2, 3$, which are often
encountered in scientific and engineering computing, the minimal number of two
hidden layers is necessary and sufficient for any CPWL function to be
represented by a ReLU DNN. Then we include a detailed account of how a general
CPWL function in $\mathbb{R}^d$ can be represented by a ReLU DNN with at most
$\lceil \log_2(d+1) \rceil$ hidden layers, and we also give an estimate of the
number of neurons in the DNN that are needed in such a representation. Furthermore,
using the relationship between DNNs and FEM, we theoretically argue that a
special class of DNN models with low bit-width is still expected to have
adequate representation power in applications. Finally, as a proof of concept,
we present some numerical results for using ReLU DNNs to solve a two-point
boundary value problem, demonstrating the potential of applying DNNs to the
numerical solution of partial differential equations.
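A concrete special case of the CPWL-to-ReLU correspondence: the one-dimensional FEM nodal (hat) basis function with nodes at 0, 1, 2 is reproduced exactly by a single hidden layer of three ReLU units, since hat(x) = ReLU(x) - 2*ReLU(x-1) + ReLU(x-2). The snippet below simply verifies this identity numerically; the node locations are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def hat(x):
    """1D FEM nodal basis function: 0 outside [0, 2], peak value 1 at x = 1."""
    return np.clip(1.0 - np.abs(x - 1.0), 0.0, None)

def hat_as_relu_net(x):
    """The same hat function written as a one-hidden-layer ReLU network."""
    return relu(x) - 2.0 * relu(x - 1.0) + relu(x - 2.0)

xs = np.linspace(-1.0, 3.0, 401)
assert np.allclose(hat(xs), hat_as_relu_net(xs))  # exact representation
```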
Exponential Convergence of the Deep Neural Network Approximation for Analytic Functions
We prove that for analytic functions in low dimension, the convergence rate
of the deep neural network approximation is exponential.
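The paper's construction is not reproduced here, but the exponential regime it refers to can be illustrated with the classical fact that polynomial approximations of an analytic function converge exponentially fast in the degree; network constructions for analytic targets commonly build on approximations of this kind. The snippet below (an illustration, not the paper's method) shows the effect for f(x) = exp(x) with Chebyshev interpolation.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Exponential convergence for an analytic target: Chebyshev interpolants of
# f(x) = exp(x) on [-1, 1] have sup-norm error decaying exponentially in degree.
f = np.exp
xs = np.linspace(-1.0, 1.0, 2001)

for degree in (2, 4, 8, 16):
    nodes = np.cos((2 * np.arange(degree + 1) + 1) * np.pi / (2 * (degree + 1)))
    coeffs = C.chebfit(nodes, f(nodes), degree)   # interpolate at Chebyshev nodes
    err = np.max(np.abs(C.chebval(xs, coeffs) - f(xs)))
    print(degree, err)                            # error drops roughly geometrically
```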
Function approximation by deep networks
We show that deep networks are better than shallow networks at approximating
functions that can be expressed as a composition of functions described by a
directed acyclic graph, because the deep networks can be designed to have the
same compositional structure, while a shallow network cannot exploit this
knowledge. Thus, the blessing of compositionality mitigates the curse of
dimensionality. On the other hand, a theorem called good propagation of errors
allows one to 'lift' theorems about shallow networks to those about deep networks
with an appropriate choice of norms, smoothness, etc. We illustrate this in
three contexts where each channel in the deep network calculates a spherical
polynomial, a non-smooth ReLU network, or another zonal function network
related closely to the ReLU network. Comment: To appear in Communications in Pure and Applied Mathematics.
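To illustrate the compositional structure the argument relies on, consider a target such as f(x1, x2, x3, x4) = h(g1(x1, x2), g2(x3, x4)), whose evaluation graph is a small binary tree; a deep network can allocate one sub-network per node of this graph, whereas a shallow network sees only the flat four-dimensional input. The sketch below, with hypothetical constituent functions, just makes this structure explicit.

```python
import numpy as np

# Hypothetical constituent functions, each depending on only two variables.
g1 = lambda x1, x2: np.tanh(x1 + 2.0 * x2)
g2 = lambda x3, x4: np.sin(x3) * x4
h  = lambda u, v: u * v + 0.5 * v

def compositional_target(x):
    """f(x) = h(g1(x1, x2), g2(x3, x4)): a depth-2 DAG of bivariate functions."""
    x1, x2, x3, x4 = x
    return h(g1(x1, x2), g2(x3, x4))

# A deep network mirroring this DAG would use one small sub-network per node
# (two for g1, g2 and one for h), each with only two inputs, so the
# approximation problem is effectively two-dimensional at every node.
print(compositional_target(np.array([0.1, 0.2, 0.3, 0.4])))
```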
Slim, Sparse, and Shortcut Networks
In recent years, deep learning has become the mainstream data-driven
approach for solving many real-world problems in many important areas. Among the
successful network architectures, shortcut connections, which take the outputs of
earlier layers as additional inputs to later layers, are well established and
have produced excellent results. Despite their extraordinary power, there remain
important questions about the underlying mechanism and associated functionalities
of shortcuts. For example, why are shortcuts powerful? How can the shortcut
topology be tuned to optimize the efficiency and capacity of the network
model? Along this direction, here we first demonstrate a topology of shortcut
connections that can make a one-neuron-wide deep network approximate any
univariate function. Then, we present a novel width-bounded universal
approximator in contrast to depth-bounded universal approximators. Next we
demonstrate a family of theoretically equivalent networks, corroborated by the
corresponding statistical significance experiments and by their graph spectral
characterization, thereby associating the representation ability of neural
networks with their graph spectral properties. Furthermore, we shed light on the
effect of concatenation shortcuts on the margin-based multi-class
generalization bound of deep networks. Encouraged by the positive results from
the bounds analysis, we instantiate a slim, sparse, and shortcut network
(S3-Net) and the experimental results demonstrate that the S3-Net can achieve
better learning performance than the densely connected networks and other
state-of-the-art models on some well-known benchmarks.
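The sketch below illustrates the kind of concatenation shortcut described above, in which the outputs of earlier layers are fed as additional inputs to later layers; the layer sizes and weights are arbitrary placeholders, not the S3-Net configuration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def shortcut_forward(x, widths, rng):
    """Each layer sees the concatenation of the input and all earlier outputs."""
    carried = [x]                                  # features visible to the next layer
    for w in widths:
        inp = np.concatenate(carried, axis=1)      # concatenation shortcut
        W = rng.normal(size=(inp.shape[1], w))
        b = rng.normal(size=(w,))
        carried.append(relu(inp @ W + b))
    return carried[-1]

rng = np.random.default_rng(3)
x = rng.normal(size=(4, 8))                        # batch of 4, input dim 8
out = shortcut_forward(x, widths=[16, 16, 16], rng=rng)
print(out.shape)                                   # (4, 16)
```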
Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations
This article concerns the expressive power of depth in neural nets with ReLU
activations and bounded width. We are particularly interested in the following
questions: what is the minimal width $w_{\min}(d)$ so that ReLU nets of
width $w_{\min}(d)$ (and arbitrary depth) can approximate any continuous
function on the unit cube $[0,1]^d$ arbitrarily well? For ReLU nets near this
minimal width, what can one say about the depth necessary to approximate a
given function? Our approach in this paper is based on the observation that,
due to the convexity of the ReLU activation, ReLU nets are particularly
well-suited for representing convex functions. In particular, we prove that
ReLU nets with width $d+1$ can approximate any continuous convex function of $d$
variables arbitrarily well. These results then give quantitative depth
estimates for the rate of approximation of any continuous scalar function on
the $d$-dimensional cube by ReLU nets with width $d+3$. Comment: v3. Theorem 3 removed. Comments welcome. 9 pages.
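The link between ReLU nets and convex functions rests on the fact that a maximum of affine functions is both convex and something a small ReLU net computes exactly, via the identity max(a, b) = a + ReLU(b - a). The snippet below checks this identity and uses it to evaluate a convex piecewise-linear function max_i (w_i . x + b_i) with illustrative coefficients.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def max2(a, b):
    """max(a, b) written with a single ReLU: a + relu(b - a)."""
    return a + relu(b - a)

# A convex piecewise-linear function as a max of affine pieces (illustrative coefficients).
W = np.array([[1.0, -2.0], [0.5, 0.5], [-1.0, 1.0]])   # three affine pieces in d = 2
b = np.array([0.0, 1.0, -0.5])

def convex_pwl(x):
    affine = x @ W.T + b            # values of each affine piece
    m = affine[:, 0]
    for i in range(1, affine.shape[1]):
        m = max2(m, affine[:, i])   # fold the max pairwise through ReLUs
    return m

x = np.random.default_rng(4).normal(size=(5, 2))
assert np.allclose(convex_pwl(x), (x @ W.T + b).max(axis=1))
```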
Neural Networks Should Be Wide Enough to Learn Disconnected Decision Regions
In the recent literature the important role of depth in deep learning has
been emphasized. In this paper we argue that sufficient width of a feedforward
network is equally important by answering the simple question under which
conditions the decision regions of a neural network are connected. It turns out
that for a class of activation functions including leaky ReLU, neural networks
having a pyramidal structure, that is, no layer has more hidden units than the
input dimension, necessarily produce connected decision regions. This implies
that a sufficiently wide hidden layer is necessary to guarantee that the
network can produce disconnected decision regions. We discuss the implications
of this result for the construction of neural networks, in particular the
relation to the problem of adversarial manipulation of classifiers. Comment: Accepted at ICML 2018. Added discussion for non-pyramidal networks
and the ReLU activation function.
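A minimal one-dimensional illustration of why width matters for disconnected regions: with input dimension 1, a hidden layer of width 2 can carve out the disconnected positive region {x < -1} union {x > 1}, whereas a pyramidal network on a 1D input has all layers of width 1, computes a monotone map of x (for strictly monotone activations such as leaky ReLU), and so its positive region is a single interval. The classifier below is hand-built for illustration, not taken from the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def score(x):
    """Width-2 ReLU layer on a 1D input: positive iff |x| > 1."""
    h = relu(np.stack([x - 1.0, -x - 1.0], axis=-1))  # two hidden units
    return h.sum(axis=-1)

xs = np.linspace(-3.0, 3.0, 13)
region = score(xs) > 0
print(dict(zip(xs.round(1), region)))
# Positive only for x < -1 or x > 1: two disconnected components, which a
# network whose layers all have width 1 (and a monotone activation) cannot
# produce, since it is monotone in x and its positive region is one interval.
```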