17,785 research outputs found
Optimal approximation of piecewise smooth functions using deep ReLU neural networks
We study the necessary and sufficient complexity of ReLU neural networks, in
terms of depth and number of weights, which is required for approximating
classifier functions in $L^2$. As a model class, we consider the set of
possibly discontinuous piecewise $C^\beta$ functions $f$, where the different
smooth regions of $f$ are separated by $C^\beta$ hypersurfaces. For dimension
$d$, regularity $\beta > 0$, and accuracy $\varepsilon > 0$, we construct
artificial neural networks with ReLU activation function that approximate
functions from this class up to an $L^2$ error of $\varepsilon$. The
constructed networks have a fixed number of layers, depending only on $d$ and
$\beta$, and they have $O(\varepsilon^{-2(d-1)/\beta})$ many nonzero weights,
which we prove to be optimal. In addition to the optimality in terms of the
number of weights, we show that in order to achieve the optimal approximation
rate, one needs ReLU networks of a certain depth. Precisely, for piecewise
$C^\beta$ functions, this minimal depth is given, up to a multiplicative
constant, by $\beta/d$. Up to a log factor, our constructed
networks match this bound. This partly explains the benefits of depth for ReLU
networks by showing that deep networks are necessary to achieve efficient
approximation of (piecewise) smooth functions. Finally, we analyze
approximation in high-dimensional spaces where the function $f$ to be
approximated can be written as the composition $f = g \circ \phi$ of a smooth,
dimension-reducing feature map $\phi$ and a classifier function $g$ defined on
a low-dimensional feature space. We show that in this case the approximation
rate depends only on the dimension of the feature space and not on the input
dimension.
Comment: Generalized some estimates to $L^p$ norms for $0 < p < \infty$.
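As an illustration of the basic mechanism behind such constructions, and not the authors' actual network, here is a minimal NumPy sketch showing how two ReLU units form a steep ramp that matches a jump discontinuity outside an arbitrarily thin transition band; the function name soft_step, the jump location t, and the width delta are illustrative choices.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def soft_step(x, t=0.3, delta=1e-3):
    # Two ReLU units form a ramp that is 0 left of t, 1 right of t + delta,
    # and linear in between; outside that band it matches the indicator 1_{x >= t}.
    return relu((x - t) / delta) - relu((x - t) / delta - 1.0)

x = np.linspace(-0.5, 0.5, 10_001)
step = (x >= 0.3).astype(float)
l2_err = np.sqrt(np.mean((soft_step(x) - step) ** 2))
print(f"L2 error of the 2-ReLU step surrogate: {l2_err:.2e}")  # bounded by sqrt(delta)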
On the Expressive Power of Neural Networks
In 1989 George Cybenko proved in a landmark paper that wide shallow neural
networks can approximate arbitrary continuous functions on a compact set. This
universal approximation theorem sparked a lot of follow-up research.
Shen, Yang and Zhang determined optimal approximation rates for ReLU-networks
in $L^p$-norms. Kidger and Lyons proved a universal
approximation theorem for deep narrow ReLU-networks. Telgarsky gave an example
of a deep narrow ReLU-network that cannot be approximated by a wide shallow
ReLU-network unless it has exponentially many neurons.
However, there are even more questions that still remain unresolved. Are
there any wide shallow ReLU-networks that cannot be approximated well by deep
narrow ReLU-networks? Is the universal approximation theorem still true for
other norms such as Sobolev norms? Do these results hold for
activation functions other than ReLU?
We will answer all of those questions, and more, with a framework built on two
notions of expressive power. The first is well known and counts the maximal
number of linear regions of a function computed by a ReLU-network; we improve
the best known bounds for this expressive power. The second is entirely new.
Comment: 54 pages
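To make the linear-region notion of expressive power concrete, the following NumPy sketch composes the tent map, which a width-2 ReLU layer computes exactly, and counts the linear pieces of the result; depth k yields 2^k pieces, which is the kind of depth separation behind Telgarsky's example. The function names and the grid resolution are illustrative choices, not taken from the paper.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def tent(x):
    # A width-2 ReLU layer computing the tent map: 2x on [0, 1/2], 2 - 2x on [1/2, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def sawtooth(x, depth):
    # Composing the tent map `depth` times gives a depth-`depth`, width-2 ReLU
    # network whose graph has 2**depth linear pieces on [0, 1].
    for _ in range(depth):
        x = tent(x)
    return x

x = np.linspace(0.0, 1.0, 2**16 + 1)
for depth in range(1, 7):
    slopes = np.diff(sawtooth(x, depth)) / np.diff(x)
    pieces = 1 + np.count_nonzero(np.diff(np.round(slopes, 6)))
    print(f"depth {depth}: {pieces} linear pieces")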
Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks
We explore the ability of overparameterized shallow ReLU neural networks to
learn Lipschitz, non-differentiable, bounded functions with additive noise when
trained by Gradient Descent (GD). Since, in the presence of noise, neural
networks trained to nearly zero training error are inconsistent on this class,
we focus on early-stopped GD, which allows us to show
consistency and optimal rates. In particular, we explore this problem from the
viewpoint of the Neural Tangent Kernel (NTK) approximation of a GD-trained
finite-width neural network. We show that whenever some early stopping rule is
guaranteed to give an optimal rate (of excess risk) on the Hilbert space of the
kernel induced by the ReLU activation function, the same rule can be used to
achieve minimax optimal rate for learning on the class of considered Lipschitz
functions by neural networks. We discuss several practically appealing
data-free and data-dependent stopping rules that yield optimal rates.
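A minimal NumPy sketch of the setting, under simplifying assumptions: noisy samples of the Lipschitz target |x|, a wide one-hidden-layer ReLU network in the 1/sqrt(m) scaling, and full-batch GD on the output layer only (a random-features simplification of the NTK viewpoint), with a hold-out rule as a stand-in data-dependent stopping criterion; the paper's specific stopping rules and analysis are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)
n, m, lr = 200, 512, 0.05                      # samples, hidden width, GD step size

# Noisy samples of a Lipschitz, non-differentiable target (illustrative choice: |x|).
x = rng.uniform(-1.0, 1.0, (n, 1))
y = np.abs(x) + 0.1 * rng.standard_normal((n, 1))
x_val = rng.uniform(-1.0, 1.0, (n, 1))
y_val = np.abs(x_val) + 0.1 * rng.standard_normal((n, 1))

# One-hidden-layer ReLU network with 1/sqrt(m) output scaling.
W = rng.standard_normal((1, m))
b = rng.standard_normal(m)
a = rng.standard_normal((m, 1)) / np.sqrt(m)

def predict(x):
    return np.maximum(x @ W + b, 0.0) @ a

best_val, best_step = np.inf, 0
for step in range(2000):
    h = np.maximum(x @ W + b, 0.0)             # hidden activations
    a -= lr * h.T @ (h @ a - y) / n            # GD step on the output layer
    val = np.mean((predict(x_val) - y_val) ** 2)
    if val < best_val:                         # hold-out stopping rule (illustrative)
        best_val, best_step = val, step
print(f"hold-out error is smallest around step {best_step}: {best_val:.4f}")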
Approximation in shift-invariant spaces with deep ReLU neural networks
We study the expressive power of deep ReLU neural networks for approximating
functions in dilated shift-invariant spaces, which are widely used in signal
processing, image processing, communications and so on. Approximation error
bounds are estimated with respect to the width and depth of neural networks.
The network construction is based on the bit extraction and data-fitting
capacity of deep neural networks. As applications of our main results, the
approximation rates of classical function spaces such as Sobolev spaces and
Besov spaces are obtained. We also give lower bounds on the approximation
error for Sobolev spaces, which show that our neural network construction is
asymptotically optimal up to a logarithmic factor.
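For concreteness, here is a small NumPy sketch of approximation in a dilated shift-invariant space: a target is fit by least squares on the family hat(2^j x - k) with a hat-function generator, which is itself expressible with three ReLUs and is why such spaces are natural targets for ReLU networks. The generator, target, and scales are illustrative assumptions, not the paper's bit-extraction construction.

import numpy as np

def hat(t):
    # Piecewise-linear generator; as a ReLU network: relu(t+1) - 2*relu(t) + relu(t-1).
    return np.maximum(1.0 - np.abs(t), 0.0)

def project(f, x, level):
    # Least-squares fit of f onto the dilated shift-invariant family hat(2**level * x - k).
    shifts = np.arange(0, 2**level + 1)
    basis = hat(2**level * x[:, None] - shifts[None, :])
    coef, *_ = np.linalg.lstsq(basis, f(x), rcond=None)
    return basis @ coef

f = lambda x: np.sin(2.0 * np.pi * x)
x = np.linspace(0.0, 1.0, 4001)
for level in range(1, 7):
    err = np.max(np.abs(project(f, x, level) - f(x)))
    print(f"scale 2^{level}: sup-norm error {err:.2e}")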
Robust nonparametric regression based on deep ReLU neural networks
In this paper, we consider robust nonparametric regression using deep neural
networks with ReLU activation function. While several existing theoretically
justified methods are geared towards robustness against identically
distributed heavy-tailed noise, the rise of adversarial attacks has emphasized the
importance of safeguarding estimation procedures against systematic
contamination. We approach this statistical issue by shifting our focus towards
estimating conditional distributions. To address it robustly, we introduce a
novel estimation procedure based on $\ell$-estimation. Under a mild model
assumption, we establish general non-asymptotic risk bounds for the resulting
estimators, showcasing their robustness against contamination, outliers, and
model misspecification. We then delve into the application of our approach
using deep ReLU neural networks. When the model is well-specified and the
regression function belongs to an $\alpha$-Hölder class, employing
$\ell$-type estimation on suitable networks enables the resulting estimators
to achieve the minimax optimal rate of convergence. Additionally, we
demonstrate that deep $\ell$-type estimators can circumvent the curse of
dimensionality by
assuming the regression function closely resembles the composition of several
H\"older functions. To attain this, new deep fully-connected ReLU neural
networks have been designed to approximate this composition class. This
approximation result can be of independent interest.Comment: 40 page
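To illustrate why a robust criterion helps under contaminated responses, here is a NumPy sketch that trains a small one-hidden-layer ReLU network with a Huber loss, used purely as a generic robust stand-in rather than the paper's estimation procedure; the sin target, contamination rate, width, and step size are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
n, m, lr, delta = 300, 64, 0.05, 0.5           # samples, width, step size, Huber threshold

# Regression data where 10% of the responses are grossly contaminated.
x = rng.uniform(-1.0, 1.0, (n, 1))
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal((n, 1))
out = rng.random((n, 1)) < 0.1
y[out] += 10.0 * rng.standard_normal(np.count_nonzero(out))

W1 = rng.standard_normal((1, m))
b1 = np.zeros(m)
w2 = rng.standard_normal((m, 1)) / np.sqrt(m)

for step in range(3000):
    z = x @ W1 + b1
    h = np.maximum(z, 0.0)                     # hidden ReLU activations
    r = h @ w2 - y                             # residuals
    g = np.clip(r, -delta, delta) / n          # Huber loss gradient: bounded influence of outliers
    back = (g @ w2.T) * (z > 0)                # backpropagate through the ReLU layer
    w2 -= lr * (h.T @ g)
    W1 -= lr * (x.T @ back)
    b1 -= lr * back.sum(axis=0)

x_test = np.linspace(-1.0, 1.0, 200)[:, None]
pred = np.maximum(x_test @ W1 + b1, 0.0) @ w2
rmse = float(np.sqrt(np.mean((pred - np.sin(3.0 * x_test)) ** 2)))
print(f"RMSE against the uncontaminated regression function: {rmse:.3f}")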