
17,785 research outputs found

Optimal approximation of piecewise smooth functions using deep ReLU neural networks

Authors: Petersen Philipp; Voigtlaender Felix
Publication date: 22/05/2018

We study the necessary and sufficient complexity of ReLU neural networks, in terms of depth and number of weights, required for approximating classifier functions in $L^2$. As a model class, we consider the set $\mathcal{E}^\beta(\mathbb R^d)$ of possibly discontinuous piecewise $C^\beta$ functions $f : [-1/2, 1/2]^d \to \mathbb R$, where the different smooth regions of $f$ are separated by $C^\beta$ hypersurfaces. For dimension $d \geq 2$, regularity $\beta > 0$, and accuracy $\varepsilon > 0$, we construct artificial neural networks with ReLU activation function that approximate functions from $\mathcal{E}^\beta(\mathbb R^d)$ up to an $L^2$ error of $\varepsilon$. The constructed networks have a fixed number of layers, depending only on $d$ and $\beta$, and they have $O(\varepsilon^{-2(d-1)/\beta})$ nonzero weights, which we prove to be optimal. In addition to the optimality in terms of the number of weights, we show that in order to achieve the optimal approximation rate, one needs ReLU networks of a certain depth. Precisely, for piecewise $C^\beta(\mathbb R^d)$ functions, this minimal depth is given (up to a multiplicative constant) by $\beta/d$. Up to a log factor, our constructed networks match this bound. This partly explains the benefits of depth for ReLU networks by showing that deep networks are necessary to achieve efficient approximation of (piecewise) smooth functions. Finally, we analyze approximation in high-dimensional spaces where the function $f$ to be approximated can be factorized into a smooth dimension-reducing feature map $\tau$ and a classifier function $g$, defined on a low-dimensional feature space, as $f = g \circ \tau$. We show that in this case the approximation rate depends only on the dimension of the feature space and not on the input dimension.

Comment: Generalized some estimates to $L^p$ norms for $0<p<\infty$

arXiv.org e-Print Archive

Publikationsserver der Katholischen Universität Eichstätt-Ingolstadt
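
To make the weight bound $O(\varepsilon^{-2(d-1)/\beta})$ from the abstract above more concrete, the following minimal sketch evaluates only the growth term $\varepsilon^{-2(d-1)/\beta}$. The function name `weight_scaling` is hypothetical, and the constant hidden in the $O(\cdot)$ notation is not given in the abstract, so the returned number is a relative scale and not an actual weight count.

```python
def weight_scaling(eps: float, d: int, beta: float) -> float:
    """Growth term eps**(-2*(d-1)/beta) of the abstract's weight bound.

    Assumption: the constant inside O(.) is unknown and is left out here,
    so only ratios between two calls are meaningful.
    """
    if eps <= 0 or d < 2 or beta <= 0:
        raise ValueError("requires eps > 0, d >= 2 and beta > 0")
    return eps ** (-2.0 * (d - 1) / beta)


def growth_when_halving_eps(d: int, beta: float) -> float:
    """Factor by which the growth term increases when eps is halved."""
    return 2.0 ** (2.0 * (d - 1) / beta)


if __name__ == "__main__":
    # Smoother interfaces (larger beta) need far fewer weights for the same eps.
    for beta in (1.0, 2.0, 4.0):
        print(beta, weight_scaling(0.1, d=3, beta=beta))
```

Under this reading, halving $\varepsilon$ in dimension $d = 3$ with $\beta = 1$ multiplies the growth term by $2^4 = 16$, while larger $\beta$ makes the dependence on $\varepsilon$ milder.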

On the Expressive Power of Neural Networks

Author: Holstermann Jan
Publication date: 31/05/2023

In 1989 George Cybenko proved in a landmark paper that wide shallow neural networks can approximate arbitrary continuous functions on a compact set. This universal approximation theorem sparked a lot of follow-up research. Shen, Yang and Zhang determined optimal approximation rates for ReLU-networks in $L^p$-norms with $p \in [1,\infty)$. Kidger and Lyons proved a universal approximation theorem for deep narrow ReLU-networks. Telgarsky gave an example of a deep narrow ReLU-network that cannot be approximated by a wide shallow ReLU-network unless it has exponentially many neurons. However, there are even more questions that still remain unresolved. Are there any wide shallow ReLU-networks that cannot be approximated well by deep narrow ReLU-networks? Is the universal approximation theorem still true for other norms like the Sobolev norm $W^{1,1}$? Do these results hold for activation functions other than ReLU? We will answer all of those questions and more with a framework of two expressive powers. The first one is well-known and counts the maximal number of linear regions of a function calculated by a ReLU-network. We will improve the best known bounds for this expressive power. The second one is entirely new.

Comment: 54 pages

arXiv.org e-Print Archive
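
The notion of "number of linear regions" in the abstract above can be illustrated with a standard sawtooth construction of the kind used in depth-separation arguments. This is a sketch under stated assumptions and is not taken from the listed paper: the tent map `tent` is written with two ReLU units, and the helper names `sawtooth` and `count_linear_pieces` are hypothetical. The piece counter samples only the dyadic grid $i/2^k$, which is assumed (from the known structure of the tent map) to contain every breakpoint of the $k$-fold composition.

```python
from fractions import Fraction


def relu(x: Fraction) -> Fraction:
    return x if x > 0 else Fraction(0)


def tent(x: Fraction) -> Fraction:
    """Tent map on [0, 1] built from two ReLU units: 2*relu(x) - 4*relu(x - 1/2)."""
    return 2 * relu(x) - 4 * relu(x - Fraction(1, 2))


def sawtooth(x: Fraction, k: int) -> Fraction:
    """Compose the tent map k times (a depth-k, width-2 ReLU computation)."""
    for _ in range(k):
        x = tent(x)
    return x


def count_linear_pieces(k: int) -> int:
    """Count slope changes of the k-fold composition on the grid i / 2**k.

    Assumption: all breakpoints lie on this grid, so comparing consecutive
    slopes between grid points finds every change of linear piece.
    """
    n = 2 ** k
    ys = [sawtooth(Fraction(i, n), k) for i in range(n + 1)]
    slopes = [(ys[i + 1] - ys[i]) * n for i in range(n)]
    return 1 + sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, count_linear_pieces(k))
```

Each additional composition doubles the number of linear pieces on $[0,1]$, whereas a one-hidden-layer ReLU network on the real line gains at most one breakpoint per neuron, which is the intuition behind exponential width requirements for shallow networks.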

Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks

Authors: Kuzborskij Ilja; Szepesvári Csaba
Publication date: 28/12/2022

We explore the ability of overparameterized shallow ReLU neural networks to learn Lipschitz, non-differentiable, bounded functions with additive noise when trained by Gradient Descent (GD). To avoid the problem that, in the presence of noise, neural networks trained to nearly zero training error are inconsistent in this class, we focus on early-stopped GD, which allows us to show consistency and optimal rates. In particular, we explore this problem from the viewpoint of the Neural Tangent Kernel (NTK) approximation of a GD-trained finite-width neural network. We show that whenever some early stopping rule is guaranteed to give an optimal rate (of excess risk) on the Hilbert space of the kernel induced by the ReLU activation function, the same rule can be used to achieve the minimax optimal rate for learning on the class of considered Lipschitz functions by neural networks. We discuss several data-free and data-dependent practically appealing stopping rules that yield optimal rates.

arXiv.org e-Print Archive

Approximation in shift-invariant spaces with deep ReLU neural networks

Authors: Li Zhen; Wang Yang; Yang Yunfei
Publication date: 21/06/2021

We study the expressive power of deep ReLU neural networks for approximating functions in dilated shift-invariant spaces, which are widely used in signal processing, image processing, communications and so on. Approximation error bounds are estimated with respect to the width and depth of neural networks. The network construction is based on the bit extraction and data-fitting capacity of deep neural networks. As applications of our main results, the approximation rates of classical function spaces such as Sobolev spaces and Besov spaces are obtained. We also give lower bounds of the $L^p$ ($1\le p \le \infty$) approximation error for Sobolev spaces, which show that our construction of neural networks is asymptotically optimal up to a logarithmic factor.

arXiv.org e-Print Archive

Robust nonparametric regression based on deep ReLU neural networks

Author: Chen Juntong
Publication date: 31/10/2023

In this paper, we consider robust nonparametric regression using deep neural networks with ReLU activation function. While several existing theoretically justified methods are geared towards robustness against identical heavy-tailed noise distributions, the rise of adversarial attacks has emphasized the importance of safeguarding estimation procedures against systematic contamination. We approach this statistical issue by shifting our focus towards estimating conditional distributions. To address it robustly, we introduce a novel estimation procedure based on $\ell$-estimation. Under a mild model assumption, we establish general non-asymptotic risk bounds for the resulting estimators, showcasing their robustness against contamination, outliers, and model misspecification. We then delve into the application of our approach using deep ReLU neural networks. When the model is well-specified and the regression function belongs to an $\alpha$-Hölder class, employing $\ell$-type estimation on suitable networks enables the resulting estimators to achieve the minimax optimal rate of convergence. Additionally, we demonstrate that deep $\ell$-type estimators can circumvent the curse of dimensionality by assuming the regression function closely resembles the composition of several Hölder functions. To attain this, new deep fully-connected ReLU neural networks have been designed to approximate this composition class. This approximation result can be of independent interest.

Comment: 40 pages

arXiv.org e-Print Archive