Statistical Optimality of Deep Wide Neural Networks

Abstract

In this paper, we consider the generalization ability of deep wide feedforward ReLU neural networks defined on a bounded domain $\mathcal X \subset \mathbb R^{d}$. We first demonstrate that the generalization ability of such a neural network is fully characterized by that of the corresponding deep neural tangent kernel (NTK) regression. We then investigate the spectral properties of the deep NTK and show that it is positive definite on $\mathcal X$ with eigenvalue decay rate $(d+1)/d$. Thanks to the well-established theory of kernel regression, we conclude that multilayer wide neural networks trained by gradient descent with proper early stopping achieve the minimax rate, provided that the regression function lies in the reproducing kernel Hilbert space (RKHS) associated with the corresponding NTK. Finally, we show that overfitted multilayer wide neural networks cannot generalize well on $\mathbb S^{d}$. We believe our technical contributions in determining the eigenvalue decay rate of the NTK on $\mathbb R^{d}$ may be of independent interest.
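For readers unfamiliar with the deep NTK referenced above, the following is a minimal NumPy sketch of the standard fully connected ReLU NTK recursion (in the spirit of Jacot et al., 2018, using the closed-form arc-cosine expressions popularized by Arora et al., 2019). The function name `relu_ntk`, the `depth` parameter, and the bias-free, factor-2 normalization convention are illustrative assumptions and may differ from the exact kernel analyzed in the paper.

```python
import numpy as np

def relu_ntk(x, xp, depth=3):
    """Sketch of the fully connected ReLU NTK between two inputs x, xp.

    Uses the recursion Theta^{(h)} = Theta^{(h-1)} * dotSigma^{(h)} + Sigma^{(h)},
    with the Gaussian expectations for ReLU evaluated in closed form.
    Normalization conventions (no biases, He-style factor 2) are assumptions.
    """
    # Layer-0 covariances: Sigma^{(0)}(x, x'), and the two diagonal entries.
    sigma_xx, sigma_pp, sigma_xp = x @ x, xp @ xp, x @ xp
    theta = sigma_xp  # Theta^{(0)} = Sigma^{(0)}
    for _ in range(depth):
        norm = np.sqrt(sigma_xx * sigma_pp)
        rho = np.clip(sigma_xp / norm, -1.0, 1.0)  # correlation at the previous layer
        # Closed-form arc-cosine expectations for ReLU:
        #   Sigma^{(h)}    = norm * (rho*(pi - arccos rho) + sqrt(1 - rho^2)) / pi
        #   dotSigma^{(h)} = (pi - arccos rho) / pi
        sigma_xp_new = norm * (rho * (np.pi - np.arccos(rho)) + np.sqrt(1.0 - rho**2)) / np.pi
        sigma_dot = (np.pi - np.arccos(rho)) / np.pi
        # With this normalization the diagonal entries sigma_xx, sigma_pp are invariant.
        theta = theta * sigma_dot + sigma_xp_new
        sigma_xp = sigma_xp_new
    return theta

# Example: NTK value between two unit vectors in R^2.
x, xp = np.array([1.0, 0.0]), np.array([0.6, 0.8])
print(relu_ntk(x, xp, depth=3))
```

The kernel regression estimator built from this kernel matrix is what the abstract's "deep NTK regression" refers to; the paper's results concern the eigenvalue decay of this kernel on a bounded domain and the resulting rates for early-stopped gradient descent.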
