Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality
In this note, we study how neural networks with a single hidden layer and
ReLU activation interpolate data drawn from a radially symmetric distribution
with target labels 1 at the origin and 0 outside the unit ball, if no labels
are known inside the unit ball. With weight decay regularization and in the
infinite neuron, infinite data limit, we prove that a unique radially symmetric
minimizer exists, whose weight decay regularizer and Lipschitz constant grow as $d$ and $\sqrt{d}$ respectively.
We furthermore show that the weight decay regularizer grows exponentially in $d$ if the label $1$ is imposed on a ball of radius $\varepsilon > 0$ rather than just at the origin. By comparison, a neural network with two hidden layers can approximate the target function without encountering the curse of dimensionality.
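The interpolation problem is concrete enough to sketch at finite width. Below is a minimal PyTorch illustration under hypothetical choices (dimension d = 10, hidden width 512, squared loss, and the optimizer's L2 `weight_decay` as a finite-width stand-in for the weight decay regularizer of the infinite-neuron limit); it sets up the labeled data and shallow ReLU network described above, not the paper's analysis or construction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical choices for illustration: dimension, hidden width, sample count.
d, m, n = 10, 512, 2048

# Data per the setup above: label 1 at the origin, label 0 at points
# outside the unit ball; no labeled points inside the open unit ball.
x_out = torch.randn(n, d)
x_out = x_out / x_out.norm(dim=1, keepdim=True) * (1.0 + torch.rand(n, 1))  # norms in [1, 2)
x = torch.cat([torch.zeros(1, d), x_out])
y = torch.cat([torch.ones(1), torch.zeros(n)])

# Single hidden layer with ReLU activation, as in the shallow case studied above.
net = nn.Sequential(nn.Linear(d, m), nn.ReLU(), nn.Linear(m, 1))

# weight_decay applies an L2 penalty to all parameters, standing in here
# for the weight decay regularizer of the theorem.
opt = torch.optim.Adam(net.parameters(), lr=1e-3, weight_decay=1e-4)

for step in range(2000):
    opt.zero_grad()
    loss = ((net(x).squeeze(-1) - y) ** 2).mean()
    loss.backward()
    opt.step()

# The trained network interpolates between the two label regions; evaluating
# along the ray t * e_1, t in [0, 2], traces the learned radial bump profile.
with torch.no_grad():
    t = torch.linspace(0, 2, 5).unsqueeze(1)
    ray = t * torch.eye(d)[0]
    print(net(ray).squeeze(-1))
```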