Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality
In this note, we study how neural networks with a single hidden layer and
ReLU activation interpolate data drawn from a radially symmetric distribution
with target labels 1 at the origin and 0 outside the unit ball, if no labels
are known inside the unit ball. With weight decay regularization and in the
infinite neuron, infinite data limit, we prove that a unique radially symmetric
minimizer exists, whose weight decay regularizer and Lipschitz constant grow as $d$ and $\sqrt{d}$ respectively.
We furthermore show that the weight decay regularizer grows exponentially in $d$ if the label $1$ is imposed on a ball of radius $\varepsilon > 0$ rather than just at the origin. By comparison, a neural network with two hidden layers can approximate the target function without encountering the curse of dimensionality.
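The interpolation problem is concrete enough to sketch at finite width. Below is a minimal PyTorch illustration under hypothetical choices (dimension d = 10, hidden width 512, squared loss, and the optimizer's L2 `weight_decay` as a finite-width stand-in for the weight decay regularizer of the infinite-neuron limit); it sets up the labeled data and shallow ReLU network described above, not the paper's analysis or construction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical choices for illustration: dimension, hidden width, sample count.
d, m, n = 10, 512, 2048

# Data per the setup above: label 1 at the origin, label 0 at points
# outside the unit ball; no labeled points inside the open unit ball.
x_out = torch.randn(n, d)
x_out = x_out / x_out.norm(dim=1, keepdim=True) * (1.0 + torch.rand(n, 1))  # norms in [1, 2)
x = torch.cat([torch.zeros(1, d), x_out])
y = torch.cat([torch.ones(1), torch.zeros(n)])

# Single hidden layer with ReLU activation, as in the shallow case studied above.
net = nn.Sequential(nn.Linear(d, m), nn.ReLU(), nn.Linear(m, 1))

# weight_decay applies an L2 penalty to all parameters, standing in here
# for the weight decay regularizer of the theorem.
opt = torch.optim.Adam(net.parameters(), lr=1e-3, weight_decay=1e-4)

for step in range(2000):
    opt.zero_grad()
    loss = ((net(x).squeeze(-1) - y) ** 2).mean()
    loss.backward()
    opt.step()

# The trained network interpolates between the two label regions; evaluating
# along the ray t * e_1, t in [0, 2], traces the learned radial bump profile.
with torch.no_grad():
    t = torch.linspace(0, 2, 5).unsqueeze(1)
    ray = t * torch.eye(d)[0]
    print(net(ray).squeeze(-1))
```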