Harmless Overparametrization in Two-layer Neural Networks
Overparametrized neural networks, where the number of active parameters is
larger than the sample size, prove remarkably effective in modern deep learning
practice. From the classical perspective, however, far fewer parameters are
sufficient for optimal estimation and prediction, whereas overparametrization
can be harmful even in the presence of explicit regularization. To reconcile
this conflict, we present a generalization theory for overparametrized ReLU
networks by incorporating an explicit regularizer based on the scaled variation
norm. Interestingly, this regularizer is equivalent to the ridge penalty from the
perspective of gradient-based optimization, yet acts like the group lasso in
controlling model complexity. By exploiting this ridge-lasso duality, we show
that overparametrization is generally harmless to two-layer ReLU networks. In
particular, the overparametrized estimators are minimax optimal up to a
logarithmic factor. By contrast, we show that overparametrized random feature
models suffer from the curse of dimensionality and thus are suboptimal.
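
A minimal sketch of the ridge-lasso duality the abstract invokes, under assumed notation (the abstract defines no symbols): for a two-layer ReLU network f(x) = \sum_{j=1}^{m} a_j \sigma(w_j^\top x), the positive homogeneity of the ReLU lets each neuron be rescaled without changing the function, and minimizing the ridge (weight-decay) penalty over such rescalings recovers a group-lasso-type variation-norm penalty.

% Sketch (assumed notation): for any c_j > 0, ReLU homogeneity gives
%   a_j \sigma(w_j^\top x) = (a_j c_j) \, \sigma\bigl((w_j / c_j)^\top x\bigr),
% so the smallest ridge penalty over equivalent rescalings is, by AM--GM,
\[
  \min_{c_j > 0} \; \frac{1}{2} \sum_{j=1}^{m}
    \Bigl( c_j^{2} a_j^{2} + \frac{\lVert w_j \rVert_2^{2}}{c_j^{2}} \Bigr)
  \;=\; \sum_{j=1}^{m} \lvert a_j \rvert \, \lVert w_j \rVert_2 ,
\]
% i.e. a group-lasso-style penalty on the neurons, attained at c_j^{2} = \lVert w_j \rVert_2 / \lvert a_j \rvert.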