On the Asymptotic Learning Curves of Kernel Ridge Regression under Power-law Decay
The widely observed 'benign overfitting phenomenon' in the neural network
literature challenges the 'bias-variance trade-off' doctrine of statistical
learning theory. Since the generalization ability of 'lazy-trained'
over-parametrized neural networks can be well approximated by that of neural
tangent kernel regression, the curve of the excess risk (namely, the learning
curve) of kernel ridge regression has recently attracted increasing attention.
However, most recent arguments about the learning curve are heuristic and rest
on the 'Gaussian design' assumption. In this paper, under mild and more
realistic assumptions, we rigorously provide a full characterization of the
learning curve, elaborating on the effects and interplay of the choice of
regularization parameter, the source condition, and the noise. In particular,
our results suggest that the 'benign overfitting phenomenon' exists in very
wide neural networks only when the noise level is small.
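To make the objects in the abstract concrete, below is a minimal illustrative sketch (not taken from the paper) of kernel ridge regression on a synthetic 1-D problem, showing how the empirical excess risk behaves as the sample size grows under two noise levels. The Laplacian kernel, the sine target function, and the regularization schedule lambda = 1/n are assumptions made purely for illustration.

```python
# Illustrative sketch only: kernel ridge regression (KRR) and its empirical
# excess risk on synthetic data. Kernel, target, and lambda schedule are
# assumed choices, not the paper's setup.
import numpy as np

def laplacian_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * |x - y|); its eigenvalues decay polynomially
    return np.exp(-gamma * np.abs(X[:, None] - Y[None, :]))

def krr_fit_predict(x_train, y_train, x_test, lam):
    # Standard KRR estimator: alpha = (K + n * lam * I)^{-1} y
    n = len(x_train)
    K = laplacian_kernel(x_train, x_train)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
    return laplacian_kernel(x_test, x_train) @ alpha

def excess_risk(n, noise_std, rng, n_test=2000):
    f = lambda x: np.sin(2 * np.pi * x)        # assumed target function
    x_train = rng.uniform(0, 1, n)
    y_train = f(x_train) + noise_std * rng.normal(size=n)
    x_test = rng.uniform(0, 1, n_test)
    pred = krr_fit_predict(x_train, y_train, x_test, lam=1.0 / n)
    # Mean squared error against the noiseless target
    return np.mean((pred - f(x_test)) ** 2)

rng = np.random.default_rng(0)
for noise_std in (0.0, 0.5):
    risks = [np.mean([excess_risk(n, noise_std, rng) for _ in range(5)])
             for n in (50, 100, 200, 400, 800)]
    print(f"noise_std={noise_std}: excess risks {np.round(risks, 4)}")
```

Printing the averaged risks for increasing n gives a crude empirical learning curve; comparing the noiseless and noisy runs illustrates, at a purely qualitative level, how the noise level shapes the decay that the paper characterizes rigorously.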