Optimal Rate of Kernel Regression in Large Dimensions

Abstract

We perform a study on kernel regression for large-dimensional data (where the sample size $n$ depends polynomially on the dimension $d$ of the samples, i.e., $n \asymp d^{\gamma}$ for some $\gamma > 0$). We first build a general tool to characterize the upper bound and the minimax lower bound of kernel regression for large-dimensional data through the Mendelson complexity $\varepsilon_{n}^{2}$ and the metric entropy $\bar{\varepsilon}_{n}^{2}$, respectively. When the target function falls into the RKHS associated with a (general) inner product model defined on $\mathbb{S}^{d}$, we utilize the new tool to show that the minimax rate of the excess risk of kernel regression is $n^{-1/2}$ when $n \asymp d^{\gamma}$ for $\gamma = 2, 4, 6, 8, \cdots$. We then further determine the optimal rate of the excess risk of kernel regression for all $\gamma > 0$ and find that the curve of the optimal rate as a function of $\gamma$ exhibits several new phenomena, including the {\it multiple descent behavior} and the {\it periodic plateau behavior}. As an application, for the neural tangent kernel (NTK), we also provide a similar explicit description of the curve of the optimal rate. As a direct corollary, these claims hold for wide neural networks as well.
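To make the setting concrete, the following is a minimal, hypothetical sketch (not the paper's code) of kernel ridge regression with an inner product kernel $k(x, x') = \Phi(\langle x, x'\rangle)$ on the sphere, run in the large-dimensional regime $n \asymp d^{\gamma}$. The choice $\Phi(t) = e^{t}$, the toy target function, and the regularization level are illustrative assumptions, not quantities specified by the paper.

```python
# Hypothetical illustration of kernel regression on S^d with an inner product
# kernel, in the regime n ≍ d^gamma.  Phi(t) = exp(t), the target, and the
# ridge level are assumptions made for the sake of a runnable example.
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere S^d in R^{d+1}."""
    x = rng.standard_normal((n, d + 1))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def inner_product_kernel(X, Z, phi=np.exp):
    """Gram matrix of k(x, z) = phi(<x, z>) for an inner product kernel."""
    return phi(X @ Z.T)

def krr_fit_predict(X, y, X_test, lam, phi=np.exp):
    """Kernel ridge regression: f_hat(x) = K(x, X) (K + n*lam*I)^{-1} y."""
    n = X.shape[0]
    K = inner_product_kernel(X, X, phi)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return inner_product_kernel(X_test, X, phi) @ alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, gamma = 20, 2                        # regime n ≍ d^gamma with gamma = 2
    n = d ** gamma
    f_star = lambda X: X[:, 0] * X[:, 1]    # toy target on the sphere (assumed)
    X = sample_sphere(n, d, rng)
    y = f_star(X) + 0.1 * rng.standard_normal(n)
    X_te = sample_sphere(1000, d, rng)
    pred = krr_fit_predict(X, y, X_te, lam=n ** -0.5)
    print("empirical excess risk estimate:", np.mean((pred - f_star(X_te)) ** 2))
```

The excess risk printed here is only a Monte Carlo estimate on held-out points; the paper's contribution is the exact characterization of how the minimax rate of this quantity scales with $n$ as $\gamma$ varies.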
