Near-Interpolators: Rapid Norm Growth and the Trade-Off between Interpolation and Generalization

Abstract

We study the generalization capability of nearly-interpolating linear regressors: $\boldsymbol{\beta}$'s whose training error $\tau$ is positive but small, i.e., below the noise floor. Under a random matrix theoretic assumption on the data distribution and an eigendecay assumption on the data covariance matrix $\boldsymbol{\Sigma}$, we demonstrate that any near-interpolator exhibits rapid norm growth: for $\tau$ fixed, $\boldsymbol{\beta}$ has squared $\ell_2$-norm $\mathbb{E}[\|\boldsymbol{\beta}\|_2^2] = \Omega(n^{\alpha})$, where $n$ is the number of samples and $\alpha > 1$ is the exponent of the eigendecay, i.e., $\lambda_i(\boldsymbol{\Sigma}) \sim i^{-\alpha}$. This implies that existing data-independent norm-based bounds are necessarily loose. On the other hand, in the same regime we precisely characterize the asymptotic trade-off between interpolation and generalization. Our characterization reveals that larger norm scaling exponents $\alpha$ correspond to worse trade-offs between interpolation and generalization. We verify empirically that a similar phenomenon holds for nearly-interpolating shallow neural networks.

Comment: AISTATS 2024
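As a rough illustration of the setup described in the abstract (not code from the paper), the following minimal Python sketch constructs a near-interpolator by tuning a ridge penalty until the training MSE reaches a small target $\tau$, with Gaussian data whose covariance spectrum decays as $\lambda_i \sim i^{-\alpha}$, and then reports the squared $\ell_2$-norm of the fitted coefficients. The choices of $\alpha$, $\tau$, the dimension, the noise level, and the ridge-based construction are illustrative assumptions, not the authors' estimator.

```python
# Minimal simulation sketch (not from the paper): empirically probe the claimed
# norm growth E[||beta||_2^2] = Omega(n^alpha) for near-interpolators when the
# covariance spectrum decays as lambda_i ~ i^(-alpha).  All parameter choices
# (alpha, tau, d, the ridge construction) are illustrative assumptions.
import numpy as np

def near_interpolator_norm_sq(n, d, alpha, tau, noise=0.5, seed=0):
    """Return ||beta||_2^2 for a ridge regressor tuned so that its
    training MSE is approximately tau (a near-interpolator)."""
    rng = np.random.default_rng(seed)
    eigs = np.arange(1, d + 1, dtype=float) ** (-alpha)      # lambda_i ~ i^(-alpha)
    X = rng.standard_normal((n, d)) * np.sqrt(eigs)          # rows ~ N(0, Sigma)
    beta_star = rng.standard_normal(d) / np.sqrt(d)          # planted signal
    y = X @ beta_star + noise * rng.standard_normal(n)       # noisy labels

    # Bisect (on a log scale) over the ridge penalty until training MSE ~= tau.
    # Dual form of ridge: beta = X^T (X X^T + lam I_n)^{-1} y, an n x n solve.
    lo, hi = 1e-12, 1e6
    beta = np.zeros(d)
    for _ in range(60):
        lam = np.sqrt(lo * hi)
        beta = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
        train_mse = np.mean((X @ beta - y) ** 2)
        if train_mse > tau:
            hi = lam   # training error too large -> use less regularization
        else:
            lo = lam   # training error below tau -> can afford more shrinkage
    return float(beta @ beta)

if __name__ == "__main__":
    alpha, tau, d = 1.5, 0.05, 4000
    for n in (100, 200, 400, 800):
        norm_sq = near_interpolator_norm_sq(n, d, alpha, tau)
        print(f"n={n:4d}  ||beta||_2^2 ~ {norm_sq:.1f}")
```

If the abstract's norm-growth claim holds in this toy setting, the printed squared norms should grow roughly like $n^{\alpha}$ for fixed $\tau$; a log-log fit of the printed values against $n$ gives a quick sanity check.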
