Interpolators -- estimators that achieve zero training error -- have
attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study
minimum $\ell_2$ norm (``ridgeless'') interpolation in high-dimensional least
squares regression. We consider two different models for the feature
distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$
are obtained by applying a linear transform to a vector of i.i.d.\ entries,
$x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model,
where the feature vectors are obtained by passing the input through a random
one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$,
$W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d.\ entries, and $\varphi$ an
activation function acting componentwise on $W z_i$). We recover -- in a
precise quantitative way -- several phenomena that have been observed in
large-scale neural networks and kernel machines, including the "double descent"
behavior of the prediction risk, and the potential benefits of
overparametrization.

Comment: 68 pages; 16 figures. This revision contains non-asymptotic versions
of earlier results, and results for general coefficients.
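The two feature models and the ridgeless estimator described above can be sketched numerically. The following is a minimal illustration, not the paper's actual experiments: the dimensions, noise level, Gaussian i.i.d.\ entries, and the choice of $\varphi = \tanh$ are all illustrative assumptions. It builds random features $x_i = \varphi(W z_i)$, fits the minimum $\ell_2$ norm interpolator via the Moore--Penrose pseudoinverse, and traces the test risk as the number of features $p$ crosses the interpolation threshold $p = n$, where double-descent behavior is typically visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): n samples, d input dimensions.
n, d = 100, 20

def min_norm_interpolator(X, y):
    # Minimum l2-norm solution of X @ beta = y ("ridgeless" least squares),
    # beta = X^+ y via the Moore-Penrose pseudoinverse. When p > n this
    # interpolates the data exactly (zero training error).
    return np.linalg.pinv(X) @ y

def random_features(Z, W, phi=np.tanh):
    # Nonlinear model: x_i = phi(W z_i), with phi applied componentwise.
    # Z has shape (n, d), W has shape (p, d); the result has shape (n, p).
    return phi(Z @ W.T)

# Data from a simple linear ground truth with additive noise.
Z = rng.standard_normal((n, d))
beta_star = rng.standard_normal(d) / np.sqrt(d)
y = Z @ beta_star + 0.1 * rng.standard_normal(n)

Z_test = rng.standard_normal((1000, d))
y_test = Z_test @ beta_star

# Sweep p through the interpolation threshold p = n; the prediction risk
# typically spikes near p = n and can decrease again for p > n.
risks = {}
for p in [20, 50, 100, 200, 400]:
    W = rng.standard_normal((p, d)) / np.sqrt(d)
    X = random_features(Z, W)
    beta = min_norm_interpolator(X, y)
    train_err = np.mean((X @ beta - y) ** 2)
    risks[p] = np.mean((random_features(Z_test, W) @ beta - y_test) ** 2)
```

For $p \ge n$ the training error is (numerically) zero, so the estimator is an interpolator in the paper's sense, while the test risk remains finite and varies with the overparametrization ratio $p/n$.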