Interpolators -- estimators that achieve zero training error -- have
attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study
minimum $\ell_2$ norm (``ridgeless'') interpolation in high-dimensional least
squares regression. We consider two different models for the feature
distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$
are obtained by applying a linear transform to a vector of i.i.d.\ entries,
$x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model,
where the feature vectors are obtained by passing the input through a random
one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$,
$W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d.\ entries, and $\varphi$ an
activation function acting componentwise on $W z_i$). We recover -- in a
precise quantitative way -- several phenomena that have been observed in
large-scale neural networks and kernel machines, including the "double descent"
behavior of the prediction risk, and the potential benefits of
overparametrization.

Comment: 68 pages; 16 figures. This revision contains non-asymptotic versions
of earlier results, and results for general coefficients.
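The two feature models and the ridgeless estimator described above can be sketched numerically. The following is a minimal illustration, not the paper's actual experiments: the dimensions, noise level, Gaussian i.i.d.\ entries, and the choice of $\varphi = \tanh$ are all illustrative assumptions. It builds random features $x_i = \varphi(W z_i)$, fits the minimum $\ell_2$ norm interpolator via the Moore--Penrose pseudoinverse, and traces the test risk as the number of features $p$ crosses the interpolation threshold $p = n$, where double-descent behavior is typically visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): n samples, d input dimensions.
n, d = 100, 20

def min_norm_interpolator(X, y):
    # Minimum l2-norm solution of X @ beta = y ("ridgeless" least squares),
    # beta = X^+ y via the Moore-Penrose pseudoinverse. When p > n this
    # interpolates the data exactly (zero training error).
    return np.linalg.pinv(X) @ y

def random_features(Z, W, phi=np.tanh):
    # Nonlinear model: x_i = phi(W z_i), with phi applied componentwise.
    # Z has shape (n, d), W has shape (p, d); the result has shape (n, p).
    return phi(Z @ W.T)

# Data from a simple linear ground truth with additive noise.
Z = rng.standard_normal((n, d))
beta_star = rng.standard_normal(d) / np.sqrt(d)
y = Z @ beta_star + 0.1 * rng.standard_normal(n)

Z_test = rng.standard_normal((1000, d))
y_test = Z_test @ beta_star

# Sweep p through the interpolation threshold p = n; the prediction risk
# typically spikes near p = n and can decrease again for p > n.
risks = {}
for p in [20, 50, 100, 200, 400]:
    W = rng.standard_normal((p, d)) / np.sqrt(d)
    X = random_features(Z, W)
    beta = min_norm_interpolator(X, y)
    train_err = np.mean((X @ beta - y) ** 2)
    risks[p] = np.mean((random_features(Z_test, W) @ beta - y_test) ** 2)
```

For $p \ge n$ the training error is (numerically) zero, so the estimator is an interpolator in the paper's sense, while the test risk remains finite and varies with the overparametrization ratio $p/n$.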