Surprises in High-Dimensional Ridgeless Least Squares Interpolation
Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$ norm (``ridgeless'') interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are obtained by applying a linear transform to a vector of i.i.d.\ entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d.\ entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover -- in a precise quantitative way -- several phenomena that have been observed in large-scale neural networks and kernel machines, including the ``double descent'' behavior of the prediction risk and the potential benefits of overparametrization.
Comment: 68 pages; 16 figures. This revision contains a non-asymptotic version of earlier results, and results for general coefficients.
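To make the setting concrete, here is a minimal sketch (not taken from the paper; the sample size, covariance, signal strength, and noise level are illustrative assumptions) of the minimum-$\ell_2$-norm interpolator $\hat\beta = X^+ y$ in an isotropic linear feature model, which typically exhibits the double-descent shape of the test risk as $p/n$ crosses 1.

```python
# Minimal sketch: minimum-l2-norm ("ridgeless") least squares and double descent.
# All data-generating choices (n, signal energy, noise level) are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, n_test, sigma = 100, 1000, 0.5

def ridgeless_test_mse(p):
    # True coefficients with unit signal energy, isotropic Gaussian features.
    beta = rng.standard_normal(p)
    beta /= np.linalg.norm(beta)
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    # Minimum-l2-norm interpolator: beta_hat = X^+ y (ordinary least squares
    # when p < n; zero training error when p >= n and X has full row rank).
    beta_hat = np.linalg.pinv(X) @ y
    X_test = rng.standard_normal((n_test, p))
    y_test = X_test @ beta + sigma * rng.standard_normal(n_test)
    return np.mean((X_test @ beta_hat - y_test) ** 2)

for p in [20, 50, 80, 95, 105, 150, 300, 1000]:
    print(f"p/n = {p / n:5.2f}   test MSE = {ridgeless_test_mse(p):.3f}")
# The test error typically spikes near p/n = 1 and decreases again as p grows,
# the "double descent" shape described in the abstract.
```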
Out-of-sample error estimate for robust M-estimators with convex penalty
A generic out-of-sample error estimate is proposed for robust $M$-estimators regularized with a convex penalty in high-dimensional linear regression where $(X, y)$ is observed and $p, n$ are of the same order. If $\psi$ is the derivative of the robust data-fitting loss $\rho$, the estimate depends on the observed data only through the quantities $\hat\psi = \psi(y - X\hat\beta)$, $X^\top \hat\psi$ and the derivatives $(\partial/\partial y)\hat\psi$ and $(\partial/\partial y)X\hat\beta$ for fixed $X$.
The out-of-sample error estimate enjoys a relative error of order $n^{-1/2}$ in a linear model with Gaussian covariates and independent noise, either non-asymptotically when $p/n \le \gamma$ or asymptotically in the high-dimensional asymptotic regime $p/n \to \gamma \in (0, \infty)$. General differentiable loss functions $\rho$ are allowed provided that $\psi = \rho'$ is 1-Lipschitz. The validity of the out-of-sample error estimate holds either under a strong convexity assumption, or for the $\ell_1$-penalized Huber M-estimator if the number of corrupted observations and the sparsity of the true $\beta$ are bounded from above by $cn$ for some small enough constant $c$ independent of $n, p$.
For the square loss and in the absence of corruption in the response, the results additionally yield $\sqrt{n}$-consistent estimates of the noise variance and of the generalization error. This generalizes, to arbitrary convex penalty, estimates that were previously known for the Lasso.
Comment: This version adds simulations for the nuclear norm penalty.
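For the square-loss Lasso special case that the last paragraph refers to, the previously known estimate can be sketched as a degrees-of-freedom-adjusted (GCV-style) residual. The simulation settings and the use of scikit-learn below are illustrative assumptions, not the paper's code or its general $M$-estimator formula.

```python
# Minimal sketch: GCV-style out-of-sample error estimate for the Lasso
# (square loss, no corruption). Settings are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, s, sigma = 400, 600, 20, 1.0

beta = np.zeros(p)
beta[:s] = 1.0                                   # s-sparse true coefficients
X = rng.standard_normal((n, p))                  # isotropic Gaussian design
y = X @ beta + sigma * rng.standard_normal(n)

lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X, y)
beta_hat = lasso.coef_
resid = y - X @ beta_hat

# Degrees of freedom of the Lasso: number of nonzero coefficients.
df = np.count_nonzero(beta_hat)
# Degrees-of-freedom-adjusted residual: ||y - X beta_hat||^2 / (n (1 - df/n)^2).
risk_hat = resid @ resid / (n * (1 - df / n) ** 2)

# Quantity the estimate targets with identity covariance:
# sigma^2 + ||beta_hat - beta||^2 (out-of-sample prediction error).
risk_true = sigma ** 2 + np.sum((beta_hat - beta) ** 2)
print(f"estimated out-of-sample error: {risk_hat:.3f}")
print(f"true out-of-sample error:      {risk_true:.3f}")
```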
New perspectives in cross-validation
Appealing due to its universality, cross-validation is a ubiquitous tool for model tuning and selection. At its core, cross-validation proposes to split the data (potentially several times), and alternately use part of the data for fitting a model and the rest for testing it. This produces a reliable estimate of the risk, although many questions remain concerning how best to compare such estimates across different models. Despite its widespread use, many theoretical problems remain unanswered for cross-validation, particularly in high-dimensional regimes where bias issues are non-negligible. We first provide an asymptotic analysis of the cross-validated risk in relation to the train-test split risk for a large class of estimators under stability conditions. This asymptotic analysis is expressed in the form of a central limit theorem, and allows us to characterize the speed-up of the cross-validation procedure for general parametric M-estimators. In particular, we show that when the loss used for fitting differs from that used for evaluation, k-fold cross-validation may offer a reduction in variance less (or greater) than k. We then turn our attention to the high-dimensional regime (where the number of parameters is comparable to the number of observations). In such a regime, k-fold cross-validation exhibits asymptotic bias, and hence increasing the number of folds is of interest. We study the extreme case of leave-one-out cross-validation, and show that, for generalized linear models under smoothness conditions, it is a consistent estimate of the risk at the optimal rate. Given the large computational requirements of leave-one-out cross-validation, we finally consider the problem of obtaining a fast approximate leave-one-out (ALO) estimator. We propose a general strategy for deriving formulas for such ALO estimators for penalized generalized linear models, and apply it to many common estimators such as the LASSO, the SVM, and nuclear norm minimization. The performance of such approximations is evaluated on simulated and real datasets.
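The ALO idea can be illustrated in the simplest case, where a closed-form leave-one-out shortcut is exact: ridge regression, a linear smoother, for which the leave-one-out residual equals the full-fit residual divided by $1 - H_{ii}$. The sketch below covers only this special case (problem sizes and penalty level are illustrative assumptions); the ALO formulas for general penalized GLMs discussed above generalize this computation.

```python
# Minimal sketch: exact leave-one-out shortcut for ridge regression,
# the linear-smoother special case that ALO formulas generalize.
import numpy as np

rng = np.random.default_rng(2)
n, p, lam, sigma = 200, 50, 1.0, 0.5

X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + sigma * rng.standard_normal(n)

# One ridge fit on the full data: beta_hat = (X^T X + lam I)^{-1} X^T y.
A_inv = np.linalg.inv(X.T @ X + lam * np.eye(p))
H = X @ A_inv @ X.T                      # hat (smoother) matrix
resid = y - H @ y

# Shortcut LOO residuals: e_i / (1 - H_ii), exact for ridge via Sherman-Morrison.
loo_shortcut = resid / (1.0 - np.diag(H))

# Brute-force LOO for comparison: refit n times, each time dropping observation i.
loo_brute = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    Ai = np.linalg.inv(X[mask].T @ X[mask] + lam * np.eye(p))
    bi = Ai @ X[mask].T @ y[mask]
    loo_brute[i] = y[i] - X[i] @ bi

print("max |shortcut - brute force|:", np.max(np.abs(loo_shortcut - loo_brute)))
print("LOO risk estimate:", np.mean(loo_shortcut ** 2))
```

The shortcut reproduces the brute-force leave-one-out residuals to numerical precision while requiring a single fit, which is the computational saving that motivates ALO.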