Surprises in High-Dimensional Ridgeless Least Squares Interpolation
Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$ norm (``ridgeless'') interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are obtained by applying a linear transform to a vector of i.i.d.\ entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d.\ entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover -- in a precise quantitative way -- several phenomena that have been observed in large-scale neural networks and kernel machines, including the ``double descent'' behavior of the prediction risk and the potential benefits of overparametrization.
Comment: 68 pages; 16 figures. This revision contains a non-asymptotic version of earlier results, and results for general coefficients.
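To make the setting concrete, here is a minimal sketch (not taken from the paper; the sample size, covariance, signal strength, and noise level are illustrative assumptions) of the minimum-$\ell_2$-norm interpolator $\hat\beta = X^+ y$ in an isotropic linear feature model, which typically exhibits the double-descent shape of the test risk as $p/n$ crosses 1.

```python
# Minimal sketch: minimum-l2-norm ("ridgeless") least squares and double descent.
# All data-generating choices (n, signal energy, noise level) are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, n_test, sigma = 100, 1000, 0.5

def ridgeless_test_mse(p):
    # True coefficients with unit signal energy, isotropic Gaussian features.
    beta = rng.standard_normal(p)
    beta /= np.linalg.norm(beta)
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    # Minimum-l2-norm interpolator: beta_hat = X^+ y (ordinary least squares
    # when p < n; zero training error when p >= n and X has full row rank).
    beta_hat = np.linalg.pinv(X) @ y
    X_test = rng.standard_normal((n_test, p))
    y_test = X_test @ beta + sigma * rng.standard_normal(n_test)
    return np.mean((X_test @ beta_hat - y_test) ** 2)

for p in [20, 50, 80, 95, 105, 150, 300, 1000]:
    print(f"p/n = {p / n:5.2f}   test MSE = {ridgeless_test_mse(p):.3f}")
# The test error typically spikes near p/n = 1 and decreases again as p grows,
# the "double descent" shape described in the abstract.
```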
Out-of-sample error estimate for robust M-estimators with convex penalty
A generic out-of-sample error estimate is proposed for robust $M$-estimators regularized with a convex penalty in high-dimensional linear regression where $(X, y)$ is observed and $p, n$ are of the same order. If $\psi$ is the derivative of the robust data-fitting loss $\rho$, the estimate depends on the observed data only through the quantities $\hat\psi = \psi(y - X\hat\beta)$, $X^\top \hat\psi$ and the derivatives $(\partial/\partial y)\hat\psi$ and $(\partial/\partial y)X\hat\beta$ for fixed $X$.
The out-of-sample error estimate enjoys a relative error of order $n^{-1/2}$ in a linear model with Gaussian covariates and independent noise, either non-asymptotically when $p/n \le \gamma$ or asymptotically in the high-dimensional asymptotic regime $p/n \to \gamma \in (0, \infty)$. General differentiable loss functions $\rho$ are allowed provided that $\psi = \rho'$ is 1-Lipschitz. The validity of the out-of-sample error estimate holds either under a strong convexity assumption, or for the $\ell_1$-penalized Huber M-estimator if the number of corrupted observations and the sparsity of the true $\beta$ are bounded from above by $cn$ for some small enough constant $c$ independent of $n, p$.
For the square loss and in the absence of corruption in the response, the results additionally yield $\sqrt{n}$-consistent estimates of the noise variance and of the generalization error. This generalizes, to arbitrary convex penalty, estimates that were previously known for the Lasso.
Comment: This version adds simulations for the nuclear norm penalty.
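For the square-loss Lasso special case that the last paragraph refers to, the previously known estimate can be sketched as a degrees-of-freedom-adjusted (GCV-style) residual. The simulation settings and the use of scikit-learn below are illustrative assumptions, not the paper's code or its general $M$-estimator formula.

```python
# Minimal sketch: GCV-style out-of-sample error estimate for the Lasso
# (square loss, no corruption). Settings are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, s, sigma = 400, 600, 20, 1.0

beta = np.zeros(p)
beta[:s] = 1.0                                   # s-sparse true coefficients
X = rng.standard_normal((n, p))                  # isotropic Gaussian design
y = X @ beta + sigma * rng.standard_normal(n)

lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X, y)
beta_hat = lasso.coef_
resid = y - X @ beta_hat

# Degrees of freedom of the Lasso: number of nonzero coefficients.
df = np.count_nonzero(beta_hat)
# Degrees-of-freedom-adjusted residual: ||y - X beta_hat||^2 / (n (1 - df/n)^2).
risk_hat = resid @ resid / (n * (1 - df / n) ** 2)

# Quantity the estimate targets with identity covariance:
# sigma^2 + ||beta_hat - beta||^2 (out-of-sample prediction error).
risk_true = sigma ** 2 + np.sum((beta_hat - beta) ** 2)
print(f"estimated out-of-sample error: {risk_hat:.3f}")
print(f"true out-of-sample error:      {risk_true:.3f}")
```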
New perspectives in cross-validation
Appealing due to its universality, cross-validation is a ubiquitous tool for model tuning and selection. At its core, cross-validation proposes to split the data (potentially several times), and alternately use part of the data for fitting a model and the rest for testing it. This produces a reliable estimate of the risk, although many questions remain concerning how best to compare such estimates across different models. Despite its widespread use, many theoretical problems remain unanswered for cross-validation, particularly in high-dimensional regimes where bias issues are non-negligible. We first provide an asymptotic analysis of the cross-validated risk in relation to the train-test split risk for a large class of estimators under stability conditions. This asymptotic analysis is expressed in the form of a central limit theorem, and allows us to characterize the speed-up of the cross-validation procedure for general parametric M-estimators. In particular, we show that when the loss used for fitting differs from that used for evaluation, k-fold cross-validation may offer a reduction in variance less (or greater) than k. We then turn our attention to the high-dimensional regime (where the number of parameters is comparable to the number of observations). In such a regime, k-fold cross-validation exhibits asymptotic bias, and hence increasing the number of folds is of interest. We study the extreme case of leave-one-out cross-validation, and show that, for generalized linear models under smoothness conditions, it is a consistent estimate of the risk at the optimal rate. Given the large computational requirements of leave-one-out cross-validation, we finally consider the problem of obtaining a fast approximate leave-one-out (ALO) estimator. We propose a general strategy for deriving formulas for such ALO estimators for penalized generalized linear models, and apply it to many common estimators such as the LASSO, the SVM, and nuclear norm minimization. The performance of such approximations is evaluated on simulated and real datasets.
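The ALO idea can be illustrated in the simplest case, where a closed-form leave-one-out shortcut is exact: ridge regression, a linear smoother, for which the leave-one-out residual equals the full-fit residual divided by $1 - H_{ii}$. The sketch below covers only this special case (problem sizes and penalty level are illustrative assumptions); the ALO formulas for general penalized GLMs discussed above generalize this computation.

```python
# Minimal sketch: exact leave-one-out shortcut for ridge regression,
# the linear-smoother special case that ALO formulas generalize.
import numpy as np

rng = np.random.default_rng(2)
n, p, lam, sigma = 200, 50, 1.0, 0.5

X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + sigma * rng.standard_normal(n)

# One ridge fit on the full data: beta_hat = (X^T X + lam I)^{-1} X^T y.
A_inv = np.linalg.inv(X.T @ X + lam * np.eye(p))
H = X @ A_inv @ X.T                      # hat (smoother) matrix
resid = y - H @ y

# Shortcut LOO residuals: e_i / (1 - H_ii), exact for ridge via Sherman-Morrison.
loo_shortcut = resid / (1.0 - np.diag(H))

# Brute-force LOO for comparison: refit n times, each time dropping observation i.
loo_brute = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    Ai = np.linalg.inv(X[mask].T @ X[mask] + lam * np.eye(p))
    bi = Ai @ X[mask].T @ y[mask]
    loo_brute[i] = y[i] - X[i] @ bi

print("max |shortcut - brute force|:", np.max(np.abs(loo_shortcut - loo_brute)))
print("LOO risk estimate:", np.mean(loo_shortcut ** 2))
```

The shortcut reproduces the brute-force leave-one-out residuals to numerical precision while requiring a single fit, which is the computational saving that motivates ALO.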