
    Surprises in High-Dimensional Ridgeless Least Squares Interpolation

    Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$-norm ("ridgeless") interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover -- in a precise quantitative way -- several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.
    Comment: 68 pages; 16 figures. This revision contains a non-asymptotic version of earlier results, and results for general coefficients.
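    The minimum $\ell_2$-norm ("ridgeless") interpolator has a closed form via the Moore-Penrose pseudoinverse, which is enough to reproduce the double-descent shape of the prediction risk in simulation. The Python sketch below is purely illustrative and is not taken from the paper: it assumes the simplest isotropic case $\Sigma = I$, and all function names, sample sizes, and parameter values are our own choices.

```python
import numpy as np

def min_norm_interpolator(X, y):
    # Minimum l2-norm solution of X beta = y (the "ridgeless" limit of ridge
    # regression as the penalty tends to 0), via the Moore-Penrose pseudoinverse.
    return np.linalg.pinv(X) @ y

def prediction_risk(n=200, p=400, sigma_noise=1.0, seed=0):
    # Simulate the linear feature model x_i = Sigma^{1/2} z_i with Sigma = I,
    # fit the min-norm interpolator, and estimate the out-of-sample risk on
    # fresh test points. Illustrative sketch only.
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=p) / np.sqrt(p)          # true coefficients
    X = rng.normal(size=(n, p))                     # training features
    y = X @ beta + sigma_noise * rng.normal(size=n)
    beta_hat = min_norm_interpolator(X, y)
    X_test = rng.normal(size=(2000, p))
    y_test = X_test @ beta + sigma_noise * rng.normal(size=2000)
    return np.mean((X_test @ beta_hat - y_test) ** 2)

if __name__ == "__main__":
    # Sweeping p/n around 1 reproduces the qualitative "double descent" shape:
    # the risk peaks near p/n = 1 and decreases again in the overparametrized regime.
    for p in (50, 150, 190, 210, 400, 800):
        print(f"p/n = {p / 200:.3f}, estimated risk = {prediction_risk(n=200, p=p):.3f}")
```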

    Out-of-sample error estimate for robust M-estimators with convex penalty

    A generic out-of-sample error estimate is proposed for robust $M$-estimators regularized with a convex penalty in high-dimensional linear regression where $(X, y)$ is observed and $p, n$ are of the same order. If $\psi$ is the derivative of the robust data-fitting loss $\rho$, the estimate depends on the observed data only through the quantities $\hat\psi = \psi(y - X\hat\beta)$, $X^\top \hat\psi$ and the derivatives $(\partial/\partial y)\hat\psi$ and $(\partial/\partial y) X\hat\beta$ for fixed $X$. The out-of-sample error estimate enjoys a relative error of order $n^{-1/2}$ in a linear model with Gaussian covariates and independent noise, either non-asymptotically when $p/n \le \gamma$ or asymptotically in the high-dimensional asymptotic regime $p/n \to \gamma' \in (0, \infty)$. General differentiable loss functions $\rho$ are allowed provided that $\psi = \rho'$ is 1-Lipschitz. The validity of the out-of-sample error estimate holds either under a strong convexity assumption, or for the $\ell_1$-penalized Huber M-estimator if the number of corrupted observations and the sparsity of the true $\beta$ are bounded from above by $s_* n$ for some small enough constant $s_* \in (0, 1)$ independent of $n, p$. For the square loss and in the absence of corruption in the response, the results additionally yield $n^{-1/2}$-consistent estimates of the noise variance and of the generalization error. This generalizes, to arbitrary convex penalties, estimates that were previously known for the Lasso.
    Comment: This version adds simulations for the nuclear norm penalty.
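    The abstract lists the observed quantities the error estimate is built from. As a rough illustration only (not the paper's estimator and not its formula), the sketch below computes $\hat\psi$, $X^\top \hat\psi$, and finite-difference approximations of the two $y$-derivatives for the Huber score and an arbitrary plug-in solver `fit`; the names, the tuning constant `delta=1.345`, and the finite-difference scheme are all our own assumptions.

```python
import numpy as np

def huber_psi(r, delta=1.345):
    # psi = rho' for the Huber loss; it is 1-Lipschitz, as the abstract requires.
    return np.clip(r, -delta, delta)

def observable_quantities(X, y, fit, delta=1.345, eps=1e-4):
    # Compute the data-dependent quantities the error estimate is built from:
    # psi_hat = psi(y - X beta_hat), X^T psi_hat, and finite-difference
    # approximations of (d/dy) psi_hat and (d/dy) X beta_hat for fixed X.
    # `fit` is any solver returning beta_hat given (X, y), e.g. a convex-penalized
    # M-estimator; this sketch does not reproduce the paper's error formula.
    n = X.shape[0]
    beta_hat = fit(X, y)
    psi_hat = huber_psi(y - X @ beta_hat, delta)
    Xt_psi = X.T @ psi_hat
    d_psi = np.zeros((n, n))      # column j approximates d psi_hat / d y_j
    d_Xbeta = np.zeros((n, n))    # column j approximates d (X beta_hat) / d y_j
    for j in range(n):            # n extra solver calls: expensive, but explicit
        y_pert = y.copy()
        y_pert[j] += eps
        beta_pert = fit(X, y_pert)
        d_Xbeta[:, j] = (X @ (beta_pert - beta_hat)) / eps
        d_psi[:, j] = (huber_psi(y_pert - X @ beta_pert, delta) - psi_hat) / eps
    return psi_hat, Xt_psi, d_psi, d_Xbeta
```

    Any penalized M-estimation solver can be plugged in as `fit`; the point is simply that every quantity named in the abstract is computable from $(X, y)$ and the fitted values alone.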