
    From Fixed-X to Random-X Regression: Bias-Variance Decompositions, Covariance Penalties, and Prediction Error Estimation

    In statistical prediction, classical approaches for model selection and model evaluation based on covariance penalties are still widely used. Most of the literature on this topic is based on what we call the "Fixed-X" assumption, where covariate values are assumed to be nonrandom. By contrast, it is often more reasonable to take a "Random-X" view, where the covariate values are independently drawn for both training and prediction. To study the applicability of covariance penalties in this setting, we propose a decomposition of Random-X prediction error in which the randomness in the covariates contributes to both the bias and variance components. This decomposition is general, but we concentrate on the fundamental case of least squares regression. We prove that in this setting the move from Fixed-X to Random-X prediction results in an increase in both bias and variance. When the covariates are normally distributed and the linear model is unbiased, all terms in this decomposition are explicitly computable, which yields an extension of Mallows' Cp that we call $RCp$. $RCp$ also holds asymptotically for certain classes of nonnormal covariates. When the noise variance is unknown, plugging in the usual unbiased estimate leads to an approach that we call $\widehat{RCp}$, which is closely related to $S_p$ (Tukey 1967) and GCV (Craven and Wahba 1978). For excess bias, we propose an estimate based on the "shortcut formula" for ordinary cross-validation (OCV), resulting in an approach we call $RCp^+$. Theoretical arguments and numerical simulations suggest that $RCp^+$ is typically superior to OCV, though the difference is small. We further examine the Random-X error of other popular estimators. The surprising result we get for ridge regression is that, in the heavily regularized regime, Random-X variance is smaller than Fixed-X variance, which can lead to smaller overall Random-X error.
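As a concrete point of reference for the Fixed-X machinery this abstract starts from, here is a minimal numpy sketch (not the authors' $RCp$/$RCp^+$ code) of the classical covariance-penalty and shortcut quantities for least squares: a Mallows'-Cp-style penalty, GCV, and ordinary cross-validation computed via the hat-matrix diagonal. The function name and the plug-in noise-variance choice are illustrative assumptions.

```python
import numpy as np

def ols_error_estimates(X, y, sigma2=None):
    """Classical Fixed-X prediction-error estimates for least squares:
    a Mallows'-Cp-style covariance penalty, GCV, and the ordinary
    cross-validation (OCV) "shortcut formula" via the hat-matrix diagonal."""
    n, p = X.shape
    H = X @ np.linalg.pinv(X)                 # hat matrix, yhat = H y
    yhat = H @ y
    resid = y - yhat
    rss = np.sum(resid ** 2)
    df = np.trace(H)                          # effective degrees of freedom (= p for full-rank OLS)
    if sigma2 is None:                        # plug-in unbiased noise-variance estimate
        sigma2 = rss / (n - df)
    cp = rss / n + 2 * sigma2 * df / n        # Cp-style covariance penalty
    gcv = (rss / n) / (1 - df / n) ** 2       # generalized cross-validation
    ocv = np.mean((resid / (1 - np.diag(H))) ** 2)  # leave-one-out CV via the shortcut
    return cp, gcv, ocv

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + rng.normal(size=100)
print(ols_error_estimates(X, y))
```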

    Asymptotically free sketched ridge ensembles: Risks, cross-validation, and tuning

    We employ random matrix theory to establish consistency of generalized cross-validation (GCV) for estimating prediction risks of sketched ridge regression ensembles, enabling efficient and consistent tuning of regularization and sketching parameters. Our results hold for a broad class of asymptotically free sketches under very mild data assumptions. For squared prediction risk, we provide a decomposition into an unsketched equivalent implicit ridge bias and a sketching-based variance, and prove that the risk can be globally optimized by only tuning sketch size in infinite ensembles. For general subquadratic prediction risk functionals, we extend GCV to construct consistent risk estimators, and thereby obtain distributional convergence of the GCV-corrected predictions in the Wasserstein-2 metric. This in particular allows construction of prediction intervals with asymptotically correct coverage conditional on the training data. We also propose an "ensemble trick" whereby the risk for unsketched ridge regression can be efficiently estimated via GCV using small sketched ridge ensembles. We empirically validate our theoretical results using both synthetic and real large-scale datasets with practical sketches including CountSketch and subsampled randomized discrete cosine transforms. Comment: 42 pages, 6 figures.
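To make the GCV-for-sketched-ridge idea concrete, the following is a small illustrative sketch, assuming a plain subsampling sketch rather than the paper's general class of asymptotically free sketches (CountSketch, randomized cosine transforms, etc.): it evaluates the ridge GCV criterion on each sketched subsample and averages over a small ensemble. The function names and the n*lambda penalty scaling are assumptions for the example, not the paper's estimator.

```python
import numpy as np

def ridge_gcv(X, y, lam):
    """GCV criterion for ridge regression with penalty n * lam."""
    n, p = X.shape
    G = np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T)
    S = X @ G                                 # smoother matrix, yhat = S y
    resid = y - S @ y
    df = np.trace(S)
    return np.mean(resid ** 2) / (1.0 - df / n) ** 2

def sketched_ridge_ensemble_gcv(X, y, lam, sketch_size, n_sketches=10, seed=0):
    """Average the GCV criterion over an ensemble of subsampling sketches
    (a simple stand-in for the more general sketches covered by the paper)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    scores = []
    for _ in range(n_sketches):
        idx = rng.choice(n, size=sketch_size, replace=False)  # subsampling sketch
        scores.append(ridge_gcv(X[idx], y[idx], lam))
    return float(np.mean(scores))

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 50))
y = X @ rng.normal(size=50) * 0.2 + rng.normal(size=500)
for lam in (0.01, 0.1, 1.0):
    print(lam, sketched_ridge_ensemble_gcv(X, y, lam, sketch_size=200))
```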

    Surprises in High-Dimensional Ridgeless Least Squares Interpolation

    Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$-norm ("ridgeless") interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover -- in a precise quantitative way -- several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization. Comment: 68 pages; 16 figures. This revision contains a non-asymptotic version of earlier results, and results for general coefficients.
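The double-descent behavior mentioned above can be reproduced qualitatively in a few lines of numpy. The sketch below uses a ReLU random-features model and the pseudoinverse (which returns the minimum-$\ell_2$-norm least squares solution when there are more features than observations); it is only a simplified analogue of the models analyzed in the paper, with all sizes, the noise level, and the activation chosen arbitrarily for illustration.

```python
import numpy as np

def min_norm_risk(n=100, d=200, p_values=(20, 50, 90, 110, 200, 400),
                  n_test=2000, seed=0):
    """Out-of-sample risk of the minimum-l2-norm ("ridgeless") least squares
    fit as the number of random ReLU features p varies; the risk typically
    peaks near p = n and decreases again as p grows (double descent)."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=d) / np.sqrt(d)
    Z, Zt = rng.normal(size=(n, d)), rng.normal(size=(n_test, d))
    y, yt = Z @ beta + 0.5 * rng.normal(size=n), Zt @ beta
    risks = {}
    for p in p_values:
        W = rng.normal(size=(d, p)) / np.sqrt(d)
        X, Xt = np.maximum(Z @ W, 0), np.maximum(Zt @ W, 0)   # random ReLU features
        coef = np.linalg.pinv(X) @ y        # min-norm interpolator when p > n
        risks[p] = float(np.mean((Xt @ coef - yt) ** 2))
    return risks

print(min_norm_risk())
```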

    Feature Extraction in Signal Regression: A Boosting Technique for Functional Data Regression

    The main objectives of feature extraction in signal regression are the improvement of prediction accuracy on future data and the identification of relevant parts of the signal. A feature extraction procedure is proposed that uses boosting techniques to select the relevant parts of the signal. The proposed blockwise boosting procedure simultaneously selects intervals in the signal's domain and estimates the effect on the response. The blocks, which are defined explicitly, use the underlying metric of the signal. It is demonstrated in simulation studies and for real-world data that the proposed approach competes well with procedures such as PLS, P-spline signal regression, and functional data regression. The paper is a preprint of an article published in the Journal of Computational and Graphical Statistics; please use the journal version for citation.
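A rough sketch of the blockwise idea, under the simplifying assumption of plain greedy L2-boosting with contiguous blocks of signal points and a fixed shrinkage factor (the paper's actual block construction and tuning may differ):

```python
import numpy as np

def blockwise_boosting(X, y, n_blocks=10, n_iter=50, nu=0.1):
    """Greedy blockwise L2-boosting for signal regression: at each step,
    fit least squares on the single block of adjacent signal points that
    best reduces the current residual, then add a shrunken version of that
    fit to the coefficient function."""
    n, p = X.shape
    blocks = np.array_split(np.arange(p), n_blocks)  # contiguous blocks on the signal's domain
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        best, best_b, best_rss = None, None, np.inf
        for block in blocks:
            Xb = X[:, block]
            b, *_ = np.linalg.lstsq(Xb, resid, rcond=None)
            rss = np.sum((resid - Xb @ b) ** 2)
            if rss < best_rss:
                best, best_b, best_rss = block, b, rss
        beta[best] += nu * best_b                    # shrinkage step
        resid -= nu * X[:, best] @ best_b
    return beta

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 100))                       # 80 signals sampled at 100 points
true = np.zeros(100); true[30:40] = 1.0              # effect confined to one interval
y = X @ true + 0.5 * rng.normal(size=80)
print(np.nonzero(blockwise_boosting(X, y))[0])       # indices of selected signal points
```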

    FAStEN: an efficient adaptive method for feature selection and estimation in high-dimensional functional regressions

    Functional regression analysis is an established tool for many contemporary scientific applications. Regression problems involving large and complex data sets are ubiquitous, and feature selection is crucial for avoiding overfitting and achieving accurate predictions. We propose a new, flexible, and ultra-efficient approach to perform feature selection in a sparse, high-dimensional function-on-function regression problem, and we show how to extend it to the scalar-on-function framework. Our method combines functional data, optimization, and machine learning techniques to perform feature selection and parameter estimation simultaneously. We exploit the properties of Functional Principal Components and the sparsity inherent to the Dual Augmented Lagrangian problem to significantly reduce computational cost, and we introduce an adaptive scheme to improve selection accuracy. Through an extensive simulation study, we benchmark our approach against the best existing competitors and demonstrate a massive gain in terms of CPU time and selection performance without sacrificing the quality of the coefficients' estimation. Finally, we present an application to brain fMRI data from the AOMIC PIOP1 study.
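As a toy illustration of the general recipe, not the FAStEN solver itself (which relies on a Dual Augmented Lagrangian and an adaptive scheme), the sketch below represents each functional predictor by a few functional principal component scores and applies a group-lasso penalty via proximal gradient descent, so whole functional features are selected or dropped together. All names, the FPC truncation, and the tuning constants are assumptions.

```python
import numpy as np

def fpc_scores(curves, k=3):
    """Scores of one functional predictor on its k leading functional
    principal components; curves is an (n_samples, n_timepoints) array."""
    centered = curves - curves.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:k].T

def group_lasso(score_blocks, y, lam, n_iter=500):
    """Proximal gradient descent for group-lasso penalized least squares on
    the stacked FPC scores; each functional feature enters as one group."""
    X = np.hstack(score_blocks)
    splits = np.cumsum([S.shape[1] for S in score_blocks])[:-1]
    n = len(y)
    step = n / np.linalg.norm(X, 2) ** 2              # 1 / Lipschitz constant of the smooth part
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)  # gradient step on (1/2n)||y - Xb||^2
        groups = []
        for g in np.split(z, splits):                 # proximal step: group soft-thresholding
            norm = np.linalg.norm(g)
            groups.append(np.zeros_like(g) if norm <= step * lam
                          else (1 - step * lam / norm) * g)
        beta = np.concatenate(groups)
    return np.split(beta, splits)

rng = np.random.default_rng(5)
n, t = 120, 50
grid = np.linspace(0, 1, t)
basis = np.vstack([np.sin(np.pi * k * grid) for k in (1, 2, 3)])   # smooth basis functions
curves = [rng.normal(size=(n, 3)) @ basis + 0.1 * rng.normal(size=(n, t)) for _ in range(6)]
y = curves[0] @ basis[0] / t + 0.1 * rng.normal(size=n)            # only the first feature matters
coefs = group_lasso([fpc_scores(c) for c in curves], y, lam=1.0)
print([j for j, b in enumerate(coefs) if np.linalg.norm(b) > 1e-8])  # selected features
```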

    Generalized Kernel Regularized Least Squares

    Kernel Regularized Least Squares (KRLS) is a popular method for flexibly estimating models that may have complex relationships between variables. However, its usefulness to many researchers is limited for two reasons. First, existing approaches are inflexible and do not allow KRLS to be combined with theoretically motivated extensions such as random effects, unregularized fixed effects, or non-Gaussian outcomes. Second, estimation is extremely computationally intensive for even modestly sized datasets. Our paper addresses both concerns by introducing generalized KRLS (gKRLS). We note that KRLS can be reformulated as a hierarchical model, thereby allowing easy inference and modular model construction where KRLS can be used alongside random effects, splines, and unregularized fixed effects. Computationally, we also implement random sketching to dramatically accelerate estimation while incurring a limited penalty in estimation quality. We demonstrate that gKRLS can be fit on datasets with tens of thousands of observations in under one minute. Further, state-of-the-art techniques that require fitting the model over a dozen times (e.g. meta-learners) can be estimated quickly. Comment: Accepted version available at DOI below; corrected small typo.
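A minimal sketch of the random-sketching ingredient, assuming a Nystrom-style subsample of kernel columns for plain (Gaussian-outcome, unweighted) kernel regularized least squares; the gKRLS hierarchical formulation, random effects, and non-Gaussian outcomes are not shown, and all function names and tuning constants are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def sketched_krls(X, y, lam=1.0, sketch_size=200, gamma=0.1, seed=0):
    """KRLS fit with the coefficient vector restricted to a random subsample
    ("sketch") of kernel columns, trading a little accuracy for a large
    reduction in computational cost."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sketch_size, len(X)), replace=False)
    Ks = rbf_kernel(X, X[idx], gamma)                 # n x m sketched kernel
    Kss = rbf_kernel(X[idx], X[idx], gamma)           # m x m block
    alpha = np.linalg.solve(Ks.T @ Ks + lam * Kss, Ks.T @ y)
    return lambda Xnew: rbf_kernel(Xnew, X[idx], gamma) @ alpha

rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 10))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=5000)
predict = sketched_krls(X, y)
print(float(np.mean((predict(X[:100]) - y[:100]) ** 2)))   # in-sample check on a few points
```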

    Efficient model selection in the Tikhonov regularization framework and pre-processing of spectroscopic data

    Machine learning is a hot topic in today's society. Data sets of varying sizes show up in a number of contexts, and learning from data sets is important for answering many questions. There is a plethora of methods that can be used to extract information from data, and in this thesis we consider primarily the Tikhonov Regularization (TR) framework for regularized linear least squares modeling. TR is a very flexible modeling framework, in the sense that it is easy to adjust the type of regularization used as well as to include a priori information about the regression coefficients.

    The main topic of this thesis is efficient model selection in the TR framework. When using TR for modeling it is necessary to specify one or more model parameters, often called regularization parameters. The regularization parameter can have a significant effect on the quality of the final model, and choosing an appropriate regularization parameter is therefore an important part of the modeling. For large data sets, model selection can be time-consuming, and it is therefore of interest to obtain efficient methods for selecting between different models. In Paper I it is shown how generalized cross-validation can be used for efficient model selection in the TR framework. This discussion continues in Paper III, where it is shown how leave-one-out cross-validation can be done efficiently in the TR framework. Paper III also suggests a heuristic that can be used for efficient model selection when dealing with data sets with repeated measurements of the same physical sample.

    Raw data often needs to be pre-processed before useful models can be created. Papers I and II deal with pre-processing and modeling of vibrational spectroscopic data in the extended multiplicative signal correction (EMSC) framework. In the EMSC framework, unwanted effects in the data are modeled as multiplicative and additive effects. In Paper I it is shown how the correction of additive effects can be done while creating a regression model in the TR framework, and why this can in some cases be advantageous. The multiplicative correction in EMSC is based on a single reference spectrum, but for data sets with very different spectra a single reference spectrum might not be sufficient to accurately correct for multiplicative effects in the measured spectra. Paper II discusses how to extend the EMSC framework to include multiple reference spectra, as well as how appropriate reference spectra can be obtained automatically.

    Paper IV considers classification using regularized linear discriminant analysis (RLDA). The link between RLDA and regularized regression is used to argue that the efficient validation criteria discussed in Papers I and III can also be used for model validation in RLDA. This is tested empirically, and the results indicate that good choices of the regularization parameter can be obtained efficiently using a regression-based criterion.
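As an illustration of the kind of efficient validation discussed in the thesis, the sketch below (standard-form Tikhonov regularization, i.e. ridge, with no a priori coefficient information) reuses a single SVD of the design matrix to evaluate leave-one-out cross-validation and GCV over a whole grid of regularization parameters without refitting. It is a generic textbook shortcut under those assumptions, not the thesis code.

```python
import numpy as np

def tikhonov_loocv_gcv(X, y, lambdas):
    """Evaluate LOOCV and GCV for standard-form Tikhonov (ridge) regularization
    over a grid of regularization parameters using one SVD of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y
    out = []
    for lam in lambdas:
        f = s**2 / (s**2 + lam)            # Tikhonov filter factors
        yhat = U @ (f * Uty)
        h = np.sum(U**2 * f, axis=1)       # diagonal of the hat matrix
        resid = y - yhat
        loocv = np.mean((resid / (1 - h)) ** 2)          # leave-one-out shortcut
        gcv = np.mean(resid**2) / (1 - np.mean(h)) ** 2  # generalized cross-validation
        out.append((lam, loocv, gcv))
    return out

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 40))
y = X @ rng.normal(size=40) + rng.normal(size=60)
for lam, loocv, gcv in tikhonov_loocv_gcv(X, y, [0.01, 0.1, 1.0, 10.0]):
    print(f"lambda={lam:6.2f}  LOOCV={loocv:.3f}  GCV={gcv:.3f}")
```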