From Fixed-X to Random-X Regression: Bias-Variance Decompositions, Covariance Penalties, and Prediction Error Estimation
In statistical prediction, classical approaches for model selection and model
evaluation based on covariance penalties are still widely used. Most of the
literature on this topic is based on what we call the "Fixed-X" assumption,
where covariate values are assumed to be nonrandom. By contrast, it is often
more reasonable to take a "Random-X" view, where the covariate values are
independently drawn for both training and prediction. To study the
applicability of covariance penalties in this setting, we propose a
decomposition of Random-X prediction error in which the randomness in the
covariates contributes to both the bias and variance components. This
decomposition is general, but we concentrate on the fundamental case of least
squares regression. We prove that in this setting the move from Fixed-X to
Random-X prediction results in an increase in both bias and variance. When the
covariates are normally distributed and the linear model is unbiased, all terms
in this decomposition are explicitly computable, which yields an extension of
Mallows' Cp that we call RCp. RCp also holds asymptotically for certain
classes of nonnormal covariates. When the noise variance is unknown, plugging
in the usual unbiased estimate leads to an approach that we call RCp-hat,
which is closely related to Sp (Tukey 1967) and GCV (Craven and Wahba 1978).
For excess bias, we propose an estimate based on the "shortcut-formula" for
ordinary cross-validation (OCV), resulting in an approach we call RCp+.
Theoretical arguments and numerical simulations suggest that RCp+ is
typically superior to OCV, though the difference is small. We further examine
the Random-X error of other popular estimators. The surprising result we get
for ridge regression is that, in the heavily-regularized regime, Random-X
variance is smaller than Fixed-X variance, which can lead to smaller overall
Random-X error.
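For ordinary least squares, both Mallows' Cp and the OCV "shortcut-formula" mentioned above can be computed from a single fit. A minimal illustrative sketch (the function name and per-observation form of Cp are ours, not the paper's):

```python
import numpy as np

def ols_cp_and_ocv(X, y):
    """Mallows' Cp and ordinary cross-validation (OCV) error for OLS.

    Leave-one-out residuals come from the shortcut formula
    e_i / (1 - h_ii), where h_ii is the i-th diagonal entry of the
    hat matrix H = X (X'X)^{-1} X', so no model is ever refit.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)           # hat matrix
    resid = y - H @ y                               # in-sample residuals
    rss = np.sum(resid ** 2)
    sigma2 = rss / (n - p)                          # unbiased noise-variance estimate
    cp = rss / n + 2 * sigma2 * p / n               # Mallows' Cp (per-observation form)
    ocv = np.mean((resid / (1 - np.diag(H))) ** 2)  # exact LOO error, no refitting
    return cp, ocv
```

The shortcut formula is exact for least squares, which is what makes OCV (and the proposed corrections to it) cheap in this setting.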
Asymptotically free sketched ridge ensembles: Risks, cross-validation, and tuning
We employ random matrix theory to establish consistency of generalized cross
validation (GCV) for estimating prediction risks of sketched ridge regression
ensembles, enabling efficient and consistent tuning of regularization and
sketching parameters. Our results hold for a broad class of asymptotically free
sketches under very mild data assumptions. For squared prediction risk, we
provide a decomposition into an unsketched equivalent implicit ridge bias and a
sketching-based variance, and prove that the risk can be globally optimized by
only tuning sketch size in infinite ensembles. For general subquadratic
prediction risk functionals, we extend GCV to construct consistent risk
estimators, and thereby obtain distributional convergence of the GCV-corrected
predictions in Wasserstein-2 metric. This in particular allows construction of
prediction intervals with asymptotically correct coverage conditional on the
training data. We also propose an "ensemble trick" whereby the risk for
unsketched ridge regression can be efficiently estimated via GCV using small
sketched ridge ensembles. We empirically validate our theoretical results using
both synthetic and real large-scale datasets with practical sketches including
CountSketch and subsampled randomized discrete cosine transforms.
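For plain (unsketched) ridge regression, the GCV criterion being extended here is itself cheap to evaluate. A minimal sketch of that baseline, computed from one SVD of X (our own illustration, not the paper's code):

```python
import numpy as np

def gcv_ridge(X, y, lams):
    """Generalized cross-validation curve for ridge regression.

    GCV(lam) = (1/n) ||y - S y||^2 / (1 - tr(S)/n)^2,
    where S = X (X'X + lam I)^{-1} X' is the ridge smoother matrix.
    One SVD of X gives the whole curve over a grid of lam values.
    """
    n = X.shape[0]
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Uy = U.T @ y
    scores = []
    for lam in lams:
        shrink = s ** 2 / (s ** 2 + lam)    # eigenvalues of S
        fitted = U @ (shrink * Uy)
        rss = np.sum((y - fitted) ** 2)
        df = np.sum(shrink)                 # tr(S), effective degrees of freedom
        scores.append(rss / n / (1 - df / n) ** 2)
    return np.array(scores)
```

Minimizing this curve over `lams` tunes the regularization parameter without any refitting across folds.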
Surprises in High-Dimensional Ridgeless Least Squares Interpolation
Interpolators -- estimators that achieve zero training error -- have
attracted growing attention in machine learning, mainly because
state-of-the-art neural networks appear to be models of this type. In this paper, we study
minimum norm (``ridgeless'') interpolation in high-dimensional least
squares regression. We consider two different models for the feature
distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are
obtained by applying a linear transform to a vector of i.i.d.\ entries,
$x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model, where
the feature vectors are obtained by passing the input through a random
one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$,
$W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d.\ entries, and $\varphi$ an
activation function acting componentwise on $W z_i$). We recover -- in a
precise quantitative way -- several phenomena that have been observed in
large-scale neural networks and kernel machines, including the "double descent"
behavior of the prediction risk, and the potential benefits of
overparametrization.
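The ridgeless estimator studied here is the minimum-l2-norm least squares solution, i.e. the limit of ridge regression as the penalty goes to zero. A toy sketch (our own illustration):

```python
import numpy as np

def min_norm_interpolator(X, y):
    """Minimum-l2-norm least squares solution (the "ridgeless" limit).

    When p >= n and X has full row rank, this fits the training data
    exactly (zero training error) while having the smallest l2 norm
    among all interpolating coefficient vectors.
    """
    return np.linalg.pinv(X) @ y
```

In the overparametrized regime (p > n) this estimator interpolates the training data exactly, which is the setting in which the double descent behavior of the prediction risk appears.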
Feature Extraction in Signal Regression: A Boosting Technique for Functional Data Regression
Main objectives of feature extraction in signal regression are the improvement of accuracy of prediction on future data and identification of relevant parts of the signal. A feature extraction procedure is proposed that uses boosting techniques to select the relevant parts of the signal. The proposed blockwise boosting procedure simultaneously selects intervals in the signal’s domain and estimates the effect on the response. The blocks that are defined explicitly use the underlying metric of the signal. It is demonstrated in simulation studies and for real-world data that the proposed approach competes well with procedures like PLS, P-spline signal regression and functional data regression.
The paper is a preprint of an article published in the Journal of Computational and Graphical Statistics. Please use the journal version for citation
FAStEN: an efficient adaptive method for feature selection and estimation in high-dimensional functional regressions
Functional regression analysis is an established tool for many contemporary
scientific applications. Regression problems involving large and complex data
sets are ubiquitous, and feature selection is crucial for avoiding overfitting
and achieving accurate predictions. We propose a new, flexible, and
ultra-efficient approach to perform feature selection in a sparse high
dimensional function-on-function regression problem, and we show how to extend
it to the scalar-on-function framework. Our method combines functional data,
optimization, and machine learning techniques to perform feature selection and
parameter estimation simultaneously. We exploit the properties of Functional
Principal Components, and the sparsity inherent to the Dual Augmented
Lagrangian problem to significantly reduce computational cost, and we introduce
an adaptive scheme to improve selection accuracy. Through an extensive
simulation study, we benchmark our approach to the best existing competitors
and demonstrate a massive gain in terms of CPU time and selection performance
without sacrificing the quality of the coefficients' estimation. Finally, we
present an application to brain fMRI data from the AOMIC PIOP1 study.
Generalized Kernel Regularized Least Squares
Kernel Regularized Least Squares (KRLS) is a popular method for flexibly
estimating models that may have complex relationships between variables.
However, its usefulness to many researchers is limited for two reasons. First,
existing approaches are inflexible and do not allow KRLS to be combined with
theoretically-motivated extensions such as random effects, unregularized fixed
effects, or non-Gaussian outcomes. Second, estimation is extremely
computationally intensive for even modestly sized datasets. Our paper addresses
both concerns by introducing generalized KRLS (gKRLS). We note that KRLS can be
re-formulated as a hierarchical model thereby allowing easy inference and
modular model construction where KRLS can be used alongside random effects,
splines, and unregularized fixed effects. Computationally, we also implement
random sketching to dramatically accelerate estimation while incurring a
limited penalty in estimation quality. We demonstrate that gKRLS can be fit on
datasets with tens of thousands of observations in under one minute. Further,
state-of-the-art techniques that require fitting the model over a dozen times
(e.g. meta-learners) can be estimated quickly.
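The random-sketching idea can be illustrated with a subsampling (Nystrom-style) sketch of kernel ridge regression, which replaces the n x n kernel solve with an m x m one. This toy sketch is our own and is not the gKRLS implementation:

```python
import numpy as np

def rbf(A, B, gamma):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def sketched_krr(X, y, m, lam, gamma, rng):
    """Kernel ridge regression restricted to m random landmark points.

    Solving only for coefficients on the landmarks shrinks the n x n
    kernel system down to an m x m system (a subsampling sketch).
    """
    idx = rng.choice(X.shape[0], size=m, replace=False)
    Knm = rbf(X, X[idx], gamma)          # n x m cross-kernel
    Kmm = rbf(X[idx], X[idx], gamma)     # m x m landmark kernel
    alpha = np.linalg.solve(Knm.T @ Knm + lam * Kmm, Knm.T @ y)
    def predict(Xnew):
        return rbf(Xnew, X[idx], gamma) @ alpha
    return predict
```

With m well below n, the cost of the solve drops from O(n^3) to O(n m^2) at a limited loss in estimation quality, which is the trade-off the abstract describes.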
Efficient Model Selection in the Tikhonov Regularization Framework and Pre-processing of Spectroscopic Data
Machine learning is a hot topic in today's society. Data sets of varying sizes show up in a number of contexts, and learning from data sets is important for answering many questions. There is a plethora of methods that can be used to extract information from data, and in this thesis we consider primarily the Tikhonov Regularization (TR) framework for regularized linear least squares modeling. TR is a very flexible modeling framework, in the sense that it is easy to adjust the type of regularization used as well as including a priori information about the regression coefficients.
The main topic of this thesis is efficient model selection in the TR framework. When using TR regularization for modeling it is necessary to specify one or more model parameters, often called regularization parameters. The regularization parameter can have a significant effect on the quality of the final model, and choosing an appropriate regularization parameter is therefore an important part of the modeling. For large data sets model selection can be time consuming, and it is therefore of interest to obtain efficient methods for selecting between different models. In Paper I it is shown how generalized cross validation can be used for efficient model selection in the TR framework. This discussion continues in Paper III where it is shown how leave-one-out cross validation can be done efficiently in the TR framework. Paper III also suggests a heuristic that can be used for efficient model selection when dealing with data sets with repeated measurements of the same physical sample.
Raw data often needs to be pre-processed before useful models can be created. Papers I and II deal with pre-processing and modeling of vibrational spectroscopic data in the extended multiplicative signal correction (EMSC) framework. In the EMSC framework unwanted effects in the data are modeled as multiplicative and additive effects. In Paper I it is shown how the correction of additive effects can be done while creating a regression model in the TR framework and why this can in some cases be advantageous. The multiplicative correction in EMSC is based on a single reference spectrum, but for data sets with very different spectra a single reference spectrum might not be sufficient to accurately correct for multiplicative effects in the measured spectra. Paper II discusses how to extend the EMSC framework to include multiple reference spectra as well as how appropriate reference spectra can be obtained automatically.
Paper IV considers classification using regularized linear discriminant analysis (RLDA). The link between RLDA and regularized regression is used to argue that the efficient validation criteria discussed in Papers I and III can also be used for model validation in RLDA. This is tested empirically, and the results indicate that good choices of the regularization parameter can be obtained efficiently using a regression-based criterion.
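The efficient leave-one-out computation referred to for the TR framework rests on a standard identity for linear smoothers: the LOO residual equals the in-sample residual divided by 1 - S_ii, where S is the smoother matrix. A minimal sketch for plain ridge/Tikhonov regularization (our illustration, not the thesis code):

```python
import numpy as np

def ridge_loocv(X, y, lam):
    """Exact leave-one-out CV error for Tikhonov/ridge regression.

    For the linear smoother y_hat = S y with
    S = X (X'X + lam I)^{-1} X', the LOO residual is
    (y_i - y_hat_i) / (1 - S_ii), so no refitting is needed.
    """
    n, p = X.shape
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - S @ y
    return np.mean((resid / (1 - np.diag(S))) ** 2)
```

Evaluating this over a grid of regularization parameters gives the full model-selection curve at roughly the cost of a single fit per parameter value.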