5,567 research outputs found
Boosting for high-dimensional linear models
We prove that boosting with the squared error loss, L2Boosting, is
consistent for very high-dimensional linear models, where the number of
predictor variables is allowed to grow essentially as fast as O(exp(sample
size)), assuming that the true underlying regression function is sparse in
terms of the ℓ1-norm of the regression coefficients. In the language of
signal processing, this means consistency for de-noising using a strongly
overcomplete dictionary if the underlying signal is sparse in terms of the
ℓ1-norm. We also propose here an AIC-based method for tuning,
namely for choosing the number of boosting iterations. This makes L2Boosting
computationally attractive since it is not required to run the algorithm
multiple times for cross-validation, as has been common practice so far. We
demonstrate L2Boosting on simulated data, in particular where the predictor
dimension is large in comparison to sample size, and on a difficult
tumor-classification problem with gene expression microarray data.
Comment: Published at http://dx.doi.org/10.1214/009053606000000092 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
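The componentwise version of this algorithm can be sketched in a few lines. This is a generic illustration of L2Boosting with a fixed step size and iteration budget, not the paper's AIC-tuned implementation; the toy data, the shrinkage factor `nu`, and the iteration count are illustrative choices of ours.

```python
import numpy as np

def l2_boost(X, y, n_iter=200, nu=0.1):
    """Componentwise L2Boosting: at each step, fit the current residuals by
    simple least squares on the single best predictor, then take a shrunken
    step of size nu in that coordinate."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    col_norms = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        # least-squares coefficient of each predictor against the residuals
        coefs = X.T @ resid / col_norms
        # pick the predictor giving the largest drop in squared error
        j = np.argmax(coefs ** 2 * col_norms)
        beta[j] += nu * coefs[j]
        resid -= nu * coefs[j] * X[:, j]
    return beta

# tiny demo with p > n and an l1-sparse truth
rng = np.random.default_rng(0)
n, p = 40, 100
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -1.5, 1.0]
y = X @ true_beta + 0.1 * rng.standard_normal(n)
beta_hat = l2_boost(X, y)
```

Because each step changes only one coordinate, the estimate stays near-sparse for moderate iteration counts, which is what makes the choice of the stopping iteration the key tuning decision.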
A Nielsen theory for coincidences of iterates
As the title suggests, this paper gives a Nielsen theory of coincidences of
iterates of two self-maps f, g of a closed manifold. The idea is, as much as
possible, to generalize Nielsen-type periodic point theory, but there are many
obstacles. We often obtain results similar to the "classical" ones in Nielsen
periodic point theory, but under stronger hypotheses.
Comment: 30 pages
Early stopping and non-parametric regression: An optimal data-dependent stopping rule
The strategy of early stopping is a regularization technique based on
choosing a stopping time for an iterative algorithm. Focusing on non-parametric
regression in a reproducing kernel Hilbert space, we analyze the early stopping
strategy for a form of gradient-descent applied to the least-squares loss
function. We propose a data-dependent stopping rule that does not involve
hold-out or cross-validation data, and we prove upper bounds on the squared
error of the resulting function estimate, measured in either the L^2(P) or
the L^2(P_n) norm. These upper bounds lead to minimax-optimal rates for various
kernel classes, including Sobolev smoothness classes and other forms of
reproducing kernel Hilbert spaces. We show through simulation that our stopping
rule compares favorably to two other stopping rules, one based on hold-out data
and the other based on Stein's unbiased risk estimate. We also establish a
tight connection between our early stopping strategy and the solution path of a
kernel ridge regression estimator.
Comment: 29 pages, 4 figures
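The iterative scheme analyzed here can be sketched as plain gradient descent on the least-squares loss over fitted values in the span of a kernel matrix. The paper's data-dependent stopping rule (based on localized kernel complexities) is not reproduced below; the Gaussian kernel, its bandwidth, the step size, and the fixed iteration budget are illustrative assumptions of ours, and the point of returning the whole path is that any stopping rule then amounts to picking an index along it.

```python
import numpy as np

def gradient_descent_path(K, y, n_iter=100):
    """Gradient descent on the loss (1/2n)||f - y||^2 over fitted values f
    in the span of the kernel matrix K; returns the path of iterates so a
    stopping rule reduces to choosing an index t."""
    n = len(y)
    step = n / np.linalg.eigvalsh(K).max()  # guarantees a stable descent step
    f = np.zeros(n)
    path = [f.copy()]
    for _ in range(n_iter):
        f = f - step * (K @ (f - y)) / n  # kernel gradient update
        path.append(f.copy())
    return path

# demo on smooth 1-d data with a Gaussian kernel
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(1).standard_normal(50)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.02)
path = gradient_descent_path(K, y)
train_mse = [np.mean((f - y) ** 2) for f in path]
```

Early iterates fit the smooth, high-eigenvalue directions of K first, which is why stopping the descent early acts as a regularizer.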
Concentration inequalities of the cross-validation estimate for stable predictors
In this article, we derive concentration inequalities for the
cross-validation estimate of the generalization error for stable predictors in
the context of risk assessment. The notion of stability was first
introduced by \cite{DEWA79} and extended by \cite{KEA95}, \cite{BE01} and
\cite{KUNIY02} to characterize classes of predictors with infinite VC dimension.
In particular, this covers k-nearest-neighbor rules, Bayesian algorithms
(\cite{KEA95}), boosting, etc. General loss functions and classes of predictors
are considered. We use the formalism introduced by \cite{DUD03} to cover a large
variety of cross-validation procedures including leave-one-out
cross-validation, k-fold cross-validation, hold-out cross-validation (or
split sample), and leave-p-out cross-validation.
In particular, we give a simple rule for choosing the cross-validation procedure,
depending on the stability of the class of predictors. In the special case of
uniform stability, an interesting consequence is that the number of elements in
the test set is not required to grow to infinity for the consistency of the
cross-validation procedure. In this special case, the particular interest of
leave-one-out cross-validation is emphasized.
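As a concrete illustration, the plain K-fold cross-validation estimate of the generalization error for a k-nearest-neighbor rule (one of the predictor classes mentioned above) can be written as follows; the fold count, neighbor count, squared-error loss, and toy data are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

def kfold_cv_error(X, y, fit_predict, k=5, seed=0):
    """K-fold cross-validation estimate of the generalization error:
    mean squared-error loss over the held-out folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    losses = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        preds = fit_predict(X[train], y[train], X[test])
        losses.append(np.mean((preds - y[test]) ** 2))
    return np.mean(losses)

def knn_fit_predict(X_train, y_train, X_test, n_neighbors=3):
    """Brute-force k-nearest-neighbor regression: predict the mean response
    of the n_neighbors closest training points."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nn = np.argsort(dists, axis=1)[:, :n_neighbors]
    return y_train[nn].mean(axis=1)

# demo on a smooth 1-d regression problem
X = np.linspace(0.0, 2 * np.pi, 60)[:, None]
y = np.sin(X[:, 0])
cv_err = kfold_cv_error(X, y, knn_fit_predict, k=5)
```

Under the uniform-stability scenario discussed above, the interesting point is that such an estimate can remain consistent even when each test fold stays small.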