268 research outputs found
A bias correction for the minimum error rate in cross-validation
Tuning parameters in supervised learning problems are often estimated by
cross-validation. The minimum value of the cross-validation error can be biased
downward as an estimate of the test error at that same value of the tuning
parameter. We propose a simple method for the estimation of this bias that uses
information from the cross-validation process. As a result, it requires
essentially no additional computation. We apply our bias estimate to a number
of popular classifiers in various settings, and examine its performance.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS224 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
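A minimal sketch of how such a bias estimate can be computed from the per-fold errors, consistent with the abstract's description (it reuses quantities already produced by cross-validation, so it costs essentially nothing extra). The function name and the toy error matrix are illustrative, not taken from the paper:

```python
import numpy as np

def cv_min_bias_estimate(fold_errors):
    """Estimate the downward bias of the minimum CV error.

    fold_errors: array of shape (K, T) -- error of fold k at tuning value t,
    where the model is trained without fold k and evaluated on fold k.
    Returns (bias-corrected error estimate, bias estimate).
    """
    fold_errors = np.asarray(fold_errors, dtype=float)
    cv_curve = fold_errors.mean(axis=0)        # average CV error per tuning value
    t_hat = int(np.argmin(cv_curve))           # overall minimizer of the CV curve
    t_hat_k = fold_errors.argmin(axis=1)       # each fold's own minimizer
    K = fold_errors.shape[0]
    # Bias estimate: average gap between each fold's error at the overall
    # minimizer and that fold's minimum error (non-negative by construction).
    bias = np.mean([fold_errors[k, t_hat] - fold_errors[k, t_hat_k[k]]
                    for k in range(K)])
    return cv_curve[t_hat] + bias, bias

# Toy example: 5 folds, errors over 4 tuning-parameter values.
errs = np.array([[0.30, 0.25, 0.28, 0.35],
                 [0.32, 0.27, 0.24, 0.33],
                 [0.29, 0.26, 0.27, 0.31],
                 [0.31, 0.22, 0.29, 0.34],
                 [0.28, 0.30, 0.25, 0.36]])
corrected, bias = cv_min_bias_estimate(errs)
```

The correction is added back to the minimum of the CV curve, since picking the minimizing tuning value optimistically biases that minimum downward.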
Exact Post-Selection Inference for Sequential Regression Procedures
We propose new inference tools for forward stepwise regression, least angle
regression, and the lasso. Assuming a Gaussian model for the observation vector
y, we first describe a general scheme to perform valid inference after any
selection event that can be characterized as y falling into a polyhedral set.
This framework allows us to derive conditional (post-selection) hypothesis
tests at any step of forward stepwise or least angle regression, or any step
along the lasso regularization path, because, as it turns out, selection events
for these procedures can be expressed as polyhedral constraints on y. The
p-values associated with these tests are exactly uniform under the null
distribution, in finite samples, yielding exact type I error control. The tests
can also be inverted to produce confidence intervals for appropriate underlying
regression parameters. The R package "selectiveInference", freely available on
the CRAN repository, implements the new inference tools described in this
paper.
Comment: 26 pages, 5 figures.
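A small numerical sketch of the key geometric fact the abstract relies on: the first step of forward stepwise regression, once we record which variable is chosen and with which sign, is an event of the form {y : Ay ≤ 0}, a polyhedron. The construction below is illustrative (random data, my own variable names):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 5
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)          # unit-norm columns
y = rng.standard_normal(n)

# First forward-stepwise step: pick j* = argmax_j |x_j^T y| and its sign s.
c = X.T @ y
j_star = int(np.argmax(np.abs(c)))
s = np.sign(c[j_star])

# The event {variable j* chosen with sign s} is  s * x_{j*}^T y >= |x_i^T y|
# for all i, i.e. a finite set of linear inequalities A y <= 0 (a polyhedron).
rows = []
for i in range(p):
    if i == j_star:
        continue
    rows.append(X[:, i] - s * X[:, j_star])    #  x_i^T y <= s x_{j*}^T y
    rows.append(-X[:, i] - s * X[:, j_star])   # -x_i^T y <= s x_{j*}^T y
rows.append(-s * X[:, j_star])                 #  s x_{j*}^T y >= 0
A = np.vstack(rows)

# The observed y satisfies its own selection event's constraints.
max_violation = float((A @ y).max())
```

Conditioning on such polyhedral events is what lets the paper compute exact finite-sample p-values for Gaussian y; the `selectiveInference` package implements this machinery.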
Strong rules for discarding predictors in lasso-type problems
We consider rules for discarding predictors in lasso regression and related
problems, for computational efficiency. El Ghaoui et al (2010) propose "SAFE"
rules that guarantee that a coefficient will be zero in the solution, based on
the inner products of each predictor with the outcome. In this paper we propose
strong rules that are not foolproof but rarely fail in practice. These can be
complemented with simple checks of the Karush-Kuhn-Tucker (KKT) conditions to
provide safe rules that offer substantial speed and space savings in a variety
of statistical convex optimization problems.
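A hedged sketch of the basic strong rule for the lasso, paired with the KKT check the abstract mentions. The conventions here (objective 0.5‖y − Xβ‖² + λ‖β‖₁ with unit-norm columns, so the rule discards j when |xⱼᵀy| < 2λ − λmax) and the tiny coordinate-descent solver are my own illustrative choices, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 60, 20
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)          # centered, unit-norm columns
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)
y -= y.mean()

# Lasso: minimize 0.5*||y - X beta||^2 + lam*||beta||_1.
c = np.abs(X.T @ y)
lam_max = c.max()                       # smallest lam giving an all-zero solution
lam = 0.8 * lam_max

# Basic strong rule: discard predictor j if |x_j^T y| < 2*lam - lam_max.
discard = np.where(c < 2.0 * lam - lam_max)[0]

# Check the rule against an actual solution: cyclic coordinate descent
# with soft-thresholding updates (exact for unit-norm columns).
beta = np.zeros(p)
for _ in range(500):
    for j in range(p):
        r_j = y - X @ beta + X[:, j] * beta[j]         # partial residual
        z = X[:, j] @ r_j
        beta[j] = np.sign(z) * max(abs(z) - lam, 0.0)  # soft-threshold
grad = X.T @ (y - X @ beta)
kkt_ok = bool(np.all(np.abs(grad) <= lam + 1e-6))      # KKT: |x_j^T r| <= lam
rule_ok = bool(np.all(beta[discard] == 0.0))           # no discarded predictor active
```

The strong rule is not guaranteed safe, which is why, as in the abstract, it is backed by a KKT check on the discarded set: any violating predictor would simply be added back and the problem re-solved.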
Rejoinder to “A Significance Test for the Lasso”
We would like to thank the editors and referees for their considerable efforts in improving our paper, and all of the discussants for their feedback and their thoughtful, stimulating comments. Linear models are central in applied statistics, and inference for adaptive linear modeling is an important and active area of research. Our paper is clearly not the last word on the subject! Several of the discussants introduce novel proposals for this problem; in fact, many of the discussions are interesting “mini-papers” in their own right, and we will not attempt to reply to all of the points that they raise. Our hope is that our paper and the excellent accompanying discussions will serve as a helpful resource for researchers interested in this topic.

Since the writing of our original paper, we have (with many of our graduate students) extended the work considerably. Before responding to the discussants, we first summarize this new work, because it will be relevant to our responses.

• As mentioned in the last section of the paper, we have derived a “spacing” test of the global null hypothesis, β* = 0, which takes the form
- …