A bias correction for the minimum error rate in cross-validation
Tuning parameters in supervised learning problems are often estimated by
cross-validation. The minimum value of the cross-validation error can be biased
downward as an estimate of the test error at that same value of the tuning
parameter. We propose a simple method for the estimation of this bias that uses
information from the cross-validation process. As a result, it requires
essentially no additional computation. We apply our bias estimate to a number
of popular classifiers in various settings, and examine its performance.

Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/08-AOAS224 by the Institute of Mathematical Statistics (http://www.imstat.org)
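The downward bias of the minimized cross-validation error is easy to see in simulation. Below is a minimal numpy sketch (an illustration of the phenomenon, not the bias estimator proposed in the paper): with pure-noise labels, every tuning parameter has true error 0.5, yet the minimum of the CV curve sits well below 0.5, while a fresh test set evaluated at the selected parameter does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_params, n_folds, n_reps = 100, 20, 5, 200
fold_size = n // n_folds

min_cv_errs, test_errs = [], []
for _ in range(n_reps):
    # Labels are pure noise, so every "tuning parameter" has true error 0.5.
    # Model each parameter's per-fold error as an independent draw of the
    # misclassification rate on fold_size held-out points.
    fold_errs = rng.binomial(fold_size, 0.5, size=(n_params, n_folds)) / fold_size
    cv_err = fold_errs.mean(axis=1)            # CV error curve over the grid
    best = int(np.argmin(cv_err))
    min_cv_errs.append(cv_err[best])           # minimized CV error (biased low)
    # Fresh test set at the selected parameter: still 0.5 on average.
    test_errs.append(rng.binomial(n, 0.5) / n)

print(f"mean min CV error:  {np.mean(min_cv_errs):.3f}")   # well below 0.5
print(f"mean test error:    {np.mean(test_errs):.3f}")     # close to 0.5
```

The gap between the two printed numbers is exactly the bias the paper sets out to estimate: minimizing over the grid selects a parameter whose CV error is favorably perturbed by noise.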
Exact Post-Selection Inference for Sequential Regression Procedures
We propose new inference tools for forward stepwise regression, least angle
regression, and the lasso. Assuming a Gaussian model for the observation vector
y, we first describe a general scheme to perform valid inference after any
selection event that can be characterized as y falling into a polyhedral set.
This framework allows us to derive conditional (post-selection) hypothesis
tests at any step of forward stepwise or least angle regression, or any step
along the lasso regularization path, because, as it turns out, selection events
for these procedures can be expressed as polyhedral constraints on y. The
p-values associated with these tests are exactly uniform under the null
distribution, in finite samples, yielding exact type I error control. The tests
can also be inverted to produce confidence intervals for appropriate underlying
regression parameters. The R package "selectiveInference", freely available on
the CRAN repository, implements the new inference tools described in this
paper.

Comment: 26 pages, 5 figures
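The key geometric fact — that a selection event can be characterized as y falling into a polyhedral set — can be checked directly for the first step of forward stepwise. The sketch below (a toy illustration, not the selectiveInference implementation) builds a constraint matrix A for the event "variable j wins with sign s" and verifies that the observed y satisfies A y ≤ 0.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 8
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)           # unit-norm columns
y = rng.standard_normal(n)

# First step of forward stepwise: pick the variable with the largest
# absolute inner product with y.
inner = X.T @ y
j = int(np.argmax(np.abs(inner)))
s = np.sign(inner[j])

# The event "variable j wins with sign s" is exactly the polyhedron
# { y : A y <= 0 }, with two rows (one per sign of x_k) per competitor k:
#   +x_k^T y - s x_j^T y <= 0   and   -x_k^T y - s x_j^T y <= 0.
rows = []
for k in range(p):
    if k == j:
        continue
    rows.append(X[:, k] - s * X[:, j])
    rows.append(-X[:, k] - s * X[:, j])
A = np.array(rows)

# The observed y must lie inside the polyhedron, by construction.
print(bool(np.all(A @ y <= 1e-12)))      # True
```

Conditioning on this polyhedron is what lets the paper's tests deliver exactly uniform null p-values; the same construction extends row by row to later steps of the path.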
The Lasso Problem and Uniqueness
The lasso is a popular tool for sparse linear regression, especially for
problems in which the number of variables p exceeds the number of observations
n. But when p>n, the lasso criterion is not strictly convex, and hence it may
not have a unique minimum. An important question is: when is the lasso solution
well-defined (unique)? We review results from the literature, which show that
if the predictor variables are drawn from a continuous probability
distribution, then there is a unique lasso solution with probability one,
regardless of the sizes of n and p. We also show that this result extends
easily to penalized minimization problems over a wide range of loss
functions.
A second important question is: how can we deal with the case of
non-uniqueness in lasso solutions? In light of the aforementioned result, this
case really only arises when some of the predictor variables are discrete, or
when some post-processing has been performed on continuous predictor
measurements. Though we certainly cannot claim to provide a complete answer to
such a broad question, we do present progress towards understanding some
aspects of non-uniqueness. First, we extend the LARS algorithm for computing
the lasso solution path to cover the non-unique case, so that this path
algorithm works for any predictor matrix. Next, we derive a simple method for
computing the component-wise uncertainty in lasso solutions of any given
problem instance, based on linear programming. Finally, we review results from
the literature on some of the unifying properties of lasso solutions, and also
point out particular forms of solutions that have distinctive properties.

Comment: 25 pages, 0 figures
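Non-uniqueness under discrete or duplicated predictors is simple to exhibit. A minimal numpy sketch (illustrative, not taken from the paper): with two identical columns, any same-signed split of the total coefficient leaves both the fitted values and the ℓ1 penalty unchanged, so the lasso criterion has infinitely many minimizers.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
x = rng.standard_normal(n)
X = np.column_stack([x, x])              # two identical predictor columns
y = 2.0 * x + 0.1 * rng.standard_normal(n)
lam = 0.5

def lasso_obj(beta):
    # Standard lasso criterion: squared-error loss plus l1 penalty.
    return 0.5 * np.sum((y - X @ beta) ** 2) + lam * np.sum(np.abs(beta))

# Any split of a total weight c across the two identical columns (with
# matching signs) gives the same fit X @ beta = c * x and the same l1
# norm |beta|_1 = c, hence the same criterion value.
c = 1.8   # some total weight; its exact value is irrelevant to the point
b1 = np.array([c, 0.0])
b2 = np.array([0.25 * c, 0.75 * c])
print(np.isclose(lasso_obj(b1), lasso_obj(b2)))   # True
```

Since every same-signed split achieves the same criterion value, whichever c minimizes it yields a whole segment of solutions, which is the kind of non-uniqueness the paper's LP-based uncertainty computation quantifies component-wise.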
A General Framework for Fast Stagewise Algorithms
Forward stagewise regression follows a very simple strategy for constructing
a sequence of sparse regression estimates: it starts with all coefficients
equal to zero, and iteratively updates the coefficient (by a small amount
ε) of the variable that achieves the maximal absolute inner product
with the current residual. This procedure has an interesting connection to the
lasso: under some conditions, it is known that the sequence of forward
stagewise estimates exactly coincides with the lasso path, as the step size
ε goes to zero. Furthermore, essentially the same equivalence holds
outside of least squares regression, with the minimization of a differentiable
convex loss function subject to an ℓ1 norm constraint (the stagewise
algorithm now updates the coefficient corresponding to the maximal absolute
component of the gradient).
Even when they do not match their ℓ1-constrained analogues, stagewise
estimates provide a useful approximation, and are computationally appealing.
Their success in sparse modeling motivates the question: can a simple,
effective strategy like forward stagewise be applied more broadly in other
regularization settings, beyond the ℓ1 norm and sparsity? The current
paper is an attempt to do just this. We present a general framework for
stagewise estimation, which yields fast algorithms for problems such as
group-structured learning, matrix completion, image denoising, and more.

Comment: 56 pages, 15 figures
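The strategy described in the abstract translates almost line for line into code. A minimal numpy sketch of forward stagewise for least squares (the step size ε and iteration count are illustrative choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 10
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)           # unit-norm columns
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]         # sparse ground truth
y = X @ beta_true + 0.1 * rng.standard_normal(n)

eps, n_steps = 0.01, 2000
beta = np.zeros(p)                       # start with all coefficients zero
resid = y.copy()
for _ in range(n_steps):
    corr = X.T @ resid                   # inner products with current residual
    j = int(np.argmax(np.abs(corr)))     # variable with maximal |inner product|
    delta = eps * np.sign(corr[j])
    beta[j] += delta                     # update that coefficient by eps
    resid -= delta * X[:, j]             # keep the residual in sync

print(np.round(beta, 2))                 # first three entries near 3, -2, 1.5
```

Each iteration moves one coefficient by ε in the direction of its residual correlation; run with a tiny ε, the resulting coefficient paths trace out (under the conditions mentioned above) the lasso path.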