Exact Post-Selection Inference for Sequential Regression Procedures
We propose new inference tools for forward stepwise regression, least angle
regression, and the lasso. Assuming a Gaussian model for the observation vector
y, we first describe a general scheme to perform valid inference after any
selection event that can be characterized as y falling into a polyhedral set.
This framework allows us to derive conditional (post-selection) hypothesis
tests at any step of forward stepwise or least angle regression, or any step
along the lasso regularization path, because, as it turns out, selection events
for these procedures can be expressed as polyhedral constraints on y. The
p-values associated with these tests are exactly uniform under the null
distribution, in finite samples, yielding exact type I error control. The tests
can also be inverted to produce confidence intervals for appropriate underlying
regression parameters. The R package "selectiveInference", freely available on
the CRAN repository, implements the new inference tools described in this
paper.
Comment: 26 pages, 5 figures
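The core idea — that conditioning on a polyhedral selection event {Ay ≤ b} truncates the Gaussian law of a linear statistic η᐀y to an interval — can be sketched numerically. The sketch below is a minimal illustration of this "polyhedral lemma" under Σ = σ²I, not the selectiveInference implementation; the function names are ours.

```python
import math
import numpy as np

def norm_cdf(x):
    """Standard normal CDF via the error function (handles +/-inf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def polyhedral_pvalue(y, A, b, eta, sigma=1.0):
    """One-sided p-value for H0: eta'mu = 0 vs > 0, conditional on {A y <= b}.

    Assumes y ~ N(mu, sigma^2 I). Decomposes y into the statistic
    z = eta'y and a residual r independent of z; the selection event then
    truncates z to an interval [vlo, vhi].
    """
    y, eta, b = np.asarray(y, float), np.asarray(eta, float), np.asarray(b, float)
    z = eta @ y                     # test statistic eta'y
    c = eta / (eta @ eta)           # direction of z in y-space (Sigma = sigma^2 I)
    r = y - c * z                   # component of y independent of z
    Ac, Ar = A @ c, A @ r
    # A y = A r + (A c) z <= b pins z into an interval:
    vlo, vhi = -math.inf, math.inf
    for ac, ar_j, b_j in zip(Ac, Ar, b):
        if ac > 0:
            vhi = min(vhi, (b_j - ar_j) / ac)
        elif ac < 0:
            vlo = max(vlo, (b_j - ar_j) / ac)
    s = sigma * math.sqrt(eta @ eta)  # sd of z under the Gaussian model
    num = norm_cdf(vhi / s) - norm_cdf(z / s)
    den = norm_cdf(vhi / s) - norm_cdf(vlo / s)
    return num / den                  # uniform on (0,1) under H0, conditionally
```

For example, conditioning on the event y₁ ≥ 0 (one halfspace) and testing η = e₁ reduces to a Gaussian truncated to [0, ∞), so the conditional p-value is exactly 2(1 − Φ(z/σ)) for z ≥ 0.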
"Pre-conditioning" for feature selection and regression in high-dimensional problems
We consider regression problems where the number of predictors greatly
exceeds the number of observations. We propose a method for variable selection
that first estimates the regression function, yielding a "pre-conditioned"
response variable. The primary method used for this initial regression is
supervised principal components. Then we apply a standard procedure such as
forward stepwise selection or the LASSO to the pre-conditioned response
variable. In a number of simulated and real data examples, this two-step
procedure outperforms forward stepwise selection or the usual LASSO (applied
directly to the raw outcome). We also show that under a certain Gaussian latent
variable model, application of the LASSO to the pre-conditioned response
variable is consistent as the number of predictors and observations increases.
Moreover, when the observational noise is rather large, the suggested procedure
can give a more accurate estimate than LASSO. We illustrate our method on some
real problems, including survival analysis with microarray data.
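The two-step procedure can be sketched as follows. For brevity this sketch substitutes plain principal-components regression for the supervised principal components of the paper (an assumption on our part), and uses a hand-rolled coordinate-descent lasso; all function names are illustrative.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_norm2 = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]          # partial residual
            beta[j] = soft_threshold(X[:, j] @ r, n * lam) / col_norm2[j]
    return beta

def preconditioned_lasso(X, y, lam, n_components=3):
    """Step 1: denoise y via a low-rank regression (here: plain PCR,
    standing in for supervised principal components).
    Step 2: run the lasso on the pre-conditioned response y_hat."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]        # PC scores
    coef, *_ = np.linalg.lstsq(scores, yc, rcond=None)
    y_hat = scores @ coef                                  # pre-conditioned response
    return lasso_cd(Xc, y_hat, lam)
```

The point of step 1 is that the lasso in step 2 sees a response with much of the observational noise removed, which is where the claimed advantage in high-noise settings comes from.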
DISCRIMINANT STEPWISE PROCEDURE
The stepwise procedure is now probably the most popular tool for automatic feature selection. In most cases it represents a model-selection approach that evaluates various feature subsets (a so-called wrapper). In fact, it is a heuristic search technique that examines the space of all possible feature subsets. This method is known in the literature under different names and in different variants. We organize the concepts and terminology and present several variants of stepwise feature selection from a search-strategy point of view. A short review of implementations in R is also given.
