
Variable Selection Using The caret Package

By Max Kuhn

Abstract

Many models that can be accessed using caret’s train function produce prediction equations that do not necessarily use all the predictors. These models are thought to have built-in feature selection and include rpart, gbm, ada, glmboost, gamboost, blackboost, ctree, sparseLDA, sddaLDA, sddaQDA, glmnet, lasso, lars, spls, earth, fda, bagEarth, bagFDA, pam and others. Many of these functions have an ancillary method called predictors that returns a vector indicating which predictors were used in the final model. In many cases, using these models with built-in feature selection will be more efficient than algorithms where the search routine for the right predictors is external to the model (see Section 2). Built-in feature selection typically couples the predictor search algorithm with the parameter estimation, and the two are usually optimized with a single objective function (e.g. error rates or likelihood).

2 Feature Selection Using Search Algorithms

2.1 Searching the Feature Space

Many feature selection routines use a “wrapper” approach to find appropriate variables, in which an algorithm that searches the feature space repeatedly fits the model with different predictor sets. The best predictor set is determined by some measure of performance (e.g. R², classificatio
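
The wrapper idea in Section 2.1 (an outer search that repeatedly refits a model on different predictor sets and keeps the best-scoring set) can be illustrated with a minimal sketch. This is not caret's API and not the search the paper itself uses: it is written in Python for illustration, it assumes greedy forward selection as the search, and it assumes a 1-nearest-neighbour classifier scored by leave-one-out accuracy as the model and performance measure. The names loo_accuracy, forward_search, ROWS and LABELS are hypothetical.

```python
# Illustrative wrapper-style feature selection (assumption: greedy forward
# search around a 1-nearest-neighbour model; not caret's implementation).

def loo_accuracy(rows, labels, cols):
    """Leave-one-out accuracy of a 1-NN classifier that sees only `cols`."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue  # hold out the sample being predicted
            d = sum((row[c] - other[c]) ** 2 for c in cols)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def forward_search(rows, labels, score=loo_accuracy):
    """Add one predictor at a time while the performance measure improves."""
    remaining = list(range(len(rows[0])))
    selected = []
    best_score = float("-inf")
    while remaining:
        # Refit and score the model once per candidate predictor set.
        scored = [(score(rows, labels, selected + [c]), c) for c in remaining]
        cand_score, cand = max(scored, key=lambda t: t[0])
        if cand_score <= best_score:
            break  # no candidate improves the score, so stop searching
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected, best_score


# Toy data (made up for this sketch): column 0 separates the classes,
# columns 1 and 2 are noise on a larger scale.
ROWS = [
    (0.0, 1, 1), (0.2, 5, 5), (0.4, 9, 9),
    (1.0, 1, 2), (1.2, 5, 6), (1.4, 9, 8),
]
LABELS = [0, 0, 0, 1, 1, 1]

selected, best = forward_search(ROWS, LABELS)
print(selected, best)
```

On this toy data the search keeps only column 0, because adding either noise column lowers the leave-one-out accuracy. The cost is one model fit per candidate set at every step, which is the reason the abstract describes built-in feature selection as often more efficient than an external search.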

Year: 2011
OAI identifier: oai:CiteSeerX.psu:10.1.1.192.6323
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://cran.at.r-project.org/w... (external link)