Context-dependent feature analysis with random forests
Feature selection is often more complicated than identifying a
single subset of input variables that together explain the output. There
may be interactions that depend on contextual information, i.e., variables that
prove relevant only in specific circumstances. In this setting, the
contribution of this paper is to extend the random forest variable importances
framework in order (i) to identify variables whose relevance is
context-dependent and (ii) to characterize as precisely as possible the effect
of contextual information on these variables. The usage and the relevance of
our framework for highlighting context-dependent variables are illustrated on
both artificial and real datasets. Comment: Accepted for presentation at UAI 201
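To make the notion of context-dependent relevance concrete, here is a minimal sketch. It does not implement the paper's random forest importance framework; as a stand-in, it assumes discrete variables and scores the relevance of a variable by its empirical mutual information with the output, computed separately within each context. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Empirical mutual information I(X; Y), in bits, from a list of (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def relevance_by_context(rows):
    """Group (context, x, y) rows by context and score X's relevance in each group."""
    groups = {}
    for context, x, y in rows:
        groups.setdefault(context, []).append((x, y))
    return {context: mutual_information(pairs) for context, pairs in groups.items()}


# Hypothetical toy data: Y copies X when context == 1,
# and is unrelated to X when context == 0.
rows = [
    (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
    (1, 0, 0), (1, 1, 1), (1, 0, 0), (1, 1, 1),
]
scores = relevance_by_context(rows)
print(scores)
```

In this toy example X carries no information about Y in context 0 and one full bit in context 1, which is the kind of variable a context-dependent analysis is meant to flag.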
Sparsity Oriented Importance Learning for High-dimensional Linear Regression
Given the now well-recognized, non-negligible uncertainty in model selection, data
analysts should no longer be satisfied with the output of a single final model
from a model selection process, regardless of its sophistication. To improve
reliability and reproducibility in model choice, one constructive approach is
to make good use of a sound variable importance measure. Although interesting
importance measures are available and increasingly used in data analysis,
little theoretical justification has been provided. In this paper, we propose a new
variable importance measure, sparsity oriented importance learning (SOIL), for
high-dimensional regression from a sparse linear modeling perspective by taking
into account the variable selection uncertainty via the use of a sensible model
weighting. The SOIL method is theoretically shown to have the
inclusion/exclusion property: When the model weights are properly concentrated
around the true model, the SOIL importance can well separate the variables in
the true model from the rest. In particular, even if the signal is weak, SOIL rarely
gives variables not in the true model significantly higher importance values
than those in the true model. Extensive simulations in several illustrative
settings and real data examples with guided simulations show desirable
properties of the SOIL importance in contrast to other importance measures.
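The abstract does not state the importance formula. A natural reading, assumed here, is that a variable's importance is the total weight of the candidate models that include it. The sketch below illustrates that reading only; the candidate models, their weights, and the function name are hypothetical, and how the weights are obtained is outside its scope.

```python
def weighted_inclusion_importance(models, weights, n_variables):
    """Importance of variable j = normalized total weight of models containing j.

    models:  list of sets of variable indices (candidate sparse models)
    weights: non-negative model weights, normalized here to sum to 1
    """
    total = sum(weights)
    importance = [0.0] * n_variables
    for model, weight in zip(models, weights):
        for j in model:
            importance[j] += weight / total
    return importance


# Hypothetical candidate models over 4 variables, with hypothetical weights.
candidate_models = [{0, 1}, {0, 1, 2}, {0}, {1, 3}]
model_weights = [0.6, 0.2, 0.15, 0.05]

importance = weighted_inclusion_importance(candidate_models, model_weights, 4)
print(importance)
```

When most of the weight sits on models close to {0, 1}, variables 0 and 1 receive importances near 1 and the others stay near 0, which is the separation the inclusion/exclusion property describes.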
Fitting Prediction Rule Ensembles with R Package pre
Prediction rule ensembles (PREs) are sparse collections of rules, offering
highly interpretable regression and classification models. This paper presents
the R package pre, which derives PREs through the methodology of Friedman and
Popescu (2008). The implementation and functionality of package pre are
described and illustrated through application on a dataset on the prediction of
depression. Furthermore, accuracy and sparsity of PREs are compared with those of
single trees, random forest and lasso regression in four benchmark datasets.
Results indicate that pre derives ensembles with predictive accuracy comparable
to that of random forests, while using a smaller number of variables for
prediction.
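As a rough illustration of why such ensembles are easy to read, the sketch below shows how a fitted rule ensemble could produce a prediction: an intercept plus the coefficients of the rules whose conditions hold. This is not the pre package's API (pre is an R package); the rules, coefficients, and variable names are hypothetical, and rule generation and the sparse fitting step are omitted.

```python
def predict_rule_ensemble(x, intercept, rules):
    """Prediction = intercept + sum of coefficients of the rules that fire on x.

    rules: list of (coefficient, condition) pairs, where condition maps an
    observation (a dict of variable values) to True or False.
    """
    return intercept + sum(coef for coef, condition in rules if condition(x))


# Hypothetical ensemble with two rules on made-up variables.
rules = [
    (1.5, lambda x: x["age"] > 40 and x["score"] <= 10),
    (-0.5, lambda x: x["score"] > 20),
]

prediction = predict_rule_ensemble({"age": 45, "score": 8}, intercept=2.0, rules=rules)
print(prediction)
```

Only the first rule fires for this observation, so the prediction is the intercept plus that rule's coefficient.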
Classification-relevant Importance Measures for the West German Business Cycle
When analyzing business cycle data, one observes that the relevant predictor variables are often highly correlated. This paper presents a method to obtain measures of importance for the classification of data in which such multicollinearity is present. In systems with highly correlated variables, it is of interest to know what changes result when a certain predictor is changed by one unit and all other predictors change according to their correlation with it, rather than being held fixed as in a ceteris paribus analysis. The approach described in this paper uses directional derivatives to obtain such importance measures. It is shown how the directions of interest can be estimated, and different evaluation strategies for characteristics of classification models are presented. The method is then applied to linear discriminant analysis and multinomial logit for the classification of West German business cycle phases.
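The contrast with a ceteris paribus analysis can be sketched for a linear score. The abstract does not say how the directions are estimated; the sketch assumes that when predictor i moves by one unit, each other predictor moves by its regression slope on predictor i (sample covariance divided by the variance of predictor i). The data, coefficients, and function names are hypothetical.

```python
def covariance(a, b):
    """Sample covariance of two equally long lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def correlated_direction(columns, i):
    """Direction in which predictor i moves by 1 and predictor j by cov(j, i) / var(i)."""
    var_i = covariance(columns[i], columns[i])
    return [covariance(col, columns[i]) / var_i for col in columns]


def directional_importance(coefficients, columns, i):
    """Directional derivative of the linear score sum_j b_j * x_j along that direction."""
    direction = correlated_direction(columns, i)
    return sum(b * d for b, d in zip(coefficients, direction))


# Hypothetical data: the second predictor is exactly twice the first.
columns = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
coefficients = [0.5, 0.25]  # hypothetical linear discriminant coefficients

ceteris_paribus = coefficients[0]
with_correlation = directional_importance(coefficients, columns, 0)
print(ceteris_paribus, with_correlation)
```

Here a unit change in the first predictor drags the second along by two units, so the directional effect on the score is twice the ceteris paribus coefficient.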
Identifying features predictive of faculty integrating computation into physics courses
Computation is a central aspect of 21st century physics practice; it is used
to model complicated systems, to simulate impossible experiments, and to
analyze mountains of data. Physics departments and their faculty are
increasingly recognizing the importance of teaching computation to their
students. We recently completed a national survey of faculty in physics
departments to understand the state of computational instruction and the
factors that underlie that instruction. The data collected from the faculty
responding to the survey included a variety of scales, binary questions, and
numerical responses. We then used Random Forest, a supervised learning
technique, to explore the factors that are most predictive of whether a faculty
member decides to include computation in their physics courses. We find
experience using computation with students in their research (or lack thereof)
and various personal beliefs to be most predictive of a faculty member having
experience teaching computation. Interestingly, we find demographic and
departmental factors to be less useful in our model. The results of
this study inform future efforts to promote greater integration of computation
into the physics curriculum and comment on the current state of
computational instruction across the United States.