
    Context-dependent feature analysis with random forests

    In many cases, feature selection is more complicated than identifying a single subset of input variables that together explain the output. There may be interactions that depend on contextual information, i.e., variables that turn out to be relevant only in specific circumstances. In this setting, the contribution of this paper is to extend the random forest variable importance framework in order (i) to identify variables whose relevance is context-dependent and (ii) to characterize as precisely as possible the effect of contextual information on these variables. The usage and relevance of our framework for highlighting context-dependent variables is illustrated on both artificial and real datasets. Comment: Accepted for presentation at UAI 201
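The idea of context-dependent relevance can be illustrated with a minimal sketch, assuming scikit-learn is available. The data, variable names, and use of permutation importance within subgroups are illustrative stand-ins, not the paper's actual extension of the importance framework: a feature `x1` matters only when a binary `context` variable is 1, and evaluating permutation importance separately per context exposes this.

```python
# Sketch: a variable whose importance depends on a context variable.
# Synthetic data and the per-subgroup evaluation are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 2000
context = rng.integers(0, 2, n)          # contextual variable
x1 = rng.normal(size=n)                  # relevant only when context == 1
x2 = rng.normal(size=n)                  # always relevant
noise = rng.normal(scale=0.1, size=n)
y = ((context * x1 + x2 + noise) > 0).astype(int)

X = np.column_stack([context, x1, x2])
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Evaluate the importance of x1 separately within each context
imps = {}
for c in (0, 1):
    mask = context == c
    result = permutation_importance(rf, X[mask], y[mask],
                                    n_repeats=10, random_state=0)
    imps[c] = result.importances_mean[1]  # column 1 holds x1
    print(f"context={c}: permutation importance of x1 = {imps[c]:.3f}")
```

A global importance score would average these two regimes together; splitting by context reveals that x1 is informative only in one of them.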

    Sparsity Oriented Importance Learning for High-dimensional Linear Regression

    Model selection uncertainty is now well recognized as non-negligible, so data analysts should no longer be satisfied with the output of a single final model from a model selection process, regardless of its sophistication. To improve reliability and reproducibility in model choice, one constructive approach is to make good use of a sound variable importance measure. Although interesting importance measures are available and increasingly used in data analysis, little theoretical justification has been provided for them. In this paper, we propose a new variable importance measure, sparsity oriented importance learning (SOIL), for high-dimensional regression from a sparse linear modeling perspective, taking into account variable selection uncertainty via a sensible model weighting. The SOIL method is theoretically shown to have the inclusion/exclusion property: when the model weights are properly concentrated around the true model, the SOIL importance can well separate the variables in the true model from the rest. In particular, even if the signal is weak, SOIL rarely gives variables outside the true model significantly higher importance values than those in the true model. Extensive simulations in several illustrative settings, together with real data examples with guided simulations, show the desirable properties of the SOIL importance in contrast to other importance measures.
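The general mechanism behind a model-weighting importance measure can be sketched in a few lines, assuming NumPy. This is not the SOIL weighting itself (the paper uses a more carefully justified scheme); here candidate subsets are weighted by BIC as a stand-in, and a variable's importance is the total weight of the models that include it.

```python
# Sketch of model-weighted variable importance (BIC weights as a
# simplified stand-in for the paper's weighting scheme).
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.5, 0.0, 0.0, 0.0])   # true model uses x0, x1
y = X @ beta + rng.normal(size=n)

def bic(subset):
    """BIC of an OLS fit on the given subset of columns (intercept-free sketch)."""
    if subset:
        Xs = X[:, list(subset)]
        resid = y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]
    else:
        resid = y
    return n * np.log(resid @ resid / n) + len(subset) * np.log(n)

subsets = [s for k in range(p + 1) for s in itertools.combinations(range(p), k)]
scores = np.array([bic(s) for s in subsets])
weights = np.exp(-(scores - scores.min()) / 2)
weights /= weights.sum()

# Importance of variable j = total weight of models containing j
importance = np.zeros(p)
for w, s in zip(weights, subsets):
    for j in s:
        importance[j] += w
print(np.round(importance, 3))
```

When the weights concentrate on the true model, the importance of the true variables approaches 1 and that of the rest stays small, which is the inclusion/exclusion behavior the abstract describes.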

    Fitting Prediction Rule Ensembles with R Package pre

    Prediction rule ensembles (PREs) are sparse collections of rules, offering highly interpretable regression and classification models. This paper presents the R package pre, which derives PREs through the methodology of Friedman and Popescu (2008). The implementation and functionality of package pre are described and illustrated through an application to a dataset on the prediction of depression. Furthermore, the accuracy and sparsity of PREs are compared with those of single trees, random forests, and lasso regression on four benchmark datasets. Results indicate that pre derives ensembles with predictive accuracy comparable to that of random forests, while using a smaller number of variables for prediction.

    Classification-relevant Importance Measures for the West German Business Cycle

    When analyzing business cycle data, one observes that the relevant predictor variables are often highly correlated. This paper presents a method to obtain importance measures for the classification of data in which such multicollinearity is present. In systems with highly correlated variables, it is more informative to ask what changes result when a certain predictor is changed by one unit and all other predictors change according to their correlation with the first, rather than performing a ceteris paribus analysis. The approach described in this paper uses directional derivatives to obtain such importance measures. It is shown how the interesting directions can be estimated, and different evaluation strategies for characteristics of classification models are presented. The method is then applied to linear discriminant analysis and multinomial logit for the classification of West German business cycle phases.
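The contrast between a ceteris paribus derivative and a correlation-aware directional derivative can be made concrete with a small sketch, assuming NumPy. The direction d_j = Sigma[:, j] / Sigma[j, j] (each predictor moving by its regression on x_j) and the linear score are illustrative simplifications, not the paper's estimator.

```python
# Sketch: partial derivative vs. directional derivative along the
# correlation-implied direction, for a linear score f(x) = w @ x.
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])        # highly correlated predictors
X = rng.multivariate_normal([0, 0], Sigma, size=1000)
w = np.array([1.0, 0.5])                           # linear score: f(x) = w @ x

Sigma_hat = np.cov(X, rowvar=False)
results = {}
for j in range(2):
    d = Sigma_hat[:, j] / Sigma_hat[j, j]          # co-movement per unit of x_j
    partial = w[j]                                 # ceteris paribus derivative
    directional = w @ d                            # correlation-aware importance
    results[j] = (partial, directional)
    print(f"x{j}: partial={partial:.2f}, directional={directional:.2f}")
```

With positively correlated predictors and same-signed effects, the directional measure exceeds the partial derivative, because a unit change in one predictor drags the other along with it.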

    Identifying features predictive of faculty integrating computation into physics courses

    Computation is a central aspect of 21st-century physics practice; it is used to model complicated systems, to simulate impossible experiments, and to analyze mountains of data. Physics departments and their faculty are increasingly recognizing the importance of teaching computation to their students. We recently completed a national survey of faculty in physics departments to understand the state of computational instruction and the factors that underlie that instruction. The data collected from the faculty responding to the survey included a variety of scales, binary questions, and numerical responses. We then used Random Forest, a supervised learning technique, to explore the factors that are most predictive of whether a faculty member decides to include computation in their physics courses. We find that experience using computation with students in their research (or lack thereof), along with various personal beliefs, is most predictive of a faculty member having experience teaching computation. Interestingly, we find demographic and departmental factors to be less useful in our model. The results of this study inform future efforts to promote greater integration of computation into the physics curriculum and comment on the current state of computational instruction across the United States.
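The analysis pattern the abstract describes, fitting a random forest to survey responses and ranking predictors by importance, can be sketched as follows, assuming scikit-learn. The feature names and synthetic data are stand-ins for the authors' survey, not their actual variables.

```python
# Sketch: rank survey-style features by random forest importance.
# Synthetic data; the first two columns drive the outcome by construction,
# loosely mimicking the paper's finding.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n = 1000
features = ["research_experience", "beliefs", "dept_size", "demographic"]
X = rng.normal(size=(n, len(features)))
y = (X[:, 0] + 0.8 * X[:, 1] + 0.3 * rng.normal(size=n) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranked = sorted(zip(rf.feature_importances_, features), reverse=True)
for imp, name in ranked:
    print(f"{name}: {imp:.3f}")
```

The ranking separates the constructed signal features from the noise features, which is the kind of evidence the study uses to argue that research experience and beliefs, rather than demographics, predict teaching computation.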