6,295 research outputs found

    Sparsity Oriented Importance Learning for High-dimensional Linear Regression

    With model selection uncertainty now well recognized to be non-negligible, data analysts should no longer be satisfied with the output of a single final model from a model selection process, regardless of its sophistication. To improve reliability and reproducibility in model choice, one constructive approach is to make good use of a sound variable importance measure. Although interesting importance measures are available and increasingly used in data analysis, little theoretical justification has been provided for them. In this paper, we propose a new variable importance measure, sparsity oriented importance learning (SOIL), for high-dimensional regression from a sparse linear modeling perspective, taking into account variable selection uncertainty via a sensible model weighting. The SOIL method is theoretically shown to have the inclusion/exclusion property: when the model weights are properly concentrated around the true model, the SOIL importance can well separate the variables in the true model from the rest. In particular, even if the signal is weak, SOIL rarely gives variables not in the true model significantly higher importance values than those in the true model. Extensive simulations in several illustrative settings and real data examples with guided simulations show desirable properties of the SOIL importance in contrast to other importance measures.
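
    The weighting idea can be sketched in a few lines of Python: score each variable by the total weight of the candidate sparse models that include it. The sketch below uses supports taken from a lasso path and BIC-based weights purely for illustration; the candidate-model construction, the weighting scheme, and the name soil_style_importance are assumptions for this sketch, not the authors' actual procedure.

    import numpy as np
    from sklearn.linear_model import LinearRegression, lasso_path

    def soil_style_importance(X, y):
        """Weighted inclusion frequency over candidate supports from a lasso path."""
        n, p = X.shape
        _, coefs, _ = lasso_path(X, y)                  # candidate sparse models
        supports, bics = [], []
        for k in range(coefs.shape[1]):
            s = list(np.flatnonzero(coefs[:, k]))
            if not s or s in supports or len(s) >= n:
                continue
            ols = LinearRegression().fit(X[:, s], y)    # refit OLS on the support
            rss = np.sum((y - ols.predict(X[:, s])) ** 2)
            bics.append(n * np.log(rss / n) + len(s) * np.log(n))
            supports.append(s)
        bics = np.asarray(bics)
        w = np.exp(-0.5 * (bics - bics.min()))
        w /= w.sum()                                    # BIC-based model weights
        importance = np.zeros(p)
        for s, wk in zip(supports, w):
            importance[s] += wk                         # total weight of models containing each variable
        return importance

    Under this scoring, a variable with importance near 1 appears in essentially every highly weighted model, while a variable outside the true model should, by the inclusion/exclusion property, receive a value near 0.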

    Context-dependent feature analysis with random forests

    Feature selection is often more complicated than identifying a single subset of input variables that would together explain the output. There may be interactions that depend on contextual information, i.e., variables that turn out to be relevant only in some specific circumstances. In this setting, the contribution of this paper is to extend the random forest variable importances framework in order (i) to identify variables whose relevance is context-dependent and (ii) to characterize as precisely as possible the effect of contextual information on these variables. The usage and the relevance of our framework for highlighting context-dependent variables are illustrated on both artificial and real datasets.
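
    As a rough illustration of what "context-dependent relevance" means, the following Python sketch contrasts permutation importances computed separately within each value of a binary context variable. The synthetic data, the variable names, and the use of permutation importance are assumptions made for this sketch; they do not reproduce the paper's extended importance framework.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance

    rng = np.random.default_rng(0)
    n = 2000
    context = rng.integers(0, 2, n)                     # contextual variable
    x1, x2 = rng.normal(size=(2, n))
    # x1 drives the output only when context == 1; x2 is always relevant.
    y = np.where(context == 1, 3 * x1, 0.0) + x2 + 0.1 * rng.normal(size=n)
    X = np.column_stack([x1, x2, context])

    forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    for c in (0, 1):
        mask = context == c
        r = permutation_importance(forest, X[mask], y[mask], n_repeats=10, random_state=0)
        print(f"context={c}: importances =", np.round(r.importances_mean, 3))
    # Expected pattern: x1 matters only in context 1, x2 matters in both.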

    Fitting Prediction Rule Ensembles with R Package pre

    Prediction rule ensembles (PREs) are sparse collections of rules, offering highly interpretable regression and classification models. This paper presents the R package pre, which derives PREs through the methodology of Friedman and Popescu (2008). The implementation and functionality of package pre are described and illustrated through an application to a dataset on the prediction of depression. Furthermore, the accuracy and sparsity of PREs are compared with those of single trees, random forests, and lasso regression on four benchmark datasets. Results indicate that pre derives ensembles with predictive accuracy comparable to that of random forests, while using a smaller number of variables for prediction.
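
    pre itself is an R package, but the underlying Friedman and Popescu (2008) methodology can be sketched in Python: harvest condition paths ("rules") from shallow boosted trees, encode each rule as a 0/1 feature, and let the lasso keep only a sparse subset. Everything below, including the data, the helper names, and the parameter choices, is an illustrative assumption and not the pre implementation.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.linear_model import LassoCV

    def tree_rules(tree):
        """Collect the condition path leading to every non-root node as a rule."""
        rules = []
        def walk(node, conds):
            if conds:
                rules.append(conds)
            if tree.children_left[node] != -1:          # internal node
                f, t = tree.feature[node], tree.threshold[node]
                walk(tree.children_left[node], conds + [(f, t, True)])
                walk(tree.children_right[node], conds + [(f, t, False)])
        walk(0, [])
        return rules

    def rule_matrix(X, rules):
        """Encode each rule as a 0/1 column indicating which rows satisfy it."""
        cols = []
        for conds in rules:
            m = np.ones(len(X), dtype=bool)
            for f, t, left in conds:
                m &= (X[:, f] <= t) if left else (X[:, f] > t)
            cols.append(m.astype(float))
        return np.column_stack(cols)

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 5))
    y = 2 * (X[:, 0] > 0) * (X[:, 1] > 0) + 0.1 * rng.normal(size=500)
    gb = GradientBoostingRegressor(n_estimators=50, max_depth=2, random_state=0).fit(X, y)
    rules = [r for tree in gb.estimators_.ravel() for r in tree_rules(tree.tree_)]
    R = rule_matrix(X, rules)
    fit = LassoCV(cv=5).fit(R, y)                       # lasso selects a sparse rule set
    print("rules kept:", int(np.count_nonzero(fit.coef_)), "of", len(rules))

    The interpretability claim comes from the final model being a short list of if-then rules with coefficients, rather than an entire forest.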

    Identifying features predictive of faculty integrating computation into physics courses

    Computation is a central aspect of 21st century physics practice; it is used to model complicated systems, to simulate impossible experiments, and to analyze mountains of data. Physics departments and their faculty are increasingly recognizing the importance of teaching computation to their students. We recently completed a national survey of faculty in physics departments to understand the state of computational instruction and the factors that underlie that instruction. The data collected from the faculty responding to the survey included a variety of scales, binary questions, and numerical responses. We then used Random Forest, a supervised learning technique, to explore the factors that are most predictive of whether a faculty member decides to include computation in their physics courses. We find that experience using computation with students in their research (or the lack thereof) and various personal beliefs are most predictive of a faculty member having experience teaching computation. Interestingly, demographic and departmental factors are less useful in our model. The results of this study inform future efforts to promote greater integration of computation into the physics curriculum and comment on the current state of computational instruction across the United States.
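
    A hypothetical sketch of this style of analysis, with invented feature names and synthetic data rather than the survey's actual responses, might look like the following: fit a random forest classifier on mixed survey-style predictors and rank them by importance.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(2)
    n = 800
    research_use = rng.integers(0, 2, n)        # used computation with students in research (binary)
    belief_scale = rng.normal(size=n)           # personal belief scale about computation
    dept_size = rng.integers(5, 60, n)          # departmental factor (count)
    # Synthetic outcome driven mostly by experience and beliefs, mimicking the reported pattern.
    teaches_comp = (research_use + (belief_scale > 0) + rng.normal(0, 0.5, n) > 1).astype(int)

    X = np.column_stack([research_use, belief_scale, dept_size])
    rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, teaches_comp)
    for name, imp in zip(["research_use", "belief_scale", "dept_size"], rf.feature_importances_):
        print(f"{name}: {imp:.3f}")             # impurity-based importance per predictor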

    Bridging designs for conjoint analysis: The issue of attribute importance.

    Get PDF
    Conjoint analysis studies involving many attributes and attribute levels often occur in practice. Because such studies can cause respondent fatigue and lack of cooperation, it is important to design data collection tasks that reduce those problems. Bridging designs, which incorporate two or more task subsets with overlapping attributes, can presumably lower task difficulty in such cases. In this paper, we present results of a study examining how predictive validity is affected by bridging design decisions: whether important or unimportant attributes serve as links (bridges) between card-sort tasks, and the degree of balance and consistency in estimated attribute importance across tasks. We also propose a new symmetric procedure, Symbridge, to scale the bridged conjoint solutions.