Sparsity Oriented Importance Learning for High-dimensional Linear Regression
Model selection uncertainty is now well recognized as non-negligible, so data analysts should no longer be satisfied with the output of a single final model from a model selection process, regardless of its sophistication. To improve reliability and reproducibility in model choice, one constructive approach is to make good use of a sound variable importance measure. Although interesting importance measures are available and increasingly used in data analysis, little theoretical justification has been provided for them. In this paper, we propose a new variable importance measure, sparsity oriented importance learning (SOIL), for high-dimensional regression from a sparse linear modeling perspective, taking variable selection uncertainty into account via a sensible model weighting. The SOIL method is theoretically shown to have the inclusion/exclusion property: when the model weights are properly concentrated around the true model, the SOIL importance can well separate the variables in the true model from the rest. In particular, even if the signal is weak, SOIL rarely gives variables not in the true model significantly higher importance values than those in the true model. Extensive simulations in several illustrative settings, and real data examples with guided simulations, show desirable properties of the SOIL importance in contrast to other importance measures.
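The abstract does not spell out the estimator, but its core idea — a variable's importance as the total weight of the candidate models that contain it — can be sketched. In the minimal Python sketch below, the candidate models (all subsets of size at most two) and the BIC-type weights are illustrative assumptions, not the paper's actual candidate-model construction or weighting schemes.

```python
# Illustrative SOIL-style importance: a variable's importance is the total
# weight of the candidate sparse models that include it. Candidate models and
# BIC-type weights here are simplifying assumptions, not the paper's scheme.
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 100, 6
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)  # true model: {0, 1}

def bic(model_vars):
    """BIC of an OLS fit restricted to the given variables."""
    fit = LinearRegression().fit(X[:, model_vars], y)
    rss = np.sum((y - fit.predict(X[:, model_vars])) ** 2)
    return n * np.log(rss / n) + (len(model_vars) + 1) * np.log(n)

# Candidate models: all subsets of size <= 2 (a stand-in for a real
# solution path from a sparse-regression method).
candidates = [list(c) for r in (1, 2) for c in combinations(range(p), r)]
bics = np.array([bic(m) for m in candidates])
weights = np.exp(-(bics - bics.min()) / 2)
weights /= weights.sum()

# SOIL-style importance: weighted inclusion frequency of each variable.
importance = np.zeros(p)
for w, m in zip(weights, candidates):
    importance[m] += w
print(np.round(importance, 3))  # variables 0 and 1 should dominate
```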
Context-dependent feature analysis with random forests
Feature selection is often more complicated than identifying a single subset of input variables that would together explain the output. There may be interactions that depend on contextual information, i.e., variables that turn out to be relevant only in some specific circumstances. In this setting, the contribution of this paper is to extend the random forest variable importances framework in order (i) to identify variables whose relevance is context-dependent and (ii) to characterize as precisely as possible the effect of contextual information on these variables. The usage and the relevance of our framework for highlighting context-dependent variables are illustrated on both artificial and real datasets.

Comment: Accepted for presentation at UAI 201
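As a rough illustration of context-dependent relevance (not the paper's exact conditional-importance estimator), the sketch below compares a random forest's permutation importances across the two values of a simulated binary context variable; the data-generating process is an assumption made for the example.

```python
# Rough probe of context dependence: compare permutation importances of a
# random forest across the two values of a binary context variable. This is
# an illustration, not the paper's extended importance framework itself.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
n = 2000
context = rng.integers(0, 2, size=n)          # contextual variable
x1, x2 = rng.normal(size=(2, n))
# x1 matters only when context == 1; x2 matters everywhere.
y = np.where(context == 1, 3.0 * x1, 0.0) + x2 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, context])

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
for c in (0, 1):
    mask = context == c
    imp = permutation_importance(forest, X[mask], y[mask],
                                 n_repeats=10, random_state=0)
    print(f"context={c}:", np.round(imp.importances_mean, 2))
# x1's importance should be near zero in context 0 and large in context 1.
```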
Fitting Prediction Rule Ensembles with R Package pre
Prediction rule ensembles (PREs) are sparse collections of rules, offering highly interpretable regression and classification models. This paper presents the R package pre, which derives PREs through the methodology of Friedman and Popescu (2008). The implementation and functionality of package pre are described and illustrated through an application to a dataset on the prediction of depression. Furthermore, the accuracy and sparsity of PREs are compared with those of single trees, random forests, and lasso regression on four benchmark datasets. Results indicate that pre derives ensembles with predictive accuracy comparable to that of random forests, while using a smaller number of variables for prediction.
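pre itself is an R package; the Python sketch below only illustrates the Friedman and Popescu (2008) recipe that pre builds on — harvest rules as leaf memberships of shallow trees, then let a lasso select a sparse subset — under simplified, assumed settings, not the package's actual implementation or defaults.

```python
# Minimal sketch of the prediction-rule-ensemble idea: shallow boosted trees
# generate rules (encoded as 0/1 leaf memberships), and a lasso keeps a
# sparse, interpretable subset. Requires scikit-learn >= 1.2.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(2)
n, p = 500, 5
X = rng.normal(size=(n, p))
y = 2.0 * (X[:, 0] > 0) + (X[:, 1] > 1) + 0.1 * rng.normal(size=n)

# Shallow boosted trees act as the rule generator.
booster = GradientBoostingRegressor(n_estimators=50, max_depth=2,
                                    learning_rate=0.1, random_state=0)
booster.fit(X, y)

# Each (tree, leaf) pair defines a rule; encode leaf membership as 0/1.
leaves = booster.apply(X).reshape(len(X), -1)
rules = OneHotEncoder(sparse_output=False).fit_transform(leaves)

# Lasso over rule indicators yields a sparse rule ensemble.
lasso = LassoCV(cv=5, random_state=0).fit(rules, y)
print("rules kept:", int(np.sum(lasso.coef_ != 0)), "of", rules.shape[1])
```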
Identifying features predictive of faculty integrating computation into physics courses
Computation is a central aspect of 21st-century physics practice; it is used to model complicated systems, to simulate impossible experiments, and to analyze mountains of data. Physics departments and their faculty are increasingly recognizing the importance of teaching computation to their students. We recently completed a national survey of faculty in physics departments to understand the state of computational instruction and the factors that underlie that instruction. The data collected from the faculty responding to the survey included a variety of scales, binary questions, and numerical responses. We then used Random Forest, a supervised learning technique, to explore the factors that are most predictive of whether a faculty member decides to include computation in their physics courses. We find that experience using computation with students in their research (or lack thereof) and various personal beliefs are most predictive of a faculty member having experience teaching computation. Interestingly, we find demographic and departmental factors to be less useful predictors in our model. The results of this study inform future efforts to promote greater integration of computation into the physics curriculum and comment on the current state of computational instruction across the United States.
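A hedged sketch of the kind of analysis described — a random forest over mixed survey responses, with predictors ranked by feature importance — might look as follows. The file and column names are hypothetical placeholders, not the study's actual survey items.

```python
# Sketch: random forest over mixed survey data, ranked by impurity-based
# feature importances. "faculty_survey.csv" and all column names below are
# hypothetical placeholders, not the study's actual data.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

survey = pd.read_csv("faculty_survey.csv")    # hypothetical file
features = ["research_computation_experience", "belief_computation_valuable",
            "department_size", "years_teaching"]  # placeholder predictors
X = pd.get_dummies(survey[features])          # encode categorical scales
y = survey["teaches_computation"]             # placeholder binary outcome

forest = RandomForestClassifier(n_estimators=500, random_state=0)
print("CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())

forest.fit(X, y)
ranking = pd.Series(forest.feature_importances_, index=X.columns)
print(ranking.sort_values(ascending=False))
```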
Bridging designs for conjoint analysis: The issue of attribute importance.
Conjoint analysis studies involving many attributes and attribute levels often occur in practice. Because such studies can cause respondent fatigue and a lack of cooperation, it is important to design data collection tasks that reduce these problems. Bridging designs, which split the attributes into two or more task subsets with overlapping attributes, can presumably lower task difficulty in such cases. In this paper, we present results of a study examining how predictive validity is affected by bridging design decisions: whether important or unimportant attributes serve as the links (bridges) between card-sort tasks, and the degree of balance and consistency in estimated attribute importance across tasks. We also propose a new symmetric procedure, Symbridge, to scale the bridged conjoint solutions.
Nutrient Estimation from 24-Hour Food Recalls Using Machine Learning and Database Mapping: A Case Study with Lactose.
The Automated Self-Administered 24-Hour Dietary Assessment Tool (ASA24) is a free dietary recall system that outputs fewer nutrients than the Nutrition Data System for Research (NDSR). NDSR uses the Nutrition Coordinating Center (NCC) Food and Nutrient Database; both require a license. Manually looking up ASA24 foods in NDSR is time-consuming but is currently the only way to acquire NCC-exclusive nutrients. Using lactose as an example, we evaluated machine learning and database matching methods to estimate this NCC-exclusive nutrient from ASA24 reports. ASA24-reported foods were manually looked up in NDSR to obtain lactose estimates and were split into training (n = 378) and test (n = 189) datasets. Nine machine learning models were developed to predict lactose from the nutrients common to ASA24 and the NCC database. Database matching algorithms were developed to match an NCC food to each ASA24 food using either the nutrients alone ("Nutrient-Only") or the nutrients plus the food descriptions ("Nutrient + Text"). For both methods, the lactose values were compared to the manual curation. Among the machine learning models, the XGB-Regressor performed best on the held-out test data (R² = 0.33). For the database matching method, Nutrient + Text matching yielded the best lactose estimates (R² = 0.76), a vast improvement over the status quo of no estimate. These results suggest that computational methods can successfully estimate an NCC-exclusive nutrient for foods reported in ASA24.
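A minimal sketch of the machine-learning arm of this workflow is given below; the file names and nutrient columns are placeholders, not the actual ASA24/NCC fields, and the model settings are assumptions rather than the paper's tuned configuration.

```python
# Sketch: gradient-boosted trees predicting a missing nutrient (lactose)
# from the nutrients two databases share. Files and column names are
# hypothetical; the paper's best model was an XGBoost regressor (R^2 = 0.33).
import pandas as pd
from sklearn.metrics import r2_score
from xgboost import XGBRegressor

train = pd.read_csv("asa24_train.csv")        # hypothetical file, n = 378
test = pd.read_csv("asa24_test.csv")          # hypothetical file, n = 189
shared = ["energy_kcal", "protein_g", "fat_g", "carbohydrate_g",
          "sugar_g", "calcium_mg"]            # stand-ins for shared nutrients

model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(train[shared], train["lactose_g"])  # targets from manual NDSR lookup
print("test R^2:", r2_score(test["lactose_g"], model.predict(test[shared])))
```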