
Using Random Forests to Describe Equity in Higher Education: A Critical Quantitative Analysis of Utah’s Postsecondary Pipelines

Author: McDaniel Tyler
Publication venue: Digital Commons @ Butler University
Publication date: 16/04/2018

The following work examines the Random Forest (RF) algorithm as a tool for predicting student outcomes and interrogating the equity of postsecondary education pipelines. The RF model, built using longitudinal data on 41,303 students from Utah’s 2008 high school graduation cohort, is compared to logistic and linear models, which are commonly used to predict college access and success. Substantively, this work finds High School GPA to be the best predictor of postsecondary GPA, whereas the commonly used ACT and AP test scores are far less important. Each model identified several demographic disparities in higher education access, most significantly the effects of individual-level economic disadvantage. District- and school-level factors, such as the proportion of Low Income students and the proportion of Underrepresented Racial Minority (URM) students, were important and negatively associated with postsecondary success. Methodologically, the RF model was able to capture non-linearity in the predictive power of school- and district-level variables, a key finding that was undetectable using linear models. The RF algorithm outperforms logistic models in predicting student enrollment, performs similarly to linear models in predicting postsecondary GPA, and surpasses both in describing non-linear relationships among variables. RF provides novel interpretations of data, challenges conclusions drawn from linear models, and has enormous potential to advance the literature on equity in postsecondary pipelines.

Digital Commons @ Butler University
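The McDaniel abstract relies on two ingredients that distinguish a random forest from a single model: each tree is fit on a bootstrap resample of the students, and each split considers only a random subset of the predictors. The sketch below is a drastically simplified, random-forest-style ensemble written for illustration only, not the author's model: every "tree" is a single split (a decision stump) instead of a fully grown tree, and the function names (`fit_stump`, `fit_forest`, `predict`) and toy data are hypothetical. Because a stump has only one split, sampling features once per tree is equivalent to sampling them at each split.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Return (feature, threshold, left_label, right_label) for the split with the fewest errors."""
    best = None
    for f in feature_ids:
        for t in sorted(set(r[f] for r in rows)):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not right:
                continue  # a threshold at the maximum value splits nothing off
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return None if best is None else best[1:]

def fit_forest(rows, labels, n_trees=25, max_features=1, seed=0):
    """Fit stumps on bootstrap resamples, each restricted to a random subset of features."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap: sample rows with replacement
        feats = sorted(rng.sample(range(n_features), max_features))
        stump = fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        if stump is not None:
            forest.append(stump)
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: feature 0 determines the label, feature 1 is noise.
rows = [(x, (x * 7) % 5) for x in range(10)]
labels = [int(x >= 5) for x in range(10)]
forest = fit_forest(rows, labels, n_trees=25, max_features=2, seed=0)
print(predict(forest, (0, 0)), predict(forest, (9, 3)))
```

In a real analysis one would use a library implementation with fully grown trees; a common default for classification is to sample roughly the square root of the number of predictors at each split.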

Prediction with Dimension Reduction of Multiple Molecular Data Sources for Patient Survival

Author: Kaplan Adam
Lock Eric F.
Publication venue: SAGE Publications
Publication date: 01/07/2017

Predictive modeling from high-dimensional genomic data is often preceded by a dimension reduction step, such as principal components analysis (PCA). However, applying PCA is not straightforward for multi-source data, in which multiple sources of 'omics data measure different but related biological components. In this article we use recent advances in the dimension reduction of multi-source data for predictive modeling. In particular, we apply exploratory results from Joint and Individual Variation Explained (JIVE), an extension of PCA for multi-source data, to the prediction of different response types. We conduct simulations to illustrate the practical advantages and interpretability of our approach. As an application example, we consider predicting survival for Glioblastoma Multiforme (GBM) patients from three data sources measuring mRNA expression, miRNA expression, and DNA methylation. We also introduce a method to estimate JIVE scores for new samples that were not used in the initial dimension reduction, and study its theoretical properties; this method is implemented in the R package R.JIVE on CRAN, in the function 'jive.predict'. Comment: 11 pages, 9 figures

arXiv.org e-Print Archive

Directory of Open Access Journals
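The Kaplan and Lock abstract describes a two-step pattern: reduce the dimension of the predictors, then fit a predictive model on the resulting scores, with new samples scored using the loadings learned from the training data rather than by refitting. The sketch below illustrates that pattern with ordinary single-source PCA followed by least-squares regression (principal components regression). It is a stand-in, not JIVE and not the 'jive.predict' function: JIVE separates joint and individual structure across several sources, and the paper's application uses a survival response, whereas this sketch assumes one data matrix and a continuous response. The function names and simulated data are hypothetical.

```python
import numpy as np

def fit_pca(X, rank):
    """Fit PCA on a training matrix X (samples x features).

    Returns the column means and the top-`rank` loadings (features x rank).
    """
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions of the centered data, ordered by singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:rank].T

def pca_scores(X_new, mean, loadings):
    """Project samples (seen or unseen) onto loadings learned from the training data."""
    return (X_new - mean) @ loadings

def fit_pcr(X, y, rank):
    """Principal components regression: least-squares fit of y on the PCA scores plus an intercept."""
    mean, loadings = fit_pca(X, rank)
    scores = pca_scores(X, mean, loadings)
    design = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return mean, loadings, coef

def predict_pcr(model, X_new):
    """Predict responses for new samples without refitting the dimension reduction."""
    mean, loadings, coef = model
    return coef[0] + pca_scores(X_new, mean, loadings) @ coef[1:]

# Hypothetical simulated data: four features driven by one latent factor z.
rng = np.random.default_rng(0)
z = rng.normal(size=50)
w = np.array([1.0, 2.0, -1.0, 0.5])
X = np.outer(z, w) + 0.01 * rng.normal(size=(50, 4))
y = 3 * z + 1
model = fit_pcr(X, y, rank=1)
print(predict_pcr(model, np.outer([-1.0, 0.0, 2.0], w)))  # close to [-2, 1, 7]
```

Because the new samples are centered with the training means and projected onto the training loadings, their scores live on the same axes the regression was fit on, which is the property that a new-sample scoring method has to preserve.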


USMLE Scores Do Not Predict the Clinical Performance of Emergency Medicine Residents

Author: Ramoska Edward A
Sajadi-Ernazarova Karima
Saks Mark A
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

Background: Scores on “high-stakes” multiple-choice exams such as the United States Medical Licensing Examination® (USMLE) are important screening and applicant-ranking criteria used by residencies.
Objective: We tested the hypothesis that USMLE scores do not predict the overall clinical performance of emergency medicine (EM) residents.
Methods: All graduates of our university-based EM residency between 2008 and 2015 were included. Residents who had incomplete USMLE records, were terminated, transferred out of the program, or did not graduate within this timeframe were excluded from the analysis. Clinical performance was defined as a gestalt judgment by the residency program’s leadership and was classified into three groups: top, average, and lowest clinical performers. Discrepancies among the initial blind rankings were adjudicated during a consensus conference.
Results: During the eight-year study period, there were a total of 115 graduating residents: 73 men (63%) and 42 women. Nearly all of them (109; 95%) had allopathic medical degrees; the remainder had osteopathic degrees. There was no statistically significant correlation between our ranking of clinical performance and the Step 2 Clinical Knowledge score. The correlation between clinical performance and the Step 1 score was likewise non-significant.
Conclusion: Neither USMLE Step 1 nor Step 2 Clinical Knowledge scores were good predictors of residents’ actual clinical performance during training. We feel that these scores are overemphasized in the resident selection process.

eScholarship - University of California
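The USMLE abstract reports correlations between a three-level clinical ranking (top, average, lowest) and exam scores but does not name the statistic. Since the ranking is ordinal, a rank correlation such as Spearman's rho is one plausible choice; the sketch below assumes that choice and is not the authors' analysis. It codes tiers as 1 (lowest), 2 (average), 3 (top) and assigns tied values the average of their ranks, which matters here because a three-tier scale produces many ties. The function names and example numbers are hypothetical, and no p-value is computed, so a significance test would be a separate step.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical example (not study data): tiers 1 = lowest, 2 = average, 3 = top.
tiers = [3, 2, 1, 2, 3, 1, 2]
step1 = [240, 225, 230, 250, 215, 220, 235]
print(round(spearman(tiers, step1), 3))
```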