
6,267 research outputs found

Using Data Science and Predictive Analytics to Understand 4-Year University Student Churn

Author: Whitlock Joshua Lee
Publication venue: Digital Commons @ East Tennessee State University
Publication date: 01/05/2018

The purpose of this study was to discover factors about first-time freshmen who began at one of the six 4-year universities in the former Tennessee Board of Regents (TBR) system, transferred to any other institution after their first year, and graduated with a degree or certificate. These factors would be used with predictive models to identify these students prior to their initial departure. Thirty-four variables about students and the institutions that they attended and graduated from were used to perform principal component analysis to examine the factors involved in their decisions. A subset of 18 variables about these students in their first semester was used to perform principal component analysis and produce a set of 4 factors that were used in 5 predictive models. The 4 factors of students who transferred and graduated elsewhere were “Institutional Characteristics,” “Institution’s Focus on Academics,” “Student Aptitude,” and “Student Community.” These 4 factors were combined with the additional demographic variables of gender, race, residency, and initial institution to form a final dataset used in predictive modeling. The predictive models used were a logistic regression, decision tree, random forest, artificial neural network, and support vector machine. All models had predictive power beyond that of random chance. The logistic regression and support vector machine models had the most predictive power, followed by the artificial neural network, random forest, and decision tree models, respectively.

East Tennessee State University

Classifying Imbalanced Data: The Relevance of Accuracy and Feature Importance

Author: Widmann Torben
Publication date: 03/01/2024

ScholarSpace at University of Hawai'i at Manoa

Comparative Analysis of Classification Performance for U.S. College Enrollment Predictive Modeling Using Four Machine Learning Algorithms (Artificial Neural Network, Decision Tree, Support Vector Machine, Logistic Regression)

Author: Kye Anna
Publication venue: Loyola eCommons
Publication date: 01/01/2023

The national high school graduation rate is declining every year, which affects the number of students applying to colleges. Moreover, the majority of students apply to more than one college. As a result, many colleges compete intensely in student recruitment, and institutions increasingly need to anticipate uncertainties in the budgets expected from student enrollment. Hence, enrollment management has become a pivotal sector in higher education institutions. Data and analytics are now a crucial part of enhancing enrollment management. Through big data analytics-driven solutions, institutions expect to improve enrollment by identifying students who are most likely to enroll in college. Machine learning can unlock significant value for colleges by helping them allocate resources effectively to improve enrollment and budgeting. Therefore, machine learning is a vital tool for analyzing large amounts of data, and predictive analytics using this method is in high demand in higher education. Yet higher education is still in the early stages of utilizing machine learning for enrollment management. In this study, I applied four machine learning algorithms to seven years of data on 108,798 students, each with 50 associated features, admitted to a 4-year, non-profit university in an urban area of the Midwest to predict students’ college enrollment decisions. By treating the question of whether students offered admission will accept it as a binary classification problem, I implemented four machine learning classifiers and then evaluated their performance using the metrics of accuracy, sensitivity, specificity, precision, F-score, and area under the ROC and PR curves. The results from this study will indicate the best-performing prediction model of students’ college enrollment decisions.
This research will expand the body of cases and knowledge on utilizing machine learning methods in the higher education sector, focused on the U.S. college enrollment management field. Moreover, it will expand the knowledge of how machine learning prediction models can be pragmatically used to support institutions in setting up student enrollment management strategies.

Loyola eCommons
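
The threshold-based metrics named in the abstract above (accuracy, sensitivity, specificity, precision, F-score) can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch in plain Python, not the study's own code: the function names, the label encoding (1 = enrolled, 0 = did not enroll), and the toy labels are assumptions made purely for illustration, and the F-score is taken here to be F1.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # predicted enroll, actually enrolled
    fp: int  # predicted enroll, did not enroll
    tn: int  # predicted no enroll, did not enroll
    fn: int  # predicted no enroll, actually enrolled


def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for 0/1 labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def classification_metrics(c):
    """Derive the threshold-based metrics from confusion counts."""
    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = safe_div(c.tp, c.tp + c.fn)   # recall, true positive rate
    specificity = safe_div(c.tn, c.tn + c.fp)   # true negative rate
    precision = safe_div(c.tp, c.tp + c.fp)
    return {
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(confusion_counts(y_true, y_pred))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The areas under the ROC and PR curves are left out of this sketch because they require predicted scores rather than hard 0/1 labels.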

A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework

Author: Aguiar Gabriel; Cano Alberto; Krawczyk Bartosz
Publication date: 07/04/2022

Class imbalance poses new challenges when it comes to classifying data streams. Many algorithms recently proposed in the literature tackle this problem using a variety of data-level, algorithm-level, and ensemble approaches. However, there is a lack of standardized and agreed-upon procedures on how to evaluate these algorithms. This work presents a taxonomy of algorithms for imbalanced data streams and proposes a standardized, exhaustive, and informative experimental testbed to evaluate algorithms in a collection of diverse and challenging imbalanced data stream scenarios. The experimental study evaluates 24 state-of-the-art data stream algorithms on 515 imbalanced data streams that combine static and dynamic class imbalance ratios, instance-level difficulties, concept drift, and real-world and semi-synthetic datasets in binary and multi-class scenarios. This leads to the largest experimental study conducted so far in the data stream mining domain. We discuss the advantages and disadvantages of state-of-the-art classifiers in each of these scenarios and we provide general recommendations to end-users for selecting the best algorithms for imbalanced data streams. Additionally, we formulate open challenges and future directions for this domain. Our experimental testbed is fully reproducible and easy to extend with new methods. In this way, we propose the first standardized approach to conducting experiments in imbalanced data streams that can be used by other researchers to create trustworthy and fair evaluations of newly proposed methods. Our experimental framework can be downloaded from https://github.com/canoalberto/imbalanced-streams

arXiv.org e-Print Archive
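
The abstract above distinguishes static from dynamic class imbalance ratios. As a rough illustration of what a dynamic ratio means, the sketch below tracks the majority-to-minority class count ratio over a sliding window of recent labels. It is not taken from the authors' framework: the function name, the window size, and the toy stream are hypothetical, and defining the ratio as largest class count over smallest class count in the window is an assumption.

```python
from collections import Counter, deque


def windowed_imbalance_ratio(labels, window_size):
    """Return, for each arriving label, the ratio of the largest to the
    smallest class count within the last `window_size` labels.

    None is recorded while the window contains only one class, because
    the ratio is undefined in that case.
    """
    window = deque()
    counts = Counter()
    ratios = []
    for label in labels:
        window.append(label)
        counts[label] += 1
        if len(window) > window_size:
            # Forget the oldest label so the estimate follows the stream.
            oldest = window.popleft()
            counts[oldest] -= 1
            if counts[oldest] == 0:
                del counts[oldest]
        if len(counts) < 2:
            ratios.append(None)
        else:
            ratios.append(max(counts.values()) / min(counts.values()))
    return ratios


# Hypothetical toy stream: the minority class (1) becomes more frequent.
stream = [0, 0, 0, 1, 0, 0, 1, 1]
print(windowed_imbalance_ratio(stream, window_size=4))
```

In a stream with a static ratio, the windowed value would stay roughly constant; in the toy stream it falls as the minority class becomes more frequent.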

A Hybrid AI Framework to Address the Issue of Frequent Missing Values with Application in EHR Systems: the Case of Parkinson’s Disease

Author: Amini Mostafa; Bagheri Ali; Delen Dursun; Piri Saeed
Publication date: 03/01/2024

ScholarSpace at University of Hawai'i at Manoa