Search CORE

3,816 research outputs found

Application of classification models to predict students’ academic performance using classifiers ensemble and synthetic minority over sampling techniques

Author: Abdulazeez Y
Abdulwahab L
Publication venue: 'African Journals Online (AJOL)'
Publication date: 23/04/2019
Field of study

The demand for data-driven decision making has resulted in the application of data mining in the educational sector and other disciplines. The needs for improving the performance of data mining models have been identified as an interesting area of research globally. Higher educational institutions keep a large amount of students’ data, but these data are rarely used effectively in decision and or policy-making processes. This research is an attempt to enhance the performance of data mining models to predict students’ academic performance using stacking classifiers ensemble and synthetic minority over-sampling techniques. The three (3) classifiers models J48, IBK and SMO were trained and tested on 206 students’ data set using previous academic performance records of Federal University Dutse, Nigeria. WEKA 3.9.1 data mining tool was used in predicting the final year student’s classes of degree at an undergraduate level, while Unified Tertiary Matriculation Examination, Senior Secondary Certificate Examinations and first-year Cumulative Grade Point Average of students served as inputs to the model. The result obtained showed that on training dataset after class balancing, stacking classifiers ensemble model out- performing the other three (3) classifiers models in both performance accuracy (96.7949%) and RSME (0.1098), suggesting that stacking classifiers ensemble is the best model in context of this research.Keywords: Educational Data Mining, J48. SMO. IBK, Stacking Classifiers Ensembl

AJOL - African Journals Online

A Hybrid Machine Learning Framework for Predicting Students’ Performance in Virtual Learning Environment

Author: Evangelista Edmund
Publication venue: ZU Scholars
Publication date: 21/12/2021
Field of study

Virtual Learning Environments (VLE), such as Moodle and Blackboard, store vast data to help identify students\u27 performance and engagement. As a result, researchers have been focusing their efforts on assisting educational institutions in providing machine learning models to predict at-risk students and improve their performance. However, it requires an efficient approach to construct a model that can ultimately provide accurate predictions. Consequently, this study proposes a hybrid machine learning framework to predict students\u27 performance using eight classification algorithms and three ensemble methods (Bagging, Boosting, Voting) to determine the best-performing predictive model. In addition, this study used filter-based and wrapper-based feature selection techniques to select the best features of the dataset related to students\u27 performance. The obtained results reveal that the ensemble methods recorded higher predictive accuracy when compared to single classifiers. Furthermore, the accuracy of the models improved due to the feature selection techniques utilized in this study

ZU Scholars (Zayed University)

Family Environment Variables as Predictors of School Absenteeism Severity at Multiple Levels: Ensemble and Classification and Regression Tree Analysis

Author: Fornander Mirae J.
Kearney Christopher A.
Publication venue: Digital Scholarship@UNLV
Publication date: 18/10/2019
Field of study

School attendance problems, including school absenteeism, are common to many students worldwide, and frameworks to better understand these heterogeneous students include multiple classes or tiers of intertwined risk factors as well as interventions. Recent studies have thus examined risk factors at varying levels of absenteeism severity to demarcate distinctions among these tiers. Prior studies in this regard have focused more on demographic and academic variables and less on family environment risk factors that are endemic to this population. The present study utilized ensemble and classification and regression tree analysis to identify potential family environment risk factors among youth (i.e., children and adolescents) at different levels of school absenteeism severity (i.e., 1 + %, 3 + %, 5 + %, 10 + %). Higher levels of absenteeism were also examined on an exploratory basis. Participants included 341 youth aged 5–17 years (M = 12.2; SD = 3.3) and their families from an outpatient therapy clinic (68.3%) and community (31.7%) setting, the latter from a family court and truancy diversion program cohort. Family environment risk factors tended to be more circumscribed and informative at higher levels of absenteeism, with greater diversity at lower levels. Higher levels of absenteeism appear more closely related to lower achievement orientation, active-recreational orientation, cohesion, and expressiveness, though several nuanced results were found as well. Absenteeism severity levels of 10–15% may be associated more with qualitative changes in family functioning. These data may support a Tier 2-Tier 3 distinction in this regard and may indicate the need for specific family-based intervention goals at higher levels of absenteeism severity

University of Nevada, Las Vegas Repository

Internalizing Symptoms as Predictors of School Absenteeism Severity at Multiple Levels: Ensemble and Classification and Regression Tree Analysis

Author: Fornander Mirae J.
Kearney Christopher A.
Publication venue: Digital Scholarship@UNLV
Publication date: 21/01/2020
Field of study

School attendance problems are highly prevalent worldwide, leading researchers to investigate many different risk factors for this population. Of considerable controversy is how internalizing behavior problems might help to distinguish different types of youth with school attendance problems. In addition, efforts are ongoing to identify the point at which children and adolescents move from appropriate school attendance to problematic school absenteeism. The present study utilized ensemble and classification and regression tree analysis to identify potential internalizing behavior risk factors among youth at different levels of school absenteeism severity (i.e., 1+%, 3+%, 5+%, 10+%). Higher levels of absenteeism were also examined on an exploratory basis. Participants included 160 youth aged 6–19 years (M = 13.7; SD = 2.9) and their families from an outpatient therapy clinic (39.4%) and community (60.6%) setting, the latter from a family court and truancy diversion program cohort. One particular item relating to lack of enjoyment was most predictive of absenteeism severity at different levels, though not among the highest levels. Other internalizing items were also predictive of various levels of absenteeism severity, but only in a negatively endorsed fashion. Internalizing symptoms of worry and fatigue tended to be endorsed higher across less severe and more severe absenteeism severity levels. A general expectation that predictors would tend to be more homogeneous at higher than lower levels of absenteeism severity was not generally supported. The results help confirm the difficulty of conceptualizing this population based on forms of behavior but may support the need for early warning sign screening for youth at risk for school attendance problems

University of Nevada, Las Vegas Repository

Student Performance Prediction Using A Cascaded Bi-level Feature Selection Approach

Author: Abdullahi Wokili
Kenneth Mary Ogbuka
Olalere Morufu
Publication venue: 'Bilingual Publishing Co.'
Publication date: 25/08/2021
Field of study

Features in educational data are ambiguous which leads to noisy features and curse of dimensionality problems. These problems are solved via feature selection. There are existing models for features selection. These models were created using either a single-level embedded, wrapperbased or filter-based methods. However single-level filter-based methods ignore feature dependencies and ignore the interaction with the classifier. The embedded and wrapper based feature selection methods interact with the classifier, but they can only select the optimal subset for a particular classifier. So their selected features may be worse for other classifiers. Hence this research proposes a robust Cascade Bi-Level (CBL) feature selection technique for student performance prediction that will minimize the limitations of using a single-level technique. The proposed CBL feature selection technique consists of the Relief technique at first-level and the Particle Swarm Optimization (PSO) at the second-level. The proposed technique was evaluated using the UCI student performance dataset. In comparison with the performance of the single-level feature selection technique the proposed technique achieved an accuracy of 94.94% which was better than the values achieved by the single-level PSO with an accuracy of 93.67% for the binary classification task. These results show that CBL can effectively predict student performance

Bilingual Publishing Co. (BPC): E-Journals

Evaluation of algorithms to predict graduation rate in higher education institutions by applying educational data mining

Author: Luján-Mora Sergio
Moscoso-Zea Oswaldo
Saa Pablo
Publication venue: 'Informa UK Limited'
Publication date: 11/04/2019
Field of study

Nowadays, researchers analyse student data to predict the graduation rate by looking at the characteristics of students enrolled and to take corrective actions at an early stage or improve the admission process. Educational data mining (EDM) is an emerging field that can support the implementation of changes in the management of higher education institutions. EDM analyses educational data using the development and the application of data mining (DM) methods and algorithms to information stored in academic data repositories. The purpose of this paper is to review which methods and algorithms of DM can be used in the analysis of educational data to improve decision-making. Furthermore, it evaluates these algorithms using a dataset composed of student data in the computer science school of a private university. The core of the analysis is to discover trends and patterns of study in the graduation rate indicator. Finally, it compares these methods and algorithms and suggests which has the best precision in certain scenarios. Our analyses suggest that random trees had better precision but had limitations due to the difficulty of interpretation while the J48 algorithm had better possibilities of interpretation of results in the visualisation of the classification of data and only had slightly inferior performance

Repositorio Institucional de la Universidad de Alicante

A Multi Hidden Recurrent Neural Network with a Modified Grey Wolf Optimizer

Author: Abbas Dosti K.
Rashid Tarik A.
Turel Yalin K.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2019
Field of study

Identifying university students' weaknesses results in better learning and can function as an early warning system to enable students to improve. However, the satisfaction level of existing systems is not promising. New and dynamic hybrid systems are needed to imitate this mechanism. A hybrid system (a modified Recurrent Neural Network with an adapted Grey Wolf Optimizer) is used to forecast students' outcomes. This proposed system would improve instruction by the faculty and enhance the students' learning experiences. The results show that a modified recurrent neural network with an adapted Grey Wolf Optimizer has the best accuracy when compared with other models.Comment: 34 pages, published in PLoS ON

arXiv.org e-Print Archive

Directory of Open Access Journals

GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection

Author: Gong Mackenzie
Liu Yan
Liu Yang
Peng Siyao
Yu Yue
Zeldes Amir
Zhu Yilun
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019
Field of study

In this paper we present GumDrop, Georgetown University's entry at the DISRPT 2019 Shared Task on automatic discourse unit segmentation and connective detection. Our approach relies on model stacking, creating a heterogeneous ensemble of classifiers, which feed into a metalearner for each final task. The system encompasses three trainable component stacks: one for sentence splitting, one for discourse unit segmentation and one for connective detection. The flexibility of each ensemble allows the system to generalize well to datasets of different sizes and with varying levels of homogeneity.Comment: Proceedings of Discourse Relation Parsing and Treebanking (DISRPT2019

arXiv.org e-Print Archive

Crossref

A Predictive Model using Machine Learning Algorithm in Identifying Student's Probability on Passing Semestral Course

Author: Doctor Anabella C.
Publication venue: 'STEP Academic Publisher'
Publication date: 01/01/2023
Field of study

Purpose: The used of an integrated academic information system in higher education has been proven in improving quality education which results to generates enormous data that can be used to discover new knowledge through data mining concepts, techniques, and machine learning algorithm. This study aims to determine a predictive model to learn students' probability to pass their courses taken at the earliest stage of the semester. Method: To successfully discover a good predictive model with high acceptability, accurate, and precision rate which delivers a useful outcome for decision making in education systems, in improving the processes of conveying knowledge and uplifting student's academic performance, the proponent applies and strictly followed the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology. This study employs classification for data mining techniques, and decision tree for algorithm. Results: With the utilization of the newly discovered predictive model, the prediction of students' probabilities to pass the current courses they take gives 0.7619 accuracy, 0.8333 precision, 0.8823 recall, and 0.8571 f1 score, which shows that the model used in the prediction is reliable, accurate, and recommendable. Conclusion: Considering the indicators and the results, it can be noted that the prediction model used in this study is highly acceptable. The data mining techniques provides effective and efficient innovative tools in analyzing and predicting student performances. The model used in this study will greatly affect the way educators understand and identify the weakness of their students in the class, the way they improved the effectiveness of their learning processes gearing to their students, bring down academic failure rates, and help institution administrators modify their learning system outcomes. Recommendations: Full automation of prediction results accessible by the students, faculty, and institution administrators for fast management decision making should take place. Further study for the inclusion of some student`s demographic information, vast amount of data within the dataset, automated and manual process of predictive criteria indicators where the students can regulate to which criteria, they must improve more for them to pass their courses taken at the end of the semester as early as midterm period are highly needed

SSOAR - Social Science Open Access Repository