
6,723 research outputs found

Predictive models as early warning systems for student academic performance in introductory programming

Author: Veerasamy Ashok Kumar
Publication venue: Turku Centre for Computer Science
Publication date: 08/12/2020
Field of study

Computer programming is fundamental to Computer Science and IT curricula. At the novice level it covers programming concepts that are essential for subsequent advanced programming courses. However, introductory programming courses are among the most challenging courses for novices, and high failure and attrition rates persist even though pedagogy in computer science education has improved. Consequently, the quest to identify factors that affect student learning and academic performance in introductory computer programming courses has been a long-standing activity. Specifically, weak novice learners of programming need to be identified and assisted early in the semester to reduce their risk of failing or withdrawing from the course. Hence, it is essential to identify at-risk programming students early in order to plan early interventions. The goal of this thesis was to develop one or more validated predictive models with suitable predictors of student academic performance in introductory programming courses. The proposed model uses the Naïve Bayes classification algorithm to analyse student performance data, and follows the principle of parsimony. An additional objective was to propose this validated predictive model as an early warning system (EWS) that predicts at-risk students early in the semester and, in turn, can inform instructors (and students) so that they can intervene early. We obtained data from two introductory programming courses to develop and test the predictive models. The models were built with student presage and in-progress data that instructors can easily collect or access regardless of the pedagogy of the educational setting. In addition, our work analysed the predictive power of selected data sources and looked for the combination of predictors that yields the highest accuracy in predicting student academic performance.
The prediction accuracies of the models were computed from confusion matrix data, including overall model prediction accuracy, sensitivity and specificity, balanced accuracy, and the area under the ROC curve (AUC) score for generalisation. On average, the models developed with formative assessment tasks that were partially assisted by the instructor in the classroom returned higher at-risk prediction accuracies than the models developed with take-home assessment tasks only as predictors. The test results on unseen data showed that it is possible to predict 83% of students who need support as early as Week 3 in a 12-week introductory programming course. The ensemble method-based results suggest that it is possible to improve overall at-risk prediction performance with low false positives and to incorporate this in early warning systems to identify students who need support, in order to provide early intervention before they reach critical stages (at risk of failing). The proposed models were developed on the basis of the principle of parsimony as well as previous research findings, which accounted for variations in academic settings, such as academic environment and student demography. The predictive model could potentially provide early warning indicators that support early intervention strategies for at-risk programming students. The main contribution of this thesis is a model that may be applied to other programming and non-programming courses that have both continuous formative assessment and a final summative exam, to predict final student performance early in the semester.
Programming is an essential part of degree programmes in information technology and computer science. At the novice level, teaching covers programming concepts that are central to later courses. Despite this, introductory programming courses are among the most challenging courses for beginners. A high dropout rate and the gradual withdrawal of students are still characteristic of these courses, even though the pedagogy of programming education has developed. Consequently, the factors behind students' poor performance have long been sought. In particular, weak novice programmers should be identified as early as possible so that they can be offered support and so that their risk of failing or dropping the course is reduced. Identifying weak students is important so that early intervention can be planned. The purpose of this doctoral thesis was to develop a validated predictive model or models, with suitable predictors, of students' academic performance in introductory programming courses. The developed model uses the Naïve Bayes machine learning classification algorithm to analyse data accumulated on student performance. The approach is based on the principle of the simplest possible explanatory models. A further aim was to propose this validated predictive model as an early warning system that predicts, in the early phase of a course, which students are at risk of dropping out and informs instructors (and the student) of the need for early intervention. We collected data from two introductory programming courses, on the basis of which the predictive model was developed and tested. The models are built on students' background information and on performance data collected during the course, which instructors can easily collect or access regardless of the institution or other environment. In addition, the thesis analyses the predictive power of the selected data sources and which model variables, and which combinations of them, produced the highest accuracy in predicting students' academic performance.
The accuracy of the models' predictions was computed using a confusion matrix, from which prediction accuracy, specificity, sensitivity, balanced accuracy, receiver operating characteristic (ROC) curves and the area under the curve (AUC) can be calculated. Models developed with formative tasks, in which the instructor could partially help in the classroom, gave on average more accurate predictions of at-risk students than models that used take-home tasks as the only predictors. Modelling on unseen test data shows that 83% of the students who need additional support on a 12-week programming course can be identified as early as Week 3. The results suggest that combining methods can achieve better overall prediction of at-risk students with fewer false positives. The results also indicate that a combined model can be incorporated into warning systems to identify students in need of help and thus offer support at an early stage, before it is too late. The models presented in this study were developed on the basis of the principle of the simplest explanatory model and of earlier research findings that take into account different academic environments and student backgrounds. The predictive model can provide indicators that may serve as a basis for intervention strategies to support students at risk of dropping out of a course. The main contribution of this study is a model that can be used to assess student performance in other courses, on programming and on other subjects, that include both continuous assessment and a final exam. In such courses the model would predict the student's final level of performance early in the teaching period.
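
The abstract above reports accuracy, sensitivity, specificity and balanced accuracy derived from a confusion matrix. As a rough illustration of how those four figures relate to one another, here is a minimal Python sketch. The function name `confusion_metrics` and the counts are hypothetical and are not taken from the thesis; the sketch also assumes that both the at-risk and the not-at-risk class contain at least one student.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive common metrics from binary confusion-matrix counts.

    tp/fn count at-risk students that were caught/missed;
    tn/fp count not-at-risk students that were cleared/wrongly flagged.
    Assumes tp + fn > 0 and tn + fp > 0 (both classes are present).
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # share of at-risk students identified
    specificity = tn / (tn + fp)   # share of not-at-risk students cleared
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Mean of the two class-wise rates, less sensitive to class imbalance
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical counts for illustration only (not results from the thesis)
metrics = confusion_metrics(tp=25, fp=10, tn=60, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy is often reported alongside overall accuracy in this kind of study because at-risk students are usually the minority class, so a model can score a high overall accuracy while missing most of them.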

UTUPub

Supervised Learning Algorithms in Educational Data Mining: A Systematic Review

Author: Awadh Wid Aqeel
Dahr Jasim Mohammed
Hashim Ali Salah
Humadi Aqeel Majeed
Kamel Mohammed B. M.
Khalaf Alaa
Najim Ihab Ahmed
Publication venue: International University of Sarajevo
Publication date: 01/05/2021
Field of study

Academic institutions are always looking for tools that improve their performance and enhance individual outcomes. Because data mining is well suited to uncovering hidden patterns and trends in data, many researchers have paid attention to Educational Data Mining (EDM) in the last decade. This field explores different types of data using different algorithms to extract knowledge that supports decision-making and academic sector development. Researchers in the field of EDM have proposed and adopted different algorithms in various directions. In this review, we explored the papers in the field of EDM published between 2010 and 2020 in the IEEE, ACM, Science Direct, and Springer libraries in order to answer the review questions. We aimed to find the supervised machine learning algorithm most used by researchers in the period 2010-2020. Additionally, we explored the most common research directions in EDM and the interests of the researchers. During our research and analysis, many limitations were examined and, in addition to answering the review questions, some future work is presented.

Inquiry (E-Journal - Faculty of Business and Administration, International University of Sarajevo)

Prediction of Student Final Exam Performance in an Introductory Programming Course: Development and Validation of the Use of a Support Vector Machine-Regression Model

Author: Daryl D’Souza
Mikko-Jussi Laakso
Rolf Lindén
Veerasamy Ashok Kumar
Publication venue: 'NIDA/RTI International'
Publication date: 28/10/2022
Field of study

This paper presents a Support Vector Machine predictive model to determine whether prior programming knowledge and completion of in-class and take-home formative assessment tasks might be suitable predictors of examination performance. Student data from the academic years 2012-2016 for an introductory programming course were captured via the ViLLE e-learning tool for analysis. The results revealed that a predictive model built on student prior programming knowledge and assessment scores is a good fit to the data. However, while the overall success of the model is significant, its accuracy in identifying at-risk students is neither high nor low, which persuaded us to include two more research questions. Our preliminary post-analysis of these test results shows that, on average, students who scored less than 70% in formative assessments and had little or only basic prior programming knowledge may fail the final programming exam, and that taking this into account increases the prediction accuracy in identifying at-risk students from 46% to nearly 63%. Hence, these results provide immediate information for programming course instructors and students to enhance the teaching and learning process.

UTUPub

Introductory programming: a systematic literature review

Author: Abu Naser Samy S.
Agarwal Achla
Ahmed
Ahren T. C.
Al-Jarrah Ahmad
Alammary Ali
Annamalai Subashini
Ayub Mewati
Badri Suzan
Bai Yu
Baird Bridget
Bandura Albert
Barlow-Jones Glenda
Bayliss Jessica D.
Ben-Ari Mordechai
Bennedsen Jens
Bennett Chris
Berglund Anders
Berland Matthew
Briggs Tom
Bumbacher Engin
Burch Carl
Carbonaro Antonella
Carbone Angela
Cardell-Oliver Rachel
Chad Lane H
Char Bruce
Charalampos Spyropoulos
Charles Therese
Chinn Donald
Corney Malcolm
Coull Natalie J
Crawford Stewart
Cruz Gilbert
de Raadt Michael
Devey Adrian
Dickson Paul E.
Dillon Edward
Doherty Liam
Durrheim Mark S.
D’Souza Daryl
Eagly Alice H
Edgcomb Alex
Edwards Stephen H.
Falkner Katrina
Firmalo Fabic Geela Venise
Fonseca Fred
Fürst Luka
Garner Stuart
Goadrich Mark
Gonzalez Gracielo
Gudmundsen Dee
Haghighi Pari Delir
Hare Brian K
Heliotis James
Hooshyar Danial
Hovemeyer David
Hu Minjie
Hu Yun-Jen
Huang Chenn-Jung
Jacqueline
Jayal Ambikesh
Jurado Francisco
Kanaparan Geetha
Kasto Nadia
Kiran L.
Kirby Stephen
Kitchenham Barbara
Kouznetsova Svetlana
Kölling Michael
LeJeune Noel
Leska Chuck
Lipman Derrell
Lister Raymond
Lopez Mike
Lulis Evelyn
Luoma Harri
Major L.
McKeown Jim
McWhorter William Isaac
Medley M. Dee
Mentis Alexander
Menyhárt László
Mullins Paul
Munson Jonathan P.
Muntha Surya
Murphy Laurie
Neto Vicente Lustosa
Nguyen Thuy-Linh
Okada Ken
Orehovački Tihomir
Palmer James Dean
Park Myung Ah
Parsons Dale
Paul Jody
Peachock Patrick
Pearce Janice L.
Pero Štefan
Price Kellie
Quintin
Rajala Teemu
Ramli R.Z.
Ray Andrew
Rodrigo Maria Mercedes T
Roels Reinout
Rountree Janet
Russo Mark F.
Sanou Loé
Schoeffel Pablo
Schramm Joachim
Shabalina Olga
Sharp Jason H
Sheard Judy
Shuhidan Shuhaida
Sindre Guttorm
Skudder Ben
Song Hosung
Sorva Juha
Sung Kelvin
Takemura Yasuhiro
Teague D.
Teague Donna
Thompson Errol
Torrey Lisa
Truong Nghi
Vincenti Giovanni
Wang Hong
Watkins Kera Z. B.
Weragama Dinesha
Whalley Jacqueline
Whittall S. J.
Whittinghill David
Wiebe E
Wood Krissi
Yoo Jungsoon P
Yusri Nurliana
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/07/2018
Field of study

As computing becomes a mainstream discipline embedded in the school curriculum and acts as an enabler for an increasing range of academic disciplines in higher education, the literature on introductory programming is growing. Although there have been several reviews that focus on specific aspects of introductory programming, there has been no broad overview of the literature exploring recent trends across the breadth of introductory programming. This paper is the report of an ITiCSE working group that conducted a systematic review in order to gain an overview of the introductory programming literature. Partitioning the literature into papers addressing the student, teaching, the curriculum, and assessment, we explore trends, highlight advances in knowledge over the past 15 years, and indicate possible directions for future research

Michigan Technological University

Crossref

Falmouth University Research Repository (FURR)

ResearchOnline@GCU

Predictive Analysis of Students’ Learning Performance Using Data Mining Techniques: A Comparative Study of Feature Selection Methods

Author: Syed Mustapha S. M. F. D.
Publication venue: ZU Scholars
Publication date: 29/09/2023
Field of study

The utilization of data mining techniques for the prompt prediction of academic success has gained significant importance in the current era. There is an increasing interest in utilizing these methodologies to forecast the academic performance of students, thereby facilitating educators to intervene and furnish suitable assistance when required. The purpose of this study was to determine the optimal methods for feature engineering and selection in the context of regression and classification tasks. This study compared the Boruta algorithm and Lasso regression for regression, and Recursive Feature Elimination (RFE) and Random Forest Importance (RFI) for classification. According to the findings, Gradient Boost for the regression part of this study had the least Mean Absolute Error (MAE) and Root-Mean-Square Error (RMSE) of 12.93 and 18.28, respectively, in the case of the Boruta selection method. In contrast, RFI was found to be the superior classification method, yielding an accuracy rate of 78% in the classification part. This research emphasized the significance of employing appropriate feature engineering and selection methodologies to enhance the efficacy of machine learning algorithms. Using a diverse set of machine learning techniques, this study analyzed the OULA dataset, focusing on both feature engineering and selection. Our approach was to systematically compare the performance of different models, leading to insights about the most effective strategies for predicting student success
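
The regression results above are reported as Mean Absolute Error (MAE) and Root-Mean-Square Error (RMSE). For readers less familiar with the two measures, the following is a minimal Python sketch of how each is computed. The function names and the sample scores are hypothetical and are not drawn from the study or its dataset.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference.

    Squaring penalises large errors more heavily than MAE does.
    """
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical final scores and model predictions (illustration only)
actual = [70, 55, 90, 40]
predicted = [65, 60, 80, 50]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_square_error(actual, predicted)
print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
```

RMSE is never smaller than MAE on the same predictions, which is consistent with the reported pair of 12.93 and 18.28. A wide gap between the two usually indicates that a few predictions are far off.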

ZU Scholars (Zayed University)

The role of machine learning in identifying students at-risk and minimizing failure

Author: Alhajj Reda
Elhage Tarek
Pek Reyhan Zeynep
Tarıyan Özyer Sibel
Özyer Tansel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2023
Field of study

Education is very important for students' future success. Instructors can support low-performing students with extra assignments and projects. However, a major problem is that at-risk students cannot be identified early. Various researchers are investigating this problem using machine learning techniques. Machine learning is used in a variety of areas and has also begun to be used to identify at-risk students early so that instructors can provide support. This research paper discusses the performance results obtained using machine learning algorithms to identify at-risk students and minimize student failure. The main purpose of this project is to create a hybrid model using the ensemble stacking method and to predict at-risk students using this model. We used machine learning algorithms such as Naive Bayes, Random Forest, Decision Tree, K-Nearest Neighbors, Support Vector Machine, AdaBoost Classifier and Logistic Regression in this project. The performance of each machine learning algorithm was measured with various metrics, and the hybrid model presented in this study combines the algorithms that give the best prediction results. A data set containing the demographic and academic information of the students was used to train and test the model. In addition, a web application developed for the effective use of the hybrid model and for obtaining prediction results is presented in the report. In the proposed method, stratified k-fold cross validation and hyperparameter optimization techniques were found to increase the performance of the models. The hybrid ensemble model was tested with two different combinations of data to understand the importance of the data features. In the first combination, using both demographic and academic data, the accuracy of the hybrid model was 94.8%. In the second combination, when only academic data was used, the accuracy of the hybrid model increased to 98.4%. This study focuses on predicting the performance of at-risk students early. Thus, teachers will be able to provide extra assistance to students with low performance.
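
The abstract above credits stratified k-fold cross validation with improving model performance. The idea is that every fold keeps roughly the same class proportions as the full data set, which matters when at-risk students are a minority. The following is a minimal, dependency-free Python sketch of such a split. The function `stratified_k_fold` and the labels are hypothetical, and the sketch assigns indices round-robin without shuffling, which a real pipeline would normally add.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Yield (train_indices, test_indices) pairs for k folds.

    Indices of each class are dealt round-robin across the folds, so each
    fold receives a near-equal share of every class. No shuffling is done.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical labels: 4 at-risk students and 8 others (illustration only)
labels = ["at_risk"] * 4 + ["ok"] * 8
splits = list(stratified_k_fold(labels, k=4))
for train_idx, test_idx in splits:
    print("test fold:", [labels[i] for i in test_idx])
```

With these labels each test fold holds one at-risk student and two others, mirroring the 1:2 ratio of the full set.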

İstanbul Medipol University Institutional Repository

Predicting Academic Performance: A Systematic Literature Review

Author: Abdulwahhab R. S.
Aggarwal H.W.
Agudo-Peregrina Ángel F
al Rifaie Mohammad Majid
Allan Fransiskus
Almuniri Ismail
Ashenafi Michael Mogessie
Aziz Fatihah
Baker Ryan SJD
Barba-Guamán L.
Bayer Jaroslav
Blei David M.
Bydžovská Hana
Cengiz Nihat
Chan L.
Chaturvedi R.
Chen YY
Choi D.S.
Chunqiao Mi
Collura Michael A
Corsatea B. M.
Cribbs Jennifer D
Deliz José R
DeMonbrun R.M.
Dávila Saylisse
Edmundo
Evale Digna S
Fincher Sally
Gil-Herrera Eleazar
Gray Geraldine
Güner Necdet
Haig Thomas
Han M.
Ho Chia-Lin
Hornik Kurt
Howell Larry L
Hu Qian
Huang Shaobo
Huang Yun
Imbrie PK
Jiang Suhang
Jove E.
Kai Shimin
Kaur P.
Kentli Fulya Damla
Kuehn M.
Kumar A. Dinesh
Kumar Mukesh
Luo Jingyi
Luo Ling
Manoharan J James
Mashiloane Lebogang
Mayilvaganan M
Mhetre V.
Moradi F.
Morsy S.
Nedungadi Prema
Ninrutsirikun U.
Paimin Aini Nazura
Palmer Stuart
Pandey Mrinal
Papamitsiou Zacharoula
Pardos Zachary A
Patrick A Borrego
Pushpa S.K.
Raman D.R.
Ramanathan L.
Ramirez Nichole
Raura G.
Raymond Ting Siu-Man
Reid Kenneth
Reisberg Rachelle
Ren Zhiyun
Rhodes Nicholas
Ringenberg Jeff
Sadati S.
Sadler William E
Schar Mark
Sievert Carson
Sivasakthi M.
Sorby Sheryl
Strawderman Lesley
Sugiharti E.
Tieu Hoang
Tomkins Sabina
Tsalatsanis Athanasios
Uswatun Annisa
Verma S.K.
Vihavainen Arto
Vogt Christina
Wang Feng
Wolff Thomas F
Wu Xinhui
Wyk Barend Van
Yang T.-Y.
Yeh Her-Tyan
Zhu Ke
Publication venue: ACM
Publication date: 01/01/2018
Field of study

The ability to predict student performance in a course or program creates opportunities to improve educational outcomes. With effective performance prediction approaches, instructors can allocate resources and instruction more accurately. Research in this area seeks to identify features that can be used to make predictions, to identify algorithms that can improve predictions, and to quantify aspects of student performance. Moreover, research in predicting student performance seeks to determine interrelated features and to identify the underlying reasons why certain features work better than others. This working group report presents a systematic literature review of work in the area of predicting student performance. Our analysis shows a clearly increasing amount of research in this area, as well as an increasing variety of techniques used. At the same time, the review uncovered a number of issues with research quality that drive a need for the community to provide more detailed reporting of methods and results and to increase efforts to validate and replicate work.

Crossref

Helsingin yliopiston digitaalinen arkisto

Monash University Research Portal

Educational anomaly analytics : features, methods, and challenges

Author: Bai Xiaomei
Firmin Selena
Guo Teng
Tian Xue
Xia Feng
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2022
Field of study

Anomalies in education affect the personal careers of students and universities' retention rates. Understanding the laws behind educational anomalies promotes the development of individual students and improves the overall quality of education. However, the inaccessibility of educational data hinders the development of the field. Previous research in this field used questionnaires, which are time- and cost-consuming and hardly applicable to large-scale student cohorts. With the popularity of educational management systems and the rise of online education during the prevalence of COVID-19, a large amount of educational data is available online and offline, providing an unprecedented opportunity to explore educational anomalies from a data-driven perspective. As an emerging field, educational anomaly analytics rapidly attracts scholars from a variety of fields, including education, psychology, sociology, and computer science. This paper intends to provide a comprehensive review of data-driven analytics of educational anomalies from a methodological standpoint. We focus on the following five types of research that received the most attention: course failure prediction, dropout prediction, mental health problems detection, prediction of difficulty in graduation, and prediction of difficulty in employment. Then, we discuss the challenges of current related research. This study aims to provide references for educational policymaking while promoting the development of educational anomaly analytics as a growing field. Copyright © 2022 Guo, Bai, Tian, Firmin and Xia

Federation ResearchOnline

PubMed Central

A Predictive Analysis of Academic Success at Universidade do Porto

Author: Rafael Antonio Belokurows
Publication venue
Publication date: 09/11/2021
Field of study

Repositório Aberto da Universidade do Porto

On predicting academic performance with process mining in learning analytics

Author: Anuradha Mathrani
Rahila Umer
Suriadi Suriadi
Teo Susnjak
Publication venue: 'Emerald'
Publication date: 01/05/2018
Field of study

Purpose - The purpose of this paper is to propose a process mining approach to help in making early predictions to improve students’ learning experience in massive open online courses (MOOCs). It investigates the impact of various machine learning techniques in combination with process mining features to measure the effectiveness of these techniques. Design/methodology/approach - Students’ data (e.g. assessment grades, demographic information) and weekly interaction data based on event logs (e.g. video lecture interaction, solution submission time, time spent weekly) have guided this design. This study evaluates four machine learning classification techniques used in the literature (logistic regression (LR), Naïve Bayes (NB), random forest (RF) and K-nearest neighbor) to monitor the weekly progression of students’ performance and to predict their overall performance outcome. Two data sets have been used: one with traditional features and a second with features obtained from process conformance testing. Findings - The results show that the techniques used in the study are able to make predictions on the performance of students. Overall accuracy (F1-score, area under curve) of machine learning techniques can be improved by integrating process mining features with standard features. Specifically, the LR and NB classifiers outperform the other techniques in a statistically significant way. Practical implications - Although MOOCs provide a platform for learning in a highly scalable and flexible manner, they are prone to early dropout and low completion rates. This study outlines a data-driven approach to improve students’ learning experience and decrease the dropout rate. Social implications - Early predictions based on an individual’s participation can help educators provide support to students who are struggling in the course. Originality/value - This study outlines the innovative use of process mining techniques in education data mining to help educators gather data-driven insight on student performance in the enrolled courses.

Directory of Open Access Journals