1,559 research outputs found

    Predicting Academic Performance: A Systematic Literature Review

    The ability to predict student performance in a course or program creates opportunities to improve educational outcomes. With effective performance prediction approaches, instructors can allocate resources and instruction more accurately. Research in this area seeks to identify features that can be used to make predictions, to identify algorithms that can improve predictions, and to quantify aspects of student performance. Moreover, research in predicting student performance seeks to determine interrelated features and to identify the underlying reasons why certain features work better than others. This working group report presents a systematic literature review of work in the area of predicting student performance. Our analysis shows a clearly increasing amount of research in this area, as well as an increasing variety of techniques used. At the same time, the review uncovered a number of issues with research quality that drive a need for the community to provide more detailed reporting of methods and results and to increase efforts to validate and replicate work. Peer reviewed.

    Supervised Learning Algorithms in Educational Data Mining: A Systematic Review

    Academic institutions are always looking for tools that improve their performance and enhance individual outcomes. Because data mining can uncover hidden patterns and trends in data, many researchers have turned their attention to Educational Data Mining (EDM) over the last decade. This field explores different types of data using different algorithms to extract knowledge that supports decision-making and the development of the academic sector. Researchers in EDM have proposed and adopted different algorithms in various directions. In this review, we examined EDM papers published between 2010 and 2020 in four libraries (IEEE, ACM, Science Direct, and Springer) to answer the review questions. We aimed to find the supervised machine learning algorithm most used by researchers in the period 2010-2020. Additionally, we explored the most common research directions in EDM and the interests of researchers. During our research and analysis, many limitations were examined and, in addition to answering the review questions, some directions for future work are presented.

    Student Dropout Analysis with Application of Data Mining Methods

    One of the indicators of potential problems in the higher education system may be a large number of student dropouts in the junior years. An analysis of the existing transaction data provides information on students that allows the definition of the key processes that have to be adapted in order to enhance the efficiency of studying. To better understand the problem of dropouts, the data are processed using data mining methods: logistic regression, decision trees and neural networks. The models are built according to the SEMMA methodology and then compared to select the one that best predicts student dropout. This paper concentrates primarily on the application of data mining methods in the area of higher education, in which such methods have not yet been applied. In addition, a model useful for strategic planning of additional mechanisms to improve the efficiency of studying is suggested.

    Association pattern of students thesis examination using fp-growth algorithms

    The thesis examination is the final project students must complete to graduate from their majors. A thesis is scientific research work carried out by a student with a supervisor to find solutions to a problem. In the thesis examination, students must present their research results to be critiqued by the examiners. This article aims to analyze the association pattern of student thesis examinations at a private university. Although thesis examinations have been carried out following procedures, determining the composition of the board of examiners requires examining the pattern of relationships between research topics, supervisors, and examiners. This study uses 448 records and applies the FP-Growth algorithm to find the rules. The research methodology consists of preparing the dataset, cleansing data, selecting data, loading data into applications, transforming data, computing itemset frequencies, forming patterns, and analyzing rules. This study found 145 association rule patterns with a minimum support value = 4 and a minimum confidence value = 50%. Of these association rule patterns, 77.78% fall under the scientific group data. The association patterns produced in this study can help determine the composition of the examiners for a student thesis examination according to the research topic and the scientific field of the examiners.
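    The support and confidence thresholds mentioned in this abstract can be illustrated with a minimal sketch. It uses brute-force itemset counting rather than FP-Growth itself (FP-Growth finds the same frequent itemsets more efficiently), and the records, item labels, function names, and the support threshold of 2 are all hypothetical, chosen only to keep the example small.

```python
from collections import Counter
from itertools import combinations

# Hypothetical examination records: one set of items per thesis.
transactions = [
    {"topic=ML", "sup=A", "exam=X"},
    {"topic=ML", "sup=A", "exam=X"},
    {"topic=ML", "sup=B", "exam=X"},
    {"topic=Net", "sup=C", "exam=Y"},
    {"topic=Net", "sup=C", "exam=X"},
    {"topic=ML", "sup=A", "exam=Y"},
]


def frequent_itemsets(records, min_support_count):
    """Count every itemset and keep those seen at least min_support_count times."""
    counts = Counter()
    for record in records:
        items = sorted(record)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support_count}


def association_rules(itemsets, min_confidence):
    """Build rules antecedent -> consequent with confidence >= min_confidence."""
    rules = []
    for itemset, count in itemsets.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                # Every subset of a frequent itemset is itself frequent,
                # so the antecedent count is always available.
                confidence = count / itemsets[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, count, confidence))
    return rules


# Toy thresholds: support count >= 2, confidence >= 50%.
for ante, cons, count, conf in association_rules(frequent_itemsets(transactions, 2), 0.5):
    print(sorted(ante), "->", sorted(cons), f"support={count}", f"confidence={conf:.2f}")
```

    A rule such as topic -> examiner with high confidence would suggest, in this toy data, that a given examiner is usually assigned to a given topic, which is the kind of pattern the study reports using to compose examination boards.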

    Predictive Modelling of Student Academic Performance – the Case of Higher Education in Middle East

    One of the main issues in higher education is student retention. Predicting students' performance is an important task for higher education institutions in reducing the student dropout rate and increasing student success. Educational Data Mining is an emerging field that focuses on data from educational settings. It includes reading the data, extracting information, and acquiring hidden knowledge. This research used data from one of the Gulf Cooperation Council (GCC) universities as a case study of higher education in the Middle East. The university concerned has an enrolment of about 20,000 students of many different nationalities. The primary goal of this research is to investigate the ability to build predictive models that predict students' academic performance and to identify the main factors that influence their performance and Grade Point Average. The development of a generalized model to identify students in jeopardy of dismissal will help to reduce the dropout rate through early identification of needed academic advising, and ultimately improve student success. Such a model could be applied at any institution that adopts the same grading system, either at the Foundation level, which uses a binary response variable (Pass/Fail), or in the undergraduate academic programs, which use a count response variable (the Grade Point Average). This research showed that data science algorithms could play a significant role in predicting students' Grade Point Average by adopting different regression algorithms. Several algorithms were applied to investigate the ability to build predictive models of students' Grade Point Average after 2, 4 or 6 terms: Linear/Logistic Regression, Regression Trees and Random Forest. These predictive models are used to predict a specific student's Grade Point Average based on other values in the dataset. In this type of model, explicit instruction is given about what the model needs to learn. An optimization function (the model) is formed to find the target output based on specific input values. This research opens the door for future comprehensive studies that apply a data science approach to higher-education systems and identify the main factors that influence student performance.

    Prediction of student success: A smart data-driven approach

    Predicting students' academic performance is one of the subjects of Educational Data Mining, which aims to extract useful information and new patterns from educational data. Understanding the drivers of student success may assist educators in developing pedagogical methods and provide a tool for personalized feedback and advice. In order to improve the academic performance of students and create a decision support solution for higher education institutions, this dissertation proposes a methodology that uses educational data mining to compare prediction models of student success. The data cover master's students at ISCTE, a Portuguese university, during the 2012 to 2022 academic years. In addition, the study examined which factors are the strongest predictors of student success. The PyCaret library was used to compare the performance of several algorithms. Factors proposed to influence success include, for example, the student's gender, previous educational background, the existence of a special statute, and the parents' educational degree. The analysis revealed that the Light Gradient Boosting Machine Classifier had the best performance with an accuracy of 87.37%, followed by the Gradient Boosting Classifier (accuracy = 85.11%) and the Adaptive Boosting Classifier (accuracy = 83.37%). Hyperparameter tuning improved the performance of all the algorithms. Feature importance analysis revealed that the factors with the greatest impact on student success were the average grade, the duration of the master's, and the gap between degrees, i.e., the number of years between the last degree and the start of the master's.

    Anomaly detection as a tool for strengthening data analysis and prediction in education

    Educational institutions seek to design effective mechanisms that improve academic results, enhance the learning process, and avoid dropout. Analyzing and predicting students' performance during their studies may reveal shortcomings in training programs and detect students with learning problems. This motivates the development of data-driven techniques and models that aim to enhance teaching and learning. Classical models usually ignore outlier students with uncommon and inconsistent characteristics, although they may provide significant information to domain experts and affect the prediction models. Outliers in education are barely explored, and their impact on prediction models has not yet been studied in the literature. Thus, the thesis aims to investigate outliers in educational data and extend the existing knowledge about them. The thesis presents three case studies of outlier detection for different educational contexts and forms of data representation (a numerical dataset for a German university, a numerical dataset for a Russian university, and a sequential dataset for French nursing schools). For each case, a data preprocessing approach is proposed that takes the peculiarities of the dataset into account. The prepared data were used to detect outliers under conditions of unknown ground truth. The characteristics of the detected outliers were explored and analysed, which extended the understanding of students' behaviour in the learning process. One of the main tasks in the educational domain is to develop essential tools that help to improve academic results and reduce attrition. Thus, many studies aim to build performance prediction models that can detect students with learning problems who need special help. The second goal of the thesis is to study the impact of outliers on prediction models. The two most common prediction tasks in the educational field were considered: (i) dropout prediction and (ii) final score prediction. The prediction models were compared across different prediction algorithms and with respect to the presence of outliers in the training data. This thesis opens new avenues for investigating students' performance in educational environments. Understanding outliers and the reasons for their appearance can help domain experts extract valuable information from the data. Outlier detection might be part of the pipeline in early warning systems for detecting students at high risk of dropping out. Furthermore, the behavioural tendencies of outliers can serve as a basis for providing recommendations to students in their studies or for making decisions about improving the educational process.
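    The abstract does not name the detection method used, so the following is only a stand-in sketch: it flags outliers in a list of final scores with the common interquartile-range (IQR) rule, so that a prediction model could then be trained with and without them. The scores, the function name, and the multiplier k = 1.5 are assumptions made for illustration.

```python
import statistics


def split_outliers(values, k=1.5):
    """Split values into (inliers, outliers) using the IQR rule.

    A value is flagged when it lies more than k * IQR below the first
    quartile or above the third quartile.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers


# Hypothetical final scores for one course.
scores = [62, 65, 67, 68, 70, 71, 73, 75, 12]
inliers, outliers = split_outliers(scores)
print("flagged as outliers:", outliers)
print("mean with outliers:", statistics.mean(scores))
print("mean without outliers:", statistics.mean(inliers))
```

    Comparing a statistic or a fitted model on the full data against the inliers alone is one simple way to see how much the flagged students influence the result.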

    A Data Science Maturity Model Applied to Students' Modeling

    Maturity models define a series of levels, each representing an increased complexity in information systems. Data Science appears in the Business Intelligence (BI) and Business Analytics (BA) literature. This work applies the _IABE maturity model, which includes two additional levels: Data Engineering (DE) at the bottom and Business Experimentation (BE) at the top. This study uses the _IABE model for students' modeling in the ModEst project. For this purpose, the public administration body is the Directorate-General for Statistics of Education and Science (DGEEC) of the Portuguese Education Ministry. DGEEC provided extensive data on two million students per year in the Portuguese school system, from preschool to doctoral programs. This work presents the comprehensible _IABE maturity model to extract new knowledge from the DGEEC dataset. The method applied is _IABE, in which, after the DE level, wh-questions are formulated and answered with the most appropriate techniques at each maturity level. The novelty of this work is applying the _IABE maturity model to a unique dataset for the first time. Wh-questions are stated at the BI level using data summarization; predictive models are built at the BA level; and counterfactual approaches are presented at the BE level. Doi: 10.28991/ESJ-2023-07-06-08