730 research outputs found

    Machine learning prediction and analysis of students’ academic performance

    The aims of this research were to develop a Decision Tree classification model and to analyze the success of engineering students based on their performance during secondary school. Success was measured as a binomial response: whether students successfully finished the first and the second study years. The model used general success, the number of awards obtained at competitions, special awards, and average grades in mathematics, physics, and one of the official state languages during secondary school as predictor variables. General success was defined as the sum of a student’s grade point averages (GPA) across school years. The number of courses transferred from the first into the second study year and the GPA obtained during the first study year were added as predictor variables for modeling success during the second study year and enrollment in the third study year. The data showed that the majority of students enrolled in the first study year were gymnasium or technical high school graduates. For students enrolled in the first study year, the model identified General Success during secondary school as the most important predictor variable, followed by mathematics and physics grades. For students enrolled in the second study year, however, the most important predictor was the number of courses transferred from the first into the second study year, followed by first-year GPA and General Success. Decision Tree classification modeling was shown to be an adequate tool for predicting the success of engineering students during the first and second study years.

    Using predictive modelling to create the school dropouts' profile: a case study regarding elementary and high school students

    Project Work presented as the partial requirement for obtaining a Master’s degree in Information Management, specialization in Knowledge Management and Business Intelligence. Students’ disengagement from school is a contemporary topic worldwide and has been the subject of lengthy discussion. It may indicate a risk of early school dropout, which becomes a burden for the students’ families, schools, and the Government. This study focused on a single school located in Amadora, where the school dropout rate is quite significant. The main purpose of the thesis is to understand students’ proneness to leave school prematurely. A dataset containing all the pupil information available at that institution was transformed and then used to train and test models, in order to produce a detailed analysis that seeks to answer the study’s core research premise. The main conclusion is that students’ characteristics and family context play the major role in their likelihood of dropping out of school.

    A review of academic performance prediction using ensemble methods (Una revisión sobre la predicción del rendimiento académico mediante métodos de ensamble)

    Introduction: This article is a product of the research “Ensemble methods to estimate the academic performance of higher education students”, developed at the Universidad Distrital Francisco José de Caldas in 2021. It reviews research from the last five years on the prediction of academic performance using ensemble algorithms. Objective: The literature review aims to identify the most used algorithms and the most relevant variables in the prediction of academic performance. Methodology: A systematic review of the literature was carried out in different academic databases (Science Direct, Scopus, SAGE Journals, EBSCO, ResearchGate, Google Scholar), using search equations built with keywords. Results: 54 related articles were found that meet the inclusion criteria of the review. Additionally, benefits were found in the application of ensemble methods in the prediction of academic performance. Conclusion: The most influential variables in academic performance were found to belong to the academic factor. Random Forest was both the most used algorithm and the one with the best results. The use of these algorithms is an accurate tool to predict academic performance at any stage of university life and, at the same time, provides information for designing strategies to improve dropout and academic retention indicators.

    Defining Problematic School Absenteeism Using Nonparametric Modeling

    Contemporary classification models of school absenteeism often employ a multitier approach for organizing assessment and treatment strategies. Researchers have yet to agree, however, on how to objectively define problematic school absenteeism and identify demarcation points for each tier. The present study aimed to inform a multitier approach by determining the most relevant risk factors for problematic school absenteeism. The most useful targets of assessment for problematic school absenteeism are also addressed. The present study examined problematic school absenteeism defined at three distinct cutoffs: 1%, 10%, and 15% of full school days missed. The present study evaluated interactions among several youth- and academic-related variables at each cutoff. Participants included 316,004 elementary, middle, and high school youth from the Clark County School District of Nevada. The present study examined all youth regardless of their school absenteeism. The present study employed Binary Recursive Partitioning (BRP) techniques to identify the most relevant risk factors and highlight profiles of youth exhibiting school absenteeism at each cutoff by constructing classification trees. BRP, a nonparametric statistical approach, is most appropriate for generating, not testing, hypotheses. Anticipated findings were thus offered cautiously. The first hypothesis was that participation in school sports would produce the greatest impurity reduction in the classification tree-model for problematic school absenteeism, defined as equal to or greater than 1% of full school days missed. 
The second hypothesis was that grade level, letter grades for specific high school core academic courses (i.e., Algebra I, Algebra II, Biology, Chemistry, English 9, English 10, English 11, English 12, and Geometry), and GPA would produce the greatest impurity reductions in the classification tree-model for problematic school absenteeism, defined as equal to or greater than 10% of full school days missed. The third hypothesis was that age, gender, and ethnicity would produce the greatest impurity reductions in the classification tree-model for problematic school absenteeism, defined as equal to or greater than 15% of full school days missed. Models were constructed via Classification and Regression Tree (CART) analysis utilizing SPSS decision tree software. The first hypothesis was not supported but the second and third hypotheses received partial support. Results revealed age, ethnicity, gender, GPA, grade level, and IEP eligibility as relevant risk factors for problematic school absenteeism among the three cutoffs. Implications for clinicians and educators are discussed
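The "impurity reduction" that the abstract uses to rank risk factors can be illustrated with a small sketch. It assumes Gini impurity, which is the usual CART criterion, but the abstract does not say which impurity measure the SPSS models used. The function names and the toy labels are hypothetical and are not taken from the study.

```python
def label_absenteeism(days_missed, days_enrolled, cutoff):
    """Return 1 if the share of full school days missed is >= cutoff, else 0."""
    return 1 if days_missed / days_enrolled >= cutoff else 0


def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # share of positive (problematic) cases
    return 2 * p * (1 - p)


def impurity_reduction(parent, left, right):
    """Decrease in Gini impurity from splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Toy node (assumed data): 8 youths, half labelled as problematic absentees.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
weak_split = impurity_reduction(parent, [1, 1, 1, 0], [1, 0, 0, 0])
perfect_split = impurity_reduction(parent, [1, 1, 1, 1], [0, 0, 0, 0])
print(weak_split, perfect_split)
```

A tree-growing procedure of this kind evaluates candidate splits at each node and keeps the one with the largest reduction, so a variable producing "the greatest impurity reduction" is the one that best separates the two classes at that node.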

    Student dropout risk detection at University of Évora

    Student dropout is currently a global problem in higher education that affects the results of education systems. In addition to providing state-of-the-art education, any institution needs to maintain its student flow rate, which means that predicting dropout is critical to measuring the success of an education system. This work focuses on identifying the risk of dropout at the University of Évora based on students’ academic performance. We propose a set of academic information as predictive attributes and present machine learning models that achieve a precision of 96.8% and an F1-measure of 94.8% in identifying students at risk of dropping out. To this end, 13 years of academic data were collected from four academic programs (academic years 2006/2007 to 2018/2019; Management, Biology, Informatics Engineering, and Nursing). After collecting the students’ academic records, anonymizing the information, and pre-processing the data, an attribute engineering and selection process was conducted to build the data sets. Various machine learning algorithms were applied and their performance was compared; models were built with Decision Trees (DT), Naïve Bayes (NB), Support Vector Machines (SVM), and Random Forest (RF), with the latter obtaining the best performance in terms of recall.
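The precision, recall, and F1-measure named in this abstract follow from counts of true positives, false positives, and false negatives. The sketch below shows the standard definitions on made-up labels. It does not reproduce the study's data or its reported 96.8% and 94.8% figures, and the function name is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the class marked as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy labels (assumed): 1 = student dropped out, 0 = student stayed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Recall is the share of actual dropouts that the model flags, which is why it is a natural criterion for choosing among models when missing an at-risk student is the costlier error.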

    Prediction of student success: A smart data-driven approach

    Predicting students’ academic performance is one of the topics of Educational Data Mining, which aims to extract useful information and new patterns from educational data. Understanding the drivers of student success may assist educators in developing pedagogical methods by providing a tool for personalized feedback and advice. To improve students’ academic performance and create a decision support solution for higher education institutions, this dissertation proposes a methodology that uses educational data mining to compare prediction models of student success. The data cover master’s students at ISCTE, a Portuguese university, during the 2012 to 2022 academic years. The study also examined which factors are the strongest predictors of student success. The PyCaret library was used to compare the performance of several algorithms. Factors proposed to influence success include, for example, the student’s gender, previous educational background, the existence of a special statute, and the parents’ educational degree. The analysis revealed that the Light Gradient Boosting Machine Classifier had the best performance, with an accuracy of 87.37%, followed by the Gradient Boosting Classifier (accuracy = 85.11%) and the Adaptive Boosting Classifier (accuracy = 83.37%). Hyperparameter tuning improved the performance of all the algorithms. Feature importance analysis revealed that the factors with the greatest impact on student success were the average grade, master time, and the gap between degrees, i.e., the number of years between the last degree and the start of the master’s.

    Who Stays and Who Leaves? Predicting College Student Persistence Using Comprehensive Retention Models

    The purpose of this study was to use a comprehensive framework to examine academic, psychosocial, noncognitive, and other background factors that are related to retention at a large, public four-year institution in the southeastern United States. Specifically, the study examined what factors are most important in predicting first-to-second year retention both before the student enrolls at the university and after completion of their first semester of coursework. Data were drawn from institutional records, a survey instrument designed to measure psychosocial constructs, the ACT student record, and the National Center for Education Statistics. The sample for the study consisted of 12,342 students. Hierarchical generalized linear models and ensemble tree-based methods were utilized to identify important predictors of retention, ascertain the nature of the significant relationships, and to build models for predicting retention outcomes. An initial model was built for prediction before students enrolled followed by a second model with first semester performance variables added. Predictive validity was assessed by splitting the sample into a training and test set. Findings from the study showed that nontraditional factors were significant predictors of retention along with traditional predictors such as high school GPA. The results showed that the influence of financial factors and high school characteristics were among the most significant predictors of retention. Moreover, the results showed that multiple psychosocial factors are influential variables in retention outcomes. This study demonstrated that considering a variety of factors when forecasting postsecondary retention outcomes is vital for more accurate predictions. The models in this study showed that pre-college predictive models have the potential to be nearly as effective as models incorporating college performance and activity. 
The results of this study have important implications for higher education policymakers, college administrators, and high schools. Several of the relationships revealed have significant policy implications related to budget concerns, university programming, and college preparatory initiatives at the high school level. The study also provides a useful model for identifying students at risk of not being retained that could be adapted for implementation at other institutions, and it points to the importance of a holistic understanding of the total student.
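The abstract states that predictive validity was assessed by splitting the sample into a training and a test set. A minimal sketch of such a hold-out split follows. The 20% test fraction, the fixed seed, and the function name are assumptions made for illustration, since the abstract does not report how the split was done.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle records reproducibly and hold out a fraction as the test set."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy usage with ten hypothetical student IDs.
train, test = train_test_split(range(10), test_fraction=0.2, seed=1)
print(len(train), len(test))
```

A model is fitted on the training portion only and scored on the held-out portion, so the reported accuracy reflects students the model has not seen.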

    Data Mining for Student Performance Prediction in Education

    The ability to predict students’ performance trends is very important for improving teaching. Such predictions have become valuable knowledge that can be used for different purposes; for example, they can inform a strategic plan for the development of quality education. This paper proposes the application of data mining techniques to predict the final grades of students based on their historical data. In the experimental studies, three well-known data mining techniques (decision tree, random forest, and naive Bayes) were applied to two educational datasets, one from mathematics lessons and one from Portuguese language lessons. The results showed the effectiveness of data mining techniques in predicting student performance.