4,386 research outputs found

    A critical assessment of imbalanced class distribution problem: the case of predicting freshmen student attrition

    Predicting student attrition is an intriguing yet challenging problem for any academic institution. Class-imbalanced data is common in the field of student retention, mainly because many students register but comparatively few drop out. Classification techniques applied to imbalanced datasets can yield deceptively high prediction accuracy, because the overall accuracy is usually driven by the majority class at the expense of very poor performance on the crucial minority class. In this study, we compared different data-balancing techniques to improve the predictive accuracy on the minority class while maintaining satisfactory overall classification performance. Specifically, we tested three balancing techniques (oversampling, under-sampling and the synthetic minority over-sampling technique, SMOTE) along with four popular classification methods (logistic regression, decision trees, neural networks and support vector machines). We used a large and feature-rich institutional student dataset (covering the years 2005 to 2011) to assess the efficacy of both the balancing techniques and the prediction methods. The results indicated that the support vector machine combined with the SMOTE data-balancing technique achieved the best classification performance, with a 90.24% overall accuracy on the 10-fold holdout sample. All three data-balancing techniques improved the prediction accuracy for the minority class. Applying sensitivity analyses to the developed models, we also identified the most important variables for accurate prediction of student attrition. Application of these models has the potential to accurately predict at-risk students and help reduce student dropout rates.
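The interpolation idea behind SMOTE, as named in the abstract above, can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function name `smote`, its parameters, and the toy student records (GPA, credits attempted) are all hypothetical, and a production study would more likely rely on an established library.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical toy data: (GPA, credits attempted); 8 retained vs 3 dropped out.
retained = [(3.4, 15), (3.1, 14), (3.8, 16), (2.9, 15),
            (3.6, 12), (3.0, 15), (3.3, 13), (3.7, 16)]
dropped = [(1.9, 9), (2.2, 12), (1.6, 10)]

# Oversample the minority class until both classes have the same size.
balanced_dropped = dropped + smote(dropped, n_new=len(retained) - len(dropped), k=2)
print(len(retained), len(balanced_dropped))
```

Each synthetic record lies on a line segment between two real minority records, which is what distinguishes this approach from plain oversampling by duplication. Balancing should be applied to the training folds only, so that the evaluation data keeps its original class distribution.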

    Socioeconomic Data Mining and Student Dropout: Analyzing a Higher Education Course in Brazil

    This paper analyzes student dropout from a higher education course in the city of Guarapari, Espírito Santo, Brazil, using data mining. The objective was to investigate possible scenarios for the early identification of students at higher risk of dropping out by analyzing socioeconomic data from business school graduates between 2014 and 2018, using information extracted from the academic system. The methodology was experimental research with a quantitative approach, based on a comparative analysis of the data produced by the computational algorithms. The analysis concluded that computational techniques can help administrators plan pedagogical and administrative actions, and that combining socioeconomic data with school performance information, using the tool, can yield advantageous results, allowing the fight against dropout to be treated as an early and continuous practice.

    Open Issues, Research Challenges, and Survey on Education Sector in India and Exploring Machine Learning Algorithm to Mitigate These Challenges

    Education is a core sector of the nation, but dealing with problems in educational institutions, particularly in higher education, is a challenging task. The growth of education and technology has led to a number of research challenges that have attracted significant attention, as well as a notable increase in the amount of data available in academic databases. Higher education institutions today are concerned with outcome-based education and with techniques for assessing a student's knowledge level or capacity for learning. In general, there are more contributors in the academic field than there are authors. Research is being done in this field to determine the best algorithm and the features that are crucial for predicting future outcomes. This survey can help educational institutions assess themselves and find any gaps that need to be filled in order to fulfil their purpose and vision. Machine Learning (ML) approaches have been explored to address these issues as higher education systems have grown in size.

    Empirical Methods for Predicting Student Retention- A Summary from the Literature

    The vast majority of the literature related to the empirical estimation of retention models includes a discussion of the theoretical retention framework established by Bean, Braxton, Tinto, Pascarella, Terenzini and others (see Bean, 1980; Bean, 2000; Braxton, 2000; Braxton et al., 2004; Chapman and Pascarella, 1983; Pascarella and Terenzini, 1978; St. John and Cabrera, 2000; Tinto, 1975). This body of research provides a starting point for considering which explanatory variables to include in any model specification, as well as for identifying possible data sources. The literature separates into two major camps: research related to hypothesis testing and the confirmation or empirical validation of theoretical retention models (Herzog, 2005; Ronco and Cahill, 2006; Stratton et al., 2008) vs. research specifically focused on the development of applied predictive models (Miller, 2007; Miller & Herreid, 2008; Herzog, 2006; Dey & Astin, 1993; Delen, 2010; Yu et al., 2010). The literature indicates that data mining or algorithmic approaches to prediction can provide superior results vis-à-vis traditional statistical modeling approaches (Delen et al., 2004; Sharda and Delen, 2006; Delen et al., 2007; Kiang, 2007; Li et al., 2009). However, little research in higher education has focused on the use of data mining methods for predicting retention (Herzog, 2006).

    Reconciling Contemporary Approaches to School Attendance and School Absenteeism: Toward Promotion and Nimble Response, Global Policy Review and Implementation, and Future Adaptability (Part 1)

    School attendance is an important foundational competency for children and adolescents, and school absenteeism has been linked to myriad short- and long-term negative consequences, even into adulthood. Many efforts have been made to conceptualize and address this population across various categories and dimensions of functioning and across multiple disciplines, resulting in both a rich literature base and a splintered view regarding this population. This article (Part 1 of 2) reviews and critiques key categorical and dimensional approaches to conceptualizing school attendance and school absenteeism, with an eye toward reconciling these approaches (Part 2 of 2) to develop a roadmap for preventative and intervention strategies, early warning systems and nimble response, global policy review, dissemination and implementation, and adaptations to future changes in education and technology. This article sets the stage for a discussion of a multidimensional, multi-tiered system of supports pyramid model as a heuristic framework for conceptualizing the manifold aspects of school attendance and school absenteeism.

    A Comprehensive Survey on Deep Learning Techniques in Educational Data Mining

    Educational Data Mining (EDM) has emerged as a vital field of research, which harnesses the power of computational techniques to analyze educational data. With the increasing complexity and diversity of educational data, Deep Learning techniques have shown significant advantages in addressing the challenges associated with analyzing and modeling this data. This survey aims to systematically review the state-of-the-art in EDM with Deep Learning. We begin by providing a brief introduction to EDM and Deep Learning, highlighting their relevance in the context of modern education. Next, we present a detailed review of Deep Learning techniques applied in four typical educational scenarios, including knowledge tracing, undesirable student detection, performance prediction, and personalized recommendation. Furthermore, a comprehensive overview of public datasets and processing tools for EDM is provided. Finally, we point out emerging trends and future directions in this research area. Comment: 21 pages, 5 figures

    Towards the Grade’s Prediction. A Study of Different Machine Learning Approaches to Predict Grades from Student Interaction Data

    Predicting students' grades is currently an open problem in Artificial Intelligence applied to education. Work on this problem aims to predict school failure and dropout early, and to provide a well-founded analysis of student performance for the improvement of educational quality. This document deals with the problem of predicting the grades of online master's degree students at the UNIR university, proposing a working model and comparing different technologies to determine which one best fits the available dataset. To make the predictions, the dataset underwent cleaning and analysis phases and was prepared for use with Machine Learning algorithms such as Naive Bayes, Decision Tree, Random Forest and Neural Networks. The comparison addresses a double prediction on a homogeneous set of input data: the final grade per subject and the final master's degree grade. The results obtained demonstrate that these techniques make grade prediction possible. The data yield figures showing that Artificial Intelligence is able to predict these situations with an accuracy above 96%.

    Prediction of students’ performance in e-learning environment using random forest

    The push for advancement in e-learning technology has made educational data very large and fast-growing. The data is generated daily as a result of students' interaction with e-learning environments, especially learning management systems. The data contains hidden information about students' participation in various e-learning activities which, when revealed, can be associated with the students' performance. Predicting students' performance based on their use of e-learning systems in educational institutions is a major concern and has become very important for education management to better understand why so many students perform poorly or even fail in their studies. However, prediction is difficult because of the diverse factors or characteristics that influence performance. This dissertation aims to predict students' performance by considering the students' interaction in the e-learning environment, their assessment marks and their prerequisite knowledge as prediction features. The Random Forest algorithm, which is an ensemble of decision trees, was used for prediction, and the comparative analysis shows that it outperforms the popular decision tree and K-Nearest Neighbor algorithms. However, Naive Bayes outperformed Random Forest. In addition to performance prediction, Random Forest was also used to identify the significant attributes that influence students' performance, which were validated by a statistical test using Pearson correlation. The research therefore revealed that lab tasks, assignments, midterm marks and prerequisite knowledge are significant indicators for predicting students' performance.
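The abstract above mentions validating influential attributes with Pearson correlation. As a hedged illustration of that statistic only, the sketch below computes the coefficient between one candidate feature and the outcome. The function name `pearson` and the midterm and final marks are hypothetical and do not come from the dissertation's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical marks for six students.
midterm = [55, 62, 70, 48, 81, 90]
final = [58, 60, 75, 50, 79, 88]

r = pearson(midterm, final)
print(f"Pearson r between midterm and final marks: {r:.3f}")
```

A coefficient close to 1 or -1 indicates a strong linear association between the attribute and the outcome, while a value near 0 indicates little linear association. It does not by itself establish that the attribute causes the outcome.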

    Educational data mining for tutoring support in Higher Education: a web-based tool case study in engineering degrees

    This paper presents a web-based software tool for tutoring support of engineering students that requires no data science background to use. The tool focuses on the analysis of students' performance, in terms of observable scores and the completion of their studies. For that purpose, it uses a dataset that contains only features typically gathered by university administrations about students, degrees and subjects. The web-based tool provides access to results from different analyses. Clustering and visualization in a low-dimensional representation of students' data help an analyst to discover patterns. The coordinated visualization of aggregated student performance in histograms, which are automatically updated according to custom filters set interactively by an analyst, can be used to facilitate the validation of hypotheses about a set of students. Classification of already graduated students into three performance levels, using explanatory variables and early performance information, is used to understand the degree of course-dependency of students' behavior across different degrees. The analysis of the impact of the students' explanatory variables and early performance on the graduation probability can lead to a better understanding of the causes of dropout. Preliminary experiments on data from engineering students at the 6 institutions associated with this project were used to define the final implementation of the web-based tool. Preliminary results for classification and dropout were acceptable, since accuracies were higher than 90% in some cases. The usefulness of the tool is discussed with respect to the stated goals, showing its potential for supporting early profiling of students.
    Real data from engineering degrees of EU Higher Education institutions show the potential of the tool for managing higher education and validate its applicability in real scenarios. This work was supported by the Erasmus+ Key Action 2 Strategic Partnerships KA203, funded by the European Commission, under Grant 2016-1-ES01-KA203-025452.