618 research outputs found

    Student performance prediction based on data mining classification techniques

    Predicting student performance has become a crucial task in academic environments and plays a significant role in producing quality graduates. Several statistical and machine learning algorithms have been proposed for analyzing, predicting and classifying student performance. However, these classification algorithms still pose issues in terms of classification performance. This paper presents a method to predict student performance using Iterative Dichotomiser 3 (ID3), C4.5 and Classification and Regression Tree (CART). The experiment was performed on the Waikato Environment for Knowledge Analysis (Weka). The experimental results showed that ID3 gave an accuracy of 95.9%, specificity of 95.9%, precision of 95.9%, recall of 95.9%, f-measure of 95.9% and incorrectly classified instances of 3.83. C4.5 gave an accuracy of 98.3%, specificity of 98.3%, precision of 98.4%, recall of 98.3%, f-measure of 98.3% and incorrectly classified instances of 1.70. CART gave an accuracy of 98.3%, specificity of 98.3%, precision of 98.4%, recall of 98.3%, f-measure of 98.3% and incorrectly classified instances of 1.70. The time taken to build the model was 0.05 seconds for ID3, 0.03 seconds for C4.5 and 0.58 seconds for CART. Experimental results revealed that C4.5 outperforms the other classifiers and requires a reasonable amount of time to build the model. Keywords: Student performance, ID3, C4.5, CART, classification, Education data mining
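    The measures named in this abstract (accuracy, specificity, precision, recall, f-measure) can all be derived from a confusion matrix. The sketch below shows the standard binary-class definitions only; the function name `binary_metrics` and the counts are hypothetical, and it does not reproduce the paper's Weka setup, which may report per-class or weighted averages.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives,
    false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Hypothetical counts, not taken from any of the listed papers.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    Reporting specificity and recall side by side matters most when classes are imbalanced, since accuracy alone can be dominated by the majority class.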

    PSAP: Improving Accuracy of Students' Final Grade Prediction using ID3 and C4.5

    This study aimed to improve the performance of the Predicting Student Academic Performance (PSAP) system, with the outcome being a web application that can be used to analyze student performance during the present semester. Development of the web-based application was based on the evolutionary prototyping model. The study also analyses the accuracy of the classifier constructed for the prediction features in the web application. A qualitative approach using a user evaluation questionnaire was used for this study. A small number of expert users, who are lecturers from Universiti Pendidikan Sultan Idris, were chosen as respondents. Each respondent was instructed to answer a total of 27 questions regarding the respondent's background and the web application design. The accuracy of the classifier for the prediction features was tested with a confusion matrix on a test set of 24 rows. The findings showed the views of respondents on the interface design, functionality, navigation, and reliability of the web-based application. The results also showed that the accuracy of the classifier constructed using the ID3 classification model (C4.5) is 79.18%, the highest compared to the Naïve Bayes and Generalized Linear classification models.

    A critical assessment of imbalanced class distribution problem: the case of predicting freshmen student attrition

    Predicting student attrition is an intriguing yet challenging problem for any academic institution. Class-imbalanced data is common in the field of student retention, mainly because many students register but fewer drop out. Classification techniques applied to imbalanced datasets can yield deceptively high prediction accuracy, because the overall predictive accuracy is usually driven by the majority class at the expense of very poor performance on the crucial minority class. In this study, we compared different data balancing techniques to improve the predictive accuracy on the minority class while maintaining satisfactory overall classification performance. Specifically, we tested three balancing techniques (oversampling, under-sampling and synthetic minority over-sampling (SMOTE)) along with four popular classification methods (logistic regression, decision trees, neural networks and support vector machines). We used large and feature-rich institutional student data (between the years 2005 and 2011) to assess the efficacy of both the balancing techniques and the prediction methods. The results indicated that the support vector machine combined with the SMOTE data-balancing technique achieved the best classification performance, with a 90.24% overall accuracy on the 10-fold holdout sample. All three data-balancing techniques improved the prediction accuracy for the minority class. Applying sensitivity analyses to the developed models, we also identified the most important variables for accurate prediction of student attrition. Application of these models has the potential to accurately predict at-risk students and help reduce student dropout rates.
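    As a rough illustration of the SMOTE idea mentioned in this abstract, the sketch below creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified, assumption-laden version: the function name `smote_sketch`, the parameter defaults and the toy data are hypothetical, and it is not the implementation used in the study.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from minority-class samples.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class: four points on a unit square (hypothetical data).
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote_sketch(minority, n_new=3)
print(len(new_points), "synthetic samples generated")
```

    Unlike plain oversampling, which duplicates existing minority rows, this interpolation produces new points near existing ones, which is the distinction the abstract draws between the two techniques.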