
    Development of a system architecture for the prediction of student success using machine learning techniques

    “ The goals of higher education have evolved over time based on the impact that technological development and industry have on productivity. Nowadays, jobs demand increased technical skills, and the supply of personnel prepared to assume those jobs is insufficient. The higher education system needs to evaluate its practices to realize the potential of cultivating an educated and technically skilled workforce. Currently, completion rates at universities are too low to accomplish the aim of closing the workforce gap. Recent reports indicate that 40 percent of freshmen at four-year public colleges will not graduate, and completion rates are even lower at community colleges. Some efforts have been made to adjust admission requirements and develop support systems for different segments of students; however, completion rates remain low. Therefore, new strategies need to make student success part of the institutional culture, supported by information technology. It is also key that models evaluating student success be scalable to other higher education institutions. In recent years, machine learning techniques have proven effective for this purpose. Accordingly, the primary objective of this research is to develop an integrated system that allows the application of machine learning to student success prediction. The proposed system was evaluated to determine the accuracy of student success predictions using several machine learning techniques: decision trees, neural networks, support vector machines, and random forests. The research outcomes offer important insight into how to develop a more efficient and responsive system that supports students in completing their educational goals”--Abstract, page iv
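    A comparison of the four classifier families named in this abstract can be sketched as follows. This is an illustrative example only, not the thesis's actual system: the data is synthetic, and the model settings are assumptions.

```python
# Hypothetical sketch: comparing decision trees, neural networks, SVMs, and
# random forests on synthetic "student" data. The dataset and hyperparameters
# are invented for illustration; they are not the study's actual setup.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 500 students, 10 features, binary success label.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "neural network": MLPClassifier(max_iter=1000, random_state=0),
    "SVM": SVC(),
    "random forest": RandomForestClassifier(random_state=0),
}

# 5-fold cross-validated accuracy for each technique.
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")
```

    In practice, each model's mean cross-validated accuracy would be compared before selecting one for deployment.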

    Predicting general academic performance and identifying the differential contribution of participating variables using artificial neural networks

    Many studies have explored the contribution of different factors, from diverse theoretical perspectives, to the explanation of academic performance. These factors have been identified as having important implications not only for the study of learning processes, but also as tools for improving curriculum designs, tutorial systems, and students’ outcomes. Some authors have suggested that traditional statistical methods do not always yield accurate predictions and/or classifications (Everson, 1995; Garson, 1998). This paper explores a methodological approach that is relatively new to the field of learning and education, but widely used in other areas such as computational science, engineering, and economics. This study uses cognitive and non-cognitive measures of students, together with background information, to design predictive models of student performance using artificial neural networks (ANN). These predictions constitute a true predictive classification of academic performance over time, made a year in advance of the actual observed measure. A total sample of 864 university students of both genders, with ages ranging between 18 and 25, was used. Three neural network models were developed. Two of the models (identifying the top 33% and the lowest 33% groups, respectively) were able to reach 100% correct identification of all students in each of the two groups. The third model (identifying low, mid, and high performance levels) reached precision from 87% to 100% across the three groups. Analyses also explored the predicted outcomes at an individual level, and their correlations with the observed results, as a continuous variable for the whole group of students. Results demonstrate the greater accuracy of the ANN compared to traditional methods such as discriminant analyses.
In addition, the ANN provided information on those predictors that best explained the different levels of expected performance. Thus, results have allowed the identification of the specific influence of each pattern of variables on different levels of academic performance, providing a better understanding of the variables with the greatest impact on individual learning processes, and of those factors that best explain these processes for different academic levels
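    The third model's task, assigning students to low, mid, and high performance terciles with a neural network, can be sketched roughly as below. The features, labels, and network size are synthetic stand-ins, not the study's cognitive and background measures.

```python
# Illustrative sketch only: an MLP classifier assigning students to
# low/mid/high performance terciles, in the spirit of the third model
# described above. All data here is simulated.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 864                           # sample size matching the abstract
X = rng.normal(size=(n, 6))       # stand-in cognitive/non-cognitive scores
score = X @ rng.normal(size=6) + rng.normal(scale=0.5, size=n)

# Label each student by performance tercile: 0 = low, 1 = mid, 2 = high.
terciles = np.quantile(score, [1 / 3, 2 / 3])
y = np.digitize(score, terciles)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
ann.fit(X_tr, y_tr)
print(f"test accuracy: {ann.score(X_te, y_te):.2f}")
```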

    CS1: how will they do? How can we help? A decade of research and practice

    Background and Context: Computer Science attrition rates (in the western world) are very concerning, with a large number of students failing to progress each year. It is well acknowledged that a significant factor in this attrition is students’ difficulty in mastering the introductory programming module, often referred to as CS1. Objective: The objective of this article is to describe the evolution of a prediction model named PreSS (Predict Student Success) over a 13-year period (2005–2018). Method: This article ties together the PreSS prediction model; pilot studies; a longitudinal, multi-institutional re-validation and replication study; improvements to the model since its inception; and interventions to reduce attrition rates. Findings: The outcome of this body of work is an end-to-end, real-time, web-based tool (PreSS#), which can predict student success early in an introductory programming module (CS1) with an accuracy of 71%. This tool is enhanced with interventions that were developed in conjunction with PreSS#, which improved student performance in CS1. Implications: This work contributes significantly to the computer science education (CSEd) community and to the ITiCSE 2015 working group’s call (in particular the second grand challenge) by re-validating and further developing the original PreSS model, 13 years after its inception, on a modern, disparate, multi-institutional data set

    Predicting Success of University Applicants Based on Subjects’ Preferences as an Extra Tool for Admission Considerations: A Predictive Analytics Approach

    Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.
    This study uses a dataset of student performance indicators and psychological patterns associated with each individual to examine the predictive efficiency of psychological traits on academic results, more specifically grade point average (GPA). We propose building a classification machine learning model that predicts GPA performance, dividing the students into top and bottom performers. Several features were used in the modelling, namely the student's previous performance (such as GPA), course progression (how closely the student's master's programme is related to previous academic courses), and personality traits obtained by surveying 319 students and recent graduates with a quiz developed by Association Better Future based on the RIASEC model for type theory of personality. It is widely accepted that psychological characteristics can impact student churn and performance (Costa and McCrae, 1992). Furthermore, numerous papers have found that GPA can be predicted by multiple factors, including past performance, intelligence quotient (IQ), demographic background, and previous area of studies, but, to increase model accuracy, psychological factors are recommended for future work (Abele and Spurk, 2009). Whilst past performance and, to a lesser extent, IQ are currently evaluated in university admissions, psychological traits are yet to have a place in selecting the best candidates. In this study we propose that, although IQ and past performance are good indicators of student performance, combining psychological traits with these classical indicators increases the predictive accuracy of the machine learning model.
    With this in mind, we used the performance of past and current university students, measured in GPA, analysed it against the collected psychological indicators, and developed multiple machine learning models to predict student GPA from those indicators. The indicators were divided into three groups: psychological traits only, GPA and age only, and a combination of both. Four types of models were used: neural networks, Support Vector Machines (SVM), decision forests, and decision trees. For the problem at hand, decision forests consistently outperformed neural networks, SVM, and decision trees in both accuracy and Area Under the Curve (AUC), the curve being the Receiver Operating Characteristic (ROC). On the database of 176 entries, comparing the models built on the GPA-and-age dataset with those built on the full dataset including psychological variables, decision forests showed the best fit to the training data and the highest AUC against the validation set, with values of 0.717 and 0.790, respectively. The models based on the full dataset, including psychological variables, consistently outperformed the models based solely on the classical GPA-predicting metrics. We further propose and discuss that the model can be used as an extra indicator for the admission process
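    The evaluation step, scoring a decision forest by ROC AUC on a held-out validation set, can be sketched as below. The 176-entry dataset is simulated here; real psychological and GPA features would replace the synthetic columns, and the tree count is an assumption.

```python
# Hedged sketch: evaluating a decision-forest (random forest) classifier by
# ROC AUC on a held-out validation set, mirroring the comparison above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in matching the abstract's 176-entry dataset.
X, y = make_classification(n_samples=176, n_features=8, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3,
                                            random_state=1)

forest = RandomForestClassifier(n_estimators=200, random_state=1)
forest.fit(X_tr, y_tr)

# AUC is computed from predicted probabilities for the positive class.
auc = roc_auc_score(y_val, forest.predict_proba(X_val)[:, 1])
print(f"validation AUC: {auc:.3f}")
```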

    Evaluation of Machine Learning Techniques for Early Identification of At-Risk Students

    Student attrition is one of the long-standing problems facing higher education institutions, despite the extensive research that has been undertaken to address it. To increase students’ success and retention rates, there is a need for early alert systems that facilitate the identification of at-risk students so that remedial measures may be taken in time to reduce the risk. However, incorporating ML predictive models into early warning systems faces two main challenges: improving the accuracy of timely predictions, and ensuring the generalizability of predictive models across on-campus and online courses. The goal of this study was to develop and evaluate predictive models that can be applied to on-campus and online courses to predict at-risk students based on data collected from different stages of a course: the start of the course, the 4th week, the 8th week, and the 12th week. In this research, several supervised machine learning algorithms were trained and evaluated on their performance. This study compared the performance of single classifiers (Logistic Regression, Decision Trees, Naïve Bayes, and Artificial Neural Networks) and ensemble classifiers (using bagging and boosting techniques). Their performance was evaluated in terms of sensitivity, specificity, and Area Under Curve (AUC). A total of four experiments were conducted, based on data collected from different stages of the course. In the first experiment, the classification algorithms were trained and evaluated on data collected before the beginning of the semester. In the second experiment, they were trained and evaluated on week-four data. Similarly, in the third and fourth experiments, they were trained and evaluated on week-eight and week-12 data. The results demonstrated that ensemble classifiers achieved the highest classification performance in all experiments.
    Additionally, the results of the generalizability analysis showed that the predictive models were able to attain similar performance when used to classify on-campus and online students. Moreover, the Extreme Gradient Boosting (XGBoost) classifier was found to be the best-performing classifier for the at-risk student classification problem, achieving an AUC of ≈ 0.89, a sensitivity of ≈ 0.81, and a specificity of ≈ 0.81 using data available at the start of a course. Finally, the XGBoost classifier improved by 1% with each subsequent four-week dataset, reaching an AUC of ≈ 0.92, a sensitivity of ≈ 0.84, and a specificity of ≈ 0.84 by week 12. While the additional learning management system's (LMS) data helped improve prediction accuracy consistently as the course progressed, the improvement was marginal. Such findings suggest that the predictive models can be used to identify at-risk students even in courses that do not make significant use of an LMS. The results of this research demonstrated the usefulness and effectiveness of ML techniques for early identification of at-risk students. Interestingly, it was found that fairly reliable predictions can be made at the start of the semester, which is significant in that help can be provided to at-risk students even before the course starts. Finally, it is hoped that the results of this study advance the understanding of the appropriateness and effectiveness of ML techniques when used for early identification of at-risk students
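    The evaluation protocol described above, a boosted classifier scored by sensitivity, specificity, and AUC, can be sketched as follows. scikit-learn's GradientBoostingClassifier stands in for XGBoost here to avoid the extra dependency, and the at-risk labels and class imbalance are synthetic assumptions.

```python
# Sketch of the sensitivity/specificity/AUC evaluation described above,
# using sklearn's gradient boosting as a stand-in for XGBoost.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, roc_auc_score

# Synthetic cohort: ~70% not-at-risk (class 0), ~30% at-risk (class 1).
X, y = make_classification(n_samples=1000, n_features=12, weights=[0.7],
                           random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

gbm = GradientBoostingClassifier(random_state=2).fit(X_tr, y_tr)
pred = gbm.predict(X_te)

tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
sensitivity = tp / (tp + fn)      # true-positive rate: at-risk caught
specificity = tn / (tn + fp)      # true-negative rate: not-at-risk kept
auc = roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1])
print(f"sensitivity={sensitivity:.2f} "
      f"specificity={specificity:.2f} auc={auc:.2f}")
```

    Reporting sensitivity and specificity separately, rather than plain accuracy, matters for imbalanced at-risk cohorts like this one.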

    A Multi-Gene Genetic Programming Application for Predicting Students Failure at School

    Accurately predicting student failure rate (SFR) at school remains a core problem faced by many in the educational sector. The procedures for forecasting SFR are rigid and often require data scaling or conversion into binary form, as in the logistic model, which may lead to loss of information and attenuation of effect sizes. In addition, the high number of factors, incomplete and unbalanced datasets, and the black-box nature of Artificial Neural Networks and fuzzy logic systems expose the need for more efficient tools. Currently, the application of Genetic Programming (GP) holds great promise and has produced tremendously positive results in different sectors. In this regard, this study developed GPSFARPS, a software application that provides a robust solution to the prediction of SFR using an evolutionary algorithm known as multi-gene genetic programming. The approach is validated by feeding a testing dataset to the evolved GP models. Results obtained from GPSFARPS simulations show its unique ability to evolve a suitable failure rate expression, with fast convergence at 30 generations out of a maximum specified 500. The multi-gene system was also able to minimize the evolved model expression and accurately predict student failure rate using a subset of the original expression
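    The core multi-gene idea, a model built as a linear combination of several evolved sub-expressions ("genes"), can be illustrated in miniature. The genes below are hand-written stand-ins for evolved trees, and the full GP loop (selection, crossover, mutation) is omitted; this is a sketch of the model structure only, not GPSFARPS itself.

```python
# Minimal illustration of the multi-gene structure: each gene is one
# sub-expression, and the final model is a least-squares-weighted sum of
# the gene outputs plus a bias term.
import numpy as np

rng = np.random.default_rng(3)
x1, x2 = rng.uniform(0, 1, (2, 100))          # two input factors
target = 0.8 * x1 * x2 + 0.3 * np.sin(x2)     # unknown failure-rate signal

# Hand-written stand-ins for evolved gene expressions.
genes = [lambda a, b: a * b,
         lambda a, b: np.sin(b),
         lambda a, b: a + b]
G = np.column_stack([g(x1, x2) for g in genes] + [np.ones_like(x1)])

# Fit the linear gene weights (and bias) by least squares.
w, *_ = np.linalg.lstsq(G, target, rcond=None)
pred = G @ w
print(f"RMSE: {np.sqrt(np.mean((pred - target) ** 2)):.4f}")
```

    In a real multi-gene GP run, the gene expressions themselves would be evolved, while the linear weighting step stays exactly as above.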

    Predicting and Improving Performance on Introductory Programming Courses (CS1)

    This thesis describes a longitudinal study on factors which predict academic success in introductory programming at undergraduate level, including the development of these factors into a fully automated, web-based system (which predicts students who are at risk of not succeeding early in the introductory programming module) and interventions to address attrition rates on introductory programming courses (CS1). Numerous studies have developed models for predicting success in CS1; however, there is little evidence of their ability to generalise or of their use beyond early investigations. In addition, they are seldom followed up with interventions after struggling students have been identified. This approach overcomes that by providing a web-based, real-time system with a longitudinally developed and revalidated prediction model at its core, together with recommendations for interventions which educators could implement to support the struggling students who have been identified. This thesis makes five fundamental contributions. The first is a revalidation of a prediction model named PreSS. The second contribution is the development of a web-based, real-time implementation of the PreSS model, named PreSS#. The third contribution is a large longitudinal, multivariate, multi-institutional study identifying predictors of performance and analysing machine learning techniques (including deep learning and convolutional neural networks) to further develop the PreSS model. This resulted in a prediction model with approximately 71% accuracy, and over 80% sensitivity, using data from 11 institutions with a sample size of 692 students. The fourth contribution is a study of gender differences in CS1, identifying psychological, background, and performance differences between male and female students to better inform the prediction model and the interventions.
    The fifth and final contribution is the development of two interventions that can be implemented early in CS1, once students are identified by PreSS#, to potentially improve student outcomes. The work described in this thesis builds substantially on earlier work, providing valid and reliable insights on gender differences, potential interventions to improve performance, and an unsurpassed, generalizable prediction model, developed into a real-time web-based system

    Predicting Student Retention: A Comparative Study of Predictive Models for Predicting Student Retention at St. Cloud State University

    Student graduation rates have always taken prominence in academic studies, since they are considered a major indicator of any university's performance. Accurate models for predicting student retention play a major role in universities' strategic planning and decision making. Students' enrollment behavior and retention rates are also relevant factors in measuring the effectiveness of universities. This thesis provides a comparison of predictive models for predicting student retention at Saint Cloud State University. The models are trained and tested using a set of features reflecting students' readiness for college education, their academic capacities, financial situation, and academic results during their freshman year. Principal Component Analysis (PCA) was used for feature selection. Six predictive models were built, and a comparison of the prediction results was conducted using all features and using the features selected via PCA
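    The PCA feature-selection step can be sketched as a pipeline that projects student features onto principal components before fitting a retention classifier. The data, the 90% variance threshold, and the choice of logistic regression as the downstream model are illustrative assumptions, not the thesis's actual configuration.

```python
# Hypothetical sketch: PCA-based feature reduction feeding a retention
# classifier. Synthetic data stands in for readiness, financial, and
# freshman-year academic features.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=15, n_informative=5,
                           random_state=4)

# Standardize, then keep enough components to explain 90% of the variance.
pca_model = make_pipeline(StandardScaler(),
                          PCA(n_components=0.9),
                          LogisticRegression(max_iter=1000))
acc = cross_val_score(pca_model, X, y, cv=5).mean()
print(f"mean CV accuracy with PCA features: {acc:.3f}")
```

    Fitting the same pipeline without the PCA step gives the "all features" baseline that the thesis compares against.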

    Using Machine Learning Techniques to Predict Introductory Programming Performance

    Learning to program is difficult and can result in high dropout and failure rates. Numerous research studies have attempted to determine the factors that influence programming success and to develop suitable prediction models. The models built tend to be statistical, with linear regression the most common technique used. Over a three-year period, a multi-institutional, multivariate study was performed to determine factors that influence programming success. In this paper, an investigation of six machine learning algorithms for predicting programming success, using the predetermined factors, is described. Naïve Bayes was found to have the highest prediction accuracy. However, no statistically significant differences were found between the accuracy of this algorithm and logistic regression, SMO (support vector machine), back propagation (artificial neural network), and C4.5 (decision tree). The paper concludes with a recent epilogue study that re-validates the factors and the performance of the naïve Bayes model
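    The comparison described above, naïve Bayes against another algorithm on the same cross-validation folds, can be sketched as below. The data is synthetic, and the fold count is an assumption; a real study would also apply a significance test to the per-fold differences.

```python
# Sketch: comparing naïve Bayes and logistic regression on identical
# cross-validation folds, so per-fold accuracy differences are paired.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the multi-institutional factor data.
X, y = make_classification(n_samples=300, n_features=8, random_state=5)
folds = KFold(n_splits=10, shuffle=True, random_state=5)

nb_scores = cross_val_score(GaussianNB(), X, y, cv=folds)
lr_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                            cv=folds)

diff = nb_scores - lr_scores
print(f"NB mean={nb_scores.mean():.3f}  LR mean={lr_scores.mean():.3f}")
print(f"mean fold difference={diff.mean():+.3f}")
```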