6,445 research outputs found

    A Multi-Gene Genetic Programming Application for Predicting Students Failure at School

    Despite several efforts, accurately predicting the student failure rate (SFR) at school remains a core problem for many in the educational sector. Procedures for forecasting SFR are rigid and often require data scaling or conversion into binary form, as in the logistic model, which may lead to loss of information and effect size attenuation. Also, the high number of factors, incomplete and unbalanced datasets, and the black-box nature of Artificial Neural Networks and Fuzzy logic systems expose the need for more efficient tools. Currently, the application of Genetic Programming (GP) holds great promise and has produced tremendous positive results in different sectors. In this regard, this study developed GPSFARPS, a software application that provides a robust solution to the prediction of SFR using an evolutionary algorithm known as multi-gene genetic programming. The approach is validated by feeding a testing data set to the evolved GP models. Results obtained from GPSFARPS simulations show its unique ability to evolve a suitable failure rate expression, with fast convergence at 30 generations out of a specified maximum of 500. The multi-gene system was also able to minimize the evolved model expression and accurately predict the student failure rate using a subset of the original expression. Comment: 14 pages, 9 figures, Journal paper. arXiv admin note: text overlap with arXiv:1403.0623 by other author

    Explainable AI (XAI): Improving At-Risk Student Prediction with Theory-Guided Data Science, K-means Classification, and Genetic Programming

    This research explores the use of eXplainable Artificial Intelligence (XAI) in Educational Data Mining (EDM) to improve the performance and explainability of artificial intelligence (AI) and machine learning (ML) models predicting at-risk students. Explainable predictions provide students and educators with more insight into at-risk indicators and causes, which facilitates instructional intervention guidance. Historically, low student retention has been prevalent across the globe as nations have implemented a wide range of interventions (e.g., policies, funding, and academic strategies) with only minimal improvements in recent years. In the US, recent attrition rates indicate two out of five first-time freshman students will not graduate from the same four-year institution within six years. In response, emerging AI research leveraging recent advancements in Deep Learning has demonstrated high predictive accuracy for identifying at-risk students, which is useful for planning instructional interventions. However, research suggested a general trade-off between performance and explainability of predictive models. Those that outperform, such as deep neural networks (DNN), are highly complex and considered black boxes (i.e., systems that are difficult to explain, interpret, and understand). The lack of model transparency/explainability results in shallow predictions with limited feedback prohibiting useful intervention guidance. Furthermore, concerns for trust and ethical use are raised for decision-making applications that involve humans, such as health, safety, and education. To address low student retention and the lack of interpretable models, this research explored the use of eXplainable Artificial Intelligence (XAI) in Educational Data Mining (EDM) to improve instruction and learning. More specifically, XAI has the potential to enhance the performance and explainability of AI/ML models predicting at-risk students. 
The scope of this study includes a hybrid research design comprising: (1) a systematic literature review of XAI and EDM applications in education; (2) the development of a theory-guided feature selection (TGFS) conceptual learning model; and (3) an EDM study exploring the efficacy of a TGFS XAI model. The EDM study implemented K-Means Classification for explorative (unsupervised) and predictive (supervised) analysis, and assessed the predictive performance and explainability of Genetic Programming (GP), a type of XAI model, against common AI/ML models. Online student activity and performance data were collected from a learning management system (LMS) at a four-year higher education institution. Student data were anonymized and protected to ensure data privacy and security. Data were aggregated at weekly intervals to compute and assess the predictive performance (sensitivity, recall, and F1 score) over time. Mean differences and effect sizes are reported at the .05 significance level. Reliability and validity are improved by implementing research best practices.

    Predicting Academic Performance: A Systematic Literature Review

    The ability to predict student performance in a course or program creates opportunities to improve educational outcomes. With effective performance prediction approaches, instructors can allocate resources and instruction more accurately. Research in this area seeks to identify features that can be used to make predictions, to identify algorithms that can improve predictions, and to quantify aspects of student performance. Moreover, research in predicting student performance seeks to determine interrelated features and to identify the underlying reasons why certain features work better than others. This working group report presents a systematic literature review of work in the area of predicting student performance. Our analysis shows a clearly increasing amount of research in this area, as well as an increasing variety of techniques used. At the same time, the review uncovered a number of issues with research quality that drive a need for the community to provide more detailed reporting of methods and results and to increase efforts to validate and replicate work. Peer reviewed

    Using Data Mining for Predicting Relationships between Online Question Theme and Final Grade

    As higher education diversifies its delivery modes, our ability to use the predictive and analytical power of educational data mining (EDM) to understand students' learning experiences is a critical step forward. The adoption of EDM by higher education as an analytical and decision making tool is offering new opportunities to exploit the untapped data generated by various student information systems (SIS) and learning management systems (LMS). This paper describes a hybrid approach which uses EDM and regression analysis to analyse live video streaming (LVS) students' online learning behaviours and their performance in their courses. Students' participation and login frequency, as well as the number of chat messages and questions that they submit to their instructors, were analysed, along with students' final grades. Results of the study show a considerable variability in students' questions and chat messages. Unlike previous studies, this study suggests no correlation between students' number of questions/chat messages/login times and students' success. However, our case study reveals that combining EDM with traditional statistical analysis provides a strong and coherent analytical framework capable of enabling a deeper and richer understanding of students' learning behaviours and experiences.

    Unfolding the drivers for academic success: The case of ISCTE-IUL

    Predicting the success of academic students is a major topic in the higher education research community. This study presents a data mining approach to predict academic success at a Portuguese university, ISCTE-IUL, unveiling the features that best explain failures. A dataset covering 10 curricular years of bachelor’s degrees was analysed. Feature selection resulted in a characterising set of 68 features, encompassing socio-demographic, social origin, previous education, special statutes and educational path information. Because features are collected at different times, distinct predictions were conducted: three data models were proposed and tested, based on the entrance date and on the end of the first and second curricular semesters. An additional model was designed for outlier degrees (i.e., a 4-year Bachelor). Six algorithms were tested for modelling. A support vector machine (SVM) model achieved the best overall performance and was selected to conduct a data-based sensitivity analysis. A relevance and impact review allowed meaningful knowledge to be extracted. This approach revealed that previous evaluation performance, study gaps and age-related features play a major role in explaining failures at the entrance stage. For subsequent stages, current evaluation performance features show their predictive power. Also, most of the feature groups are represented among each model’s most relevant features, revealing that academic success is a combination of a wide range of distinct factors. These and many other findings, such as the increasing impact of age-related features at the end of the first curricular semester, set a baseline for success improvement recommendations and for easier data mining adoption by higher education institutions. Suggested guidelines include providing study support groups for risk profiles and creating monitoring frameworks.
From a practical standpoint, a data-driven decision-making framework based on these models can be used to promote academic success. Academic success is one of the most explored topics in studies of higher education. This work presents a data mining approach for predicting academic success at ISCTE-IUL. Taking an approach focused on failure, the factors that explain these cases are studied. The study used bachelor’s degree data from 10 curricular years. Sixty-eight features were analysed, covering socio-demographics, social origin, previous schooling (secondary education), special statutes and academic path. Different analysis points were adopted for the first curricular year (entrance and the end of the first and second curricular semesters), giving rise to three distinct models. A supplementary model was designed for special degrees. Among the six modelling algorithms tested, SVM achieved the best performance and was used for the sensitivity analysis. The knowledge extraction process indicated that factors such as previous performance, interruptions in the educational path and age have a large impact on success or failure at an early stage. In the following stages, current performance factors show strong predictive power for success or failure. Most of the feature groups are represented among the most relevant features of each model. These and other results, such as the increasing impact of age-related factors at the end of the second curricular semester, support the creation of institutional recommendations. For example, creating study support groups for risk profiles and creating monitoring tools are some of the suggested guidelines. In short, it is possible to create a decision support tool, based on the models presented, that ISCTE-IUL can use to promote academic success.

    Design and Implementation of Real-time Student Performance Evaluation and Feedback System

    Undergraduate education is challenged by high dropout rates and by delayed student graduation due to dropping courses or having to repeat courses because of low academic performance. In this context, an early prediction of student performance may help students to understand where they stand amongst their peers and to change their attitude towards the course they are taking. Moreover, it is important to identify in time the students who need special attention and to provide appropriate interventions, such as mentoring and conducting review sessions. The goal of this thesis is the design and implementation of a real-time student-performance evaluation and feedback system (RSPEF) to improve graduation rates. RSPEF is an interactive, web-based system consisting of a Predictive Analysis System (PAS), which uses machine-learning techniques to interpolate past student performance into the future, and an Emergency Warning System (EWS), which identifies poor-performing students in courses. Moreover, a unified representation of student-background and student-performance data is provided in the form of a relational database schema that is suitable for assessing students’ performance across multiple courses, which is critical for the generalizability of the RSPEF system. The system design includes a core machine-learning and data-analysis engine, a relational database that is reusable across courses, and an interactive web-based interface to continuously collect data and create dashboards for users. Computer Science, Department of

    Improving the expressiveness of black-box models for predicting student performance

    Early prediction systems of student performance can be very useful to guide student learning. For a prediction model to be really useful as an effective aid for learning, it must provide tools to adequately interpret progress, to detect trends and behaviour patterns and to identify the causes of learning problems. White-box and black-box techniques have been described in the literature to implement prediction models. White-box techniques require a priori models to explore, which makes them easy to interpret but difficult to generalize and unable to detect unexpected relationships between data. Black-box techniques are easier to generalize and suitable for discovering unsuspected relationships, but they are cryptic and difficult for most teachers to interpret. In this paper a black-box technique is proposed to take advantage of the power and versatility of these methods, while making some decisions about the input data and the design of the classifier that provide a rich output data set. A set of graphical tools is also proposed to exploit the output information and provide a meaningful guide to teachers and students. From our experience, a set of tips about how to design a prediction system and how to represent the output information is also provided.