14 research outputs found

    Learning Analytics and Deep Learning in Large Virtual Learning Environments (VLEs)

    In this paper we examine the use of Deep Learning as a technique for Educational Data Mining and Learning Analytics. We discuss existing approaches and how Deep Learning can be used in a complementary manner to provide new and insightful perspectives on existing Learning Analytics tools and Machine Learning algorithms. The paper first outlines the context before considering the use of Big Data. A case study of a large Virtual Learning Environment (VLE) is introduced. The paper presents a series of Deep Learning experiments with this dataset and the new insights they have led to. The paper concludes with a discussion of how this approach complements other Learning Analytics work in a similar context.

    ProcK: Machine Learning for Knowledge-Intensive Processes

    We present a novel methodology to build powerful predictive process models. Our method, denoted ProcK (Process & Knowledge), relies not only on sequential input data in the form of event logs, but can learn to use a knowledge graph to incorporate information about the attribute values of the events and their mutual relationships. The idea is realized by mapping event attributes to nodes of a knowledge graph and training a sequence model alongside a graph neural network in an end-to-end fashion. This hybrid approach substantially enhances the flexibility and applicability of predictive process monitoring, as both the static and dynamic information residing in the databases of organizations can be directly taken as input data. We demonstrate the potential of ProcK by applying it to a number of predictive process monitoring tasks, including tasks with knowledge graphs available as well as an existing process monitoring benchmark where no such graph is given. The experiments provide evidence that our methodology achieves state-of-the-art performance and improves predictive power when a knowledge graph is available.
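The attribute-to-node mapping described above can be sketched minimally as follows. This is an illustration only, not the authors' implementation: all names (`node_emb`, `encode_event`, the example attributes) are hypothetical, and fixed embeddings stand in for what ProcK learns with a graph neural network.

```python
import numpy as np

# Hypothetical illustration: each event carries attribute values that are
# also nodes in a knowledge graph with precomputed node embeddings.
node_emb = {
    "clerk": np.array([0.2, 0.1]),
    "manager": np.array([0.9, 0.4]),
    "invoice": np.array([0.3, 0.8]),
}

def encode_event(activity_vec, attribute_nodes):
    """Concatenate the event's own features with the mean embedding
    of the knowledge-graph nodes its attributes map to."""
    graph_part = np.mean([node_emb[n] for n in attribute_nodes], axis=0)
    return np.concatenate([activity_vec, graph_part])

# A two-event trace: raw activity features plus linked graph nodes.
trace = [
    (np.array([1.0, 0.0]), ["clerk", "invoice"]),
    (np.array([0.0, 1.0]), ["manager"]),
]
encoded = np.stack([encode_event(v, nodes) for v, nodes in trace])
print(encoded.shape)  # (2, 4): ready for a downstream sequence model
```

The sketch only shows how graph context can be concatenated onto per-event features before a sequence model consumes them; in ProcK the graph side is trained end-to-end rather than precomputed.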

    The Application of Machine Learning for Early Detection of At-Risk Learners in Massive Open Online Courses

    With the rapid improvement of digital technology, Massive Open Online Courses (MOOCs) have emerged as powerful open educational learning platforms. MOOCs have been experiencing increased use and popularity in highly ranked universities in recent years. The opportunity to access high-quality courseware content within such platforms, while eliminating educational, financial and geographical obstacles, has led to a growth in participant numbers. Despite the increasing participation in online courses, the low completion rate has raised major concerns in the literature. Identifying those students who are at risk of dropping out could be a promising solution to the low completion rate in the online setting. Flagging at-risk students could assist course instructors to bolster struggling students and provide more learning resources. Although many prior studies have considered the dropout issue in the form of a sequence classification problem, such works only address a limited set of retention factors. They typically consider the learners' activities as a sequence of weekly intervals, neglecting important learning trajectories. In this PhD thesis, my goal is to investigate retention factors. More specifically, the project seeks to explore the association of motivational trajectories, performance trajectories, engagement levels and latent engagement with the withdrawal rate. To achieve this goal, the first objective is to derive learners' motivations based on Incentive Motivation theory. Learning analytics are utilised to classify student motivation into three main categories: intrinsically motivated, extrinsically motivated and amotivated. Machine learning has been employed to detect the lack of motivation at early stages of the courses. The findings reveal that machine learning provides solutions that are capable of automatically identifying the students' motivational status according to behaviourism theory.
As the second and third objectives, three temporal dropout prediction models are proposed in this research work. The models provide a dynamic assessment of the influence of the following factors on students and the subsequent risk of them leaving the course: motivational trajectories, performance trajectories and latent engagement. The models could assist instructors in delivering more intensive intervention support to at-risk students. Supervised machine learning algorithms have been utilised in each model to identify, in a timely manner, the students who are in danger of dropping out. The results demonstrate that motivational trajectories and engagement levels are significant factors which might influence students' withdrawal in online settings. On the other hand, the findings indicate that performance trajectories and latent engagement might not influence whether students complete online courses.

    Predicting Student Performance on Virtual Learning Environment

    Virtual learning has gained increased importance because of the recent pandemic situation. A mass shift to virtual means of education delivery has been observed over the past couple of years, forcing the community to develop efficient performance assessment tools. Prediction of students' performance using different relevant information has emerged as an efficient tool in educational institutes for improving curricula and teaching methodologies. Automated analysis of educational data using state-of-the-art Machine Learning (ML) and Artificial Intelligence (AI) algorithms is an active area of research. The research presented in this thesis addresses the problem of students' performance prediction comprehensively by applying multiple machine learning models (i.e., Multilayer Perceptron (MLP), Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), CatBoost, K-Nearest Neighbour (KNN) and Support Vector Classifier (SVC)) to two benchmark VLE datasets (i.e., the Open University Learning Analytics Dataset (OULAD) and Coursera). In this context, a series of experiments is performed and important insights are reported. First, the classification performance of the machine learning models is investigated on both the OULAD and Coursera datasets. In the second experiment, the performance of the machine learning models is studied for each course of the Coursera dataset and comparative analyses are performed. From Experiments 1 and 2, class imbalance is identified as the main factor responsible for the degraded performance of the machine learning models. In this context, Experiment 3 is designed to address the class imbalance problem by making use of multiple Synthetic Minority Oversampling Technique (SMOTE) variants and generative models (i.e., Generative Adversarial Networks (GANs)).
From the results, the SMOTE NN approach achieved the best classification performance among the implemented SMOTE techniques. Further, when mixed with generative models, the SMOTENN-GAN generated Coursera dataset was the best, on which machine learning models were able to achieve a classification accuracy of around 90%. Overall, the MLP, XGBoost and CatBoost models emerged as the best performing across the different experiments performed in this thesis.
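SMOTE, mentioned above, generates synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. A minimal numpy sketch of that core idea (not the thesis code; `smote_like_oversample` is a hypothetical name, and libraries such as imbalanced-learn provide production implementations):

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=2, seed=0):
    """Minimal SMOTE-style oversampling sketch: pick a minority point,
    one of its k nearest minority neighbours, and interpolate at a
    random position on the segment connecting them."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from X_min[i] to every minority point
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_like_oversample(X_minority, n_new=4)
print(X_new.shape)  # (4, 2)
```

Because each synthetic sample lies on a segment between two existing minority points, the oversampled class stays inside the region the minority data already occupies, which is what distinguishes SMOTE from simple duplication.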

    Fairness-aware Machine Learning in Educational Data Mining

    Fairness is an essential requirement of every educational system, which is reflected in a variety of educational activities. With the extensive use of Artificial Intelligence (AI) and Machine Learning (ML) techniques in education, researchers and educators can analyze educational (big) data and propose new (technical) methods in order to support teachers, students, or administrators of (online) learning systems in the organization of teaching and learning. Educational data mining (EDM) is the result of the application and development of data mining (DM) and ML techniques to deal with educational problems, such as student performance prediction and student grouping. However, ML-based decisions in education can be based on protected attributes, such as race or gender, leading to discrimination against individual students or subgroups of students. Therefore, ensuring fairness in ML models also contributes to equity in educational systems. On the other hand, bias can also appear in the data obtained from learning environments. Hence, bias-aware exploratory educational data analysis is important to support unbiased decision-making in EDM. In this thesis, we address the aforementioned issues and propose methods that mitigate discriminatory outcomes of ML algorithms in EDM tasks. Specifically, we make the following contributions: We perform bias-aware exploratory analysis of educational datasets using Bayesian networks to identify the relationships among attributes in order to understand bias in the datasets. We focus the exploratory data analysis on features having a direct or indirect relationship with the protected attributes w.r.t. prediction outcomes. We perform a comprehensive evaluation of the sufficiency of various group fairness measures in predictive models for student performance prediction problems.
A variety of experiments on various educational datasets with different fairness measures are performed to provide users with a broad view of unfairness from diverse aspects. We deal with the student grouping problem in collaborative learning. We introduce the fair-capacitated clustering problem, which takes into account cluster fairness and cluster cardinalities. We propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain fair-capacitated clustering. We introduce the multi-fair capacitated (MFC) students-topics grouping problem, which satisfies students' preferences while ensuring balanced group cardinalities and maximizing the diversity of members with regard to the protected attribute. We propose three approaches: a greedy heuristic approach, a knapsack-based approach using the vanilla maximal 0-1 knapsack formulation, and an MFC knapsack approach based on a group fairness knapsack formulation. In short, the findings described in this thesis demonstrate the importance of fairness-aware ML in educational settings. We show that bias-aware data analysis, fairness measures, and fairness-aware ML models are essential aspects of ensuring fairness in EDM and the educational environment.
Ministry of Science and Culture of Lower Saxony/LernMINT/51410078/E
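As a rough illustration of the greedy heuristic idea for capacitated, fairness-aware grouping (not the thesis algorithm; the function name, tie-breaking rule and toy data are assumptions), one can assign each student to the open group that currently holds the fewest members sharing that student's protected-attribute value:

```python
from collections import defaultdict

def greedy_fair_capacitated(students, n_groups, capacity):
    """Greedy sketch: place each student in the group with spare capacity
    that has the fewest members sharing the student's protected-attribute
    value, breaking ties by current group size."""
    groups = [[] for _ in range(n_groups)]
    attr_count = [defaultdict(int) for _ in range(n_groups)]
    for name, attr in students:
        candidates = [g for g in range(n_groups) if len(groups[g]) < capacity]
        # prefer groups with few same-attribute members, then smaller groups
        g = min(candidates, key=lambda g: (attr_count[g][attr], len(groups[g])))
        groups[g].append(name)
        attr_count[g][attr] += 1
    return groups

students = [("a", "F"), ("b", "F"), ("c", "M"), ("d", "M"),
            ("e", "F"), ("f", "M")]
groups = greedy_fair_capacitated(students, n_groups=2, capacity=3)
print(groups)  # [['a', 'c', 'e'], ['b', 'd', 'f']]
```

Both cardinality constraints (at most three per group) and attribute balance (each group mixes "F" and "M") are respected on this toy input; the thesis's knapsack-based formulations additionally optimise students' topic preferences.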

    How Machine Learning (ML) is Transforming Higher Education: A Systematic Literature Review

    In the last decade, artificial intelligence (AI), machine learning (ML) and learning data analytics have been introduced with great effect in the field of higher education. However, despite the potential benefits of these emerging technologies for higher education institutions (HEIs), most of them are still in the early stages of adopting these technologies. Thus, a systematic literature review (SLR) of the literature published over the last 5 years on potential applications of machine learning in higher education is necessary. Following the PRISMA guidelines, out of the 1887 initially identified SCOPUS-indexed publications on the topic, 171 articles were selected for review. Rayyan QCRI was used to screen the abstracts and titles of each citation. VOSviewer, a software tool for constructing and visualizing bibliometric networks, and Microsoft Excel were used to generate charts and figures. The findings show that the most widely researched application of ML in higher education is the prediction of academic performance and employability of students. The implications will be invaluable for researchers and practitioners exploring how ML and AI technologies, in the era of ChatGPT, can be used in universities without jeopardizing academic integrity.

    Detecting At-Risk Students with Early Interventions Using Machine Learning Techniques

    Massive Open Online Courses (MOOCs) have shown rapid development in recent years, allowing learners to access high-quality digital material. Because of facilitated learning and the flexibility of the teaching environment, the number of participants is rapidly growing. However, extensive research reports that the high attrition rate and low completion rate are major concerns. In this paper, an approach for the early identification of students who are at risk of withdrawal and failure is provided. Two models are constructed, namely the at-risk student model and the learning achievement model. The models have the potential to detect the students who are in danger of failing or withdrawing at an early stage of the online course. The results reveal that all classifiers achieve good accuracy across both models; the highest performance is yielded by GBM, with values of 0.894 and 0.952 for the first and second models respectively, while RF, with a value of 0.866, achieved the lowest accuracy in the at-risk student framework. The proposed frameworks can be used to assist instructors in delivering intensive intervention support to at-risk students.
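A minimal sketch of the kind of gradient-boosting at-risk classifier evaluated above, using scikit-learn on synthetic weekly-activity features. The features and the "low activity implies risk" rule are invented for illustration; the paper's actual features and dataset are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: three weekly-activity features per student
# (e.g. logins, quiz attempts, forum posts) and an at-risk label.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] < 0.6).astype(int)  # low activity -> at risk

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print(round(model.score(X_te, y_te), 2))
```

On real MOOC logs the same pipeline would take per-student activity counts aggregated by week; the paper's reported accuracies (0.894 and 0.952 for GBM) come from its own feature sets, not from this toy setup.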

    Artificial Intelligence methodologies to early predict student outcome and enrich learning material

    The abstract is in the attachment.

    A Comprehensive Exploration of Personalized Learning in Smart Education: From Student Modeling to Personalized Recommendations

    With the development of artificial intelligence, personalized learning has attracted much attention as an integral part of intelligent education. China, the United States, the European Union, and others have stressed the importance of personalized learning in recent years, emphasizing the organic combination of large-scale education and personalized training. The development of personalized learning systems oriented to learners' preferences and suited to learners' needs should be accelerated. This review provides a comprehensive analysis of the current situation of personalized learning and its key role in education. It discusses the research on personalized learning from multiple perspectives, combining definitions, goals, and related educational theories to provide an in-depth understanding of personalized learning from an educational perspective, analyzing the implications of different theories for personalized learning, and highlighting the potential of personalized learning to meet the needs of individuals and to enhance their abilities. Data applications and assessment indicators in personalized learning are described in detail, providing a solid data foundation and evaluation system for subsequent research. Meanwhile, we start from both student modeling and recommendation algorithms and deeply analyze the cognitive and non-cognitive perspectives and the contribution of personalized recommendations to personalized learning. Finally, we explore the challenges and future trajectories of personalized learning. This review provides a multidimensional analysis of personalized learning through a more comprehensive study, providing academics and practitioners with cutting-edge explorations to promote continuous progress in the field of personalized learning.
Comment: 82 pages, 5 figures

    Predicting Paid Certification in Massive Open Online Courses

    Massive open online courses (MOOCs) have been proliferating because of the free or low-cost offering of content for learners, attracting the attention of many stakeholders across the entire educational landscape. Since 2012, coined as "the Year of the MOOCs", several platforms have gathered millions of learners in just a decade. Nevertheless, the certification rate of both free and paid courses has been low: only about 4.5–13% and 1–3%, respectively, of the total number of enrolled learners obtain a certificate at the end of their courses. Still, most research concentrates on completion, ignoring the certification problem and especially its financial aspects. Thus, the research described in the present thesis aimed to investigate paid certification in MOOCs, for the first time, in a comprehensive way, and as early as the first week of the course, by exploring its various levels. First, the latent correlation between learner activities and their paid certification decisions was examined by (1) statistically comparing the activities of non-paying learners with those of course purchasers and (2) predicting paid certification using different machine learning (ML) techniques. Our temporal (weekly) analysis showed statistical significance at various levels when comparing the activities of non-paying learners with those of the certificate purchasers across the five courses analysed. Furthermore, we used the learners' activities (number of step accesses, attempts, correct and wrong answers, and time spent on learning steps) to build our paid certification predictor, which achieved promising balanced accuracies (BAs), ranging from 0.77 to 0.95. Having employed simple predictions based on a few clickstream variables, we then analysed in more depth what other information can be extracted from MOOC interaction (namely discussion forums) for paid certification prediction.
To better explore the learners' discussion forums, we built, as an original contribution, MOOCSent, a cross-platform review-based sentiment classifier, using over 1.2 million MOOC sentiment-labelled reviews. MOOCSent addresses various limitations of current sentiment classifiers, including (1) the use of a single source of data (previous literature on sentiment classification in MOOCs was based on single platforms only, and hence less generalisable, with a relatively low number of instances compared to our obtained dataset); (2) limited model outputs, where most current models are based on a 2-polar classifier (positive or negative only); (3) disregarding important sentiment indicators, such as emojis and emoticons, during text embedding; and (4) reporting average performance metrics only, preventing the evaluation of model performance at the level of class (sentiment). Finally, with the help of MOOCSent, we used the learners' discussion forums to predict paid certification after annotating learners' comments and replies with sentiment using MOOCSent. This multi-input model contains raw data (learner textual inputs), sentiment classification generated by MOOCSent, computed features (number of likes received for each textual input), and several features extracted from the texts (character counts, word counts, and part-of-speech (POS) tags for each textual instance). This experiment adopted various deep predictive approaches, specifically those that allow a multi-input architecture, to investigate early (i.e., weekly) whether data obtained from MOOC learners' interaction in discussion forums can predict learners' purchase decisions (certification).
Considering the staggeringly low rate of paid certification in MOOCs, the present thesis contributes to the knowledge and field of MOOC learner analytics by predicting paid certification, for the first time, at such a comprehensive (with data from over 200 thousand learners from 5 different discipline courses), actionable (analysing learners' decisions from the first week of the course) and longitudinal (with 23 runs from 2013 to 2017) scale. The present thesis contributes by (1) investigating various conventional and deep ML approaches for predicting paid certification in MOOCs using learner clickstreams (Chapter 5) and course discussion forums (Chapter 7); (2) building the largest MOOC sentiment classifier (MOOCSent), based on learners' reviews of courses from the leading MOOC platforms, namely Coursera, FutureLearn and Udemy, which handles emojis and emoticons using dedicated lexicons that contain over three thousand corresponding explanatory words/phrases; and (3) proposing and developing, for the first time, a multi-input model for predicting certification based on the data from discussion forums, which synchronously processes the textual (comments and replies) and numerical (number of likes posted and received, sentiments) data from the forums, adapting the suitable classifier to each type of data, as explained in detail in Chapter 7.
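The per-post feature assembly for such a multi-input model can be illustrated with a small sketch (hypothetical function and field names; in the thesis, MOOCSent would supply the sentiment label and a deep multi-input network would consume the features alongside the raw text):

```python
def forum_features(text, likes, sentiment):
    """Hypothetical sketch of the multi-input feature idea: combine simple
    text statistics with the number of likes and a sentiment label
    (here a string in {"pos", "neg", "neutral"}) into one feature dict."""
    words = text.split()
    return {
        "char_count": len(text),
        "word_count": len(words),
        "likes": likes,
        "sentiment_pos": int(sentiment == "pos"),
        "sentiment_neg": int(sentiment == "neg"),
    }

feats = forum_features("Great course, very clear videos!", likes=3, sentiment="pos")
print(feats["word_count"], feats["sentiment_pos"])  # 5 1
```

Each forum comment or reply thus yields one numerical feature vector to be processed alongside its raw text, matching the "textual plus numerical" split described above.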