3,931 research outputs found

    Predicting Paid Certification in Massive Open Online Courses

    Massive open online courses (MOOCs) have been proliferating because of the free or low-cost offering of content for learners, attracting the attention of many stakeholders across the entire educational landscape. Since 2012, coined as “the Year of the MOOCs”, several platforms have gathered millions of learners in just a decade. Nevertheless, the certification rate of both free and paid courses has been low, and only about 4.5–13% and 1–3%, respectively, of the total number of enrolled learners obtain a certificate at the end of their courses. Still, most research concentrates on completion, ignoring the certification problem, and especially its financial aspects. Thus, the research described in the present thesis aimed to investigate paid certification in MOOCs, for the first time, in a comprehensive way, and as early as the first week of the course, by exploring its various levels. First, the latent correlation between learner activities and their paid certification decisions was examined by (1) statistically comparing the activities of non-paying learners with course purchasers and (2) predicting paid certification using different machine learning (ML) techniques. Our temporal (weekly) analysis showed statistical significance at various levels when comparing the activities of non-paying learners with those of the certificate purchasers across the five courses analysed. Furthermore, we used the learner’s activities (number of step accesses, attempts, correct and wrong answers, and time spent on learning steps) to build our paid certification predictor, which achieved promising balanced accuracies (BAs), ranging from 0.77 to 0.95. Having employed simple predictions based on a few clickstream variables, we then analysed more in-depth what other information can be extracted from MOOC interaction (namely discussion forums) for paid certification prediction. 
To explore the learners’ discussion forums more thoroughly, we built, as an original contribution, MOOCSent, a cross-platform review-based sentiment classifier, using over 1.2 million sentiment-labelled MOOC reviews. MOOCSent addresses several limitations of current sentiment classifiers: (1) reliance on a single source of data (previous literature on sentiment classification in MOOCs was based on single platforms only, and is hence less generalisable, with a relatively low number of instances compared to our dataset); (2) coarse model outputs, as most current models are two-polar classifiers (positive or negative only); (3) disregard of important sentiment indicators, such as emojis and emoticons, during text embedding; and (4) reporting of average performance metrics only, which prevents evaluation of model performance at the level of each class (sentiment). Finally, with the help of MOOCSent, we annotated learners’ comments and replies with sentiment and used the discussion forums to predict paid certification. This multi-input model takes raw data (learner textual inputs), the sentiment classification generated by MOOCSent, computed features (number of likes received for each textual input), and several features extracted from the texts (character counts, word counts, and part-of-speech (POS) tags for each textual instance). This experiment adopted various deep predictive approaches, specifically ones that allow a multi-input architecture, to investigate early (i.e., weekly) whether data obtained from MOOC learners’ interactions in discussion forums can predict learners’ purchase decisions (certification).
Considering the staggeringly low rate of paid certification in MOOCs, the present thesis contributes to the field of MOOC learner analytics by predicting paid certification, for the first time, at a scale that is comprehensive (with data from over 200 thousand learners across 5 courses from different disciplines), actionable (analysing learners’ decisions from the first week of the course) and longitudinal (with 23 runs from 2013 to 2017). Its contributions are: (1) investigating various conventional and deep ML approaches for predicting paid certification in MOOCs using learner clickstreams (Chapter 5) and course discussion forums (Chapter 7); (2) building the largest MOOC sentiment classifier (MOOCSent), which is based on learners’ reviews of courses from the leading MOOC platforms, namely Coursera, FutureLearn and Udemy, and handles emojis and emoticons using dedicated lexicons that contain over three thousand corresponding explanatory words/phrases; and (3) proposing and developing, for the first time, a multi-input model for predicting certification from discussion forum data, which synchronously processes the textual (comments and replies) and numerical (number of likes posted and received, sentiments) data from the forums, adapting a suitable classifier for each type of data, as explained in detail in Chapter 7.
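
The abstract reports balanced accuracy (BA) rather than plain accuracy. A minimal sketch of why that choice matters when only a few percent of learners pay for a certificate is given below. The cohort size and the 2% purchase rate are hypothetical illustration values, not data from the thesis, and the function is a generic definition of BA (the mean of per-class recall), not the author's code.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical cohort: 1000 learners, 20 of whom (2%) buy a certificate.
y_true = [1] * 20 + [0] * 980

# A trivial model that never predicts a purchase.
always_no = [0] * 1000

plain_accuracy = sum(t == p for t, p in zip(y_true, always_no)) / len(y_true)
print(plain_accuracy)                        # 0.98
print(balanced_accuracy(y_true, always_no))  # 0.5
```

The trivial model looks strong on plain accuracy only because non-purchasers dominate; its BA of 0.5 shows it has learned nothing about purchasers, which is the class of interest here.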

    Predicting Academic Performance: A Systematic Literature Review

    The ability to predict student performance in a course or program creates opportunities to improve educational outcomes. With effective performance prediction approaches, instructors can allocate resources and instruction more accurately. Research in this area seeks to identify features that can be used to make predictions, to identify algorithms that can improve predictions, and to quantify aspects of student performance. Moreover, research in predicting student performance seeks to determine interrelated features and to identify the underlying reasons why certain features work better than others. This working group report presents a systematic literature review of work in the area of predicting student performance. Our analysis shows a clearly increasing amount of research in this area, as well as an increasing variety of techniques used. At the same time, the review uncovered a number of issues with research quality that drive a need for the community to provide more detailed reporting of methods and results and to increase efforts to validate and replicate work.

    Impact of artificial intelligence on assessment methods in primary and secondary education: a systematic literature review

    The educational sector can be enriched by the incorporation of artificial intelligence (AI) in various aspects. The field of artificial intelligence and its applications in the education sector give rise to a multidisciplinary field that brings together computer science, statistics, psychology and, of course, education. Within this context, this review aimed to synthesise existing research focused on improving the assessment of primary and secondary students using AI tools. Nine original research studies (641 participants), published between 2010 and 2023, met the inclusion criteria defined in this systematic literature review. The main contributions of AI to the assessment of students at these lower educational levels are predicting their performance, automating evaluations and making them more objective by means of neural networks or natural language processing, using educational robots to analyse their learning process, and detecting specific factors that make classes more attractive. This review shows the possibilities and already existing uses that AI can bring to education, specifically in the evaluation of student performance at the primary and secondary levels.
Universidade de Vigo/CISUG; Ministerio de Universidades | Ref. FPU19/0118

    Systematic mapping review on student’s performance analysis using big data predictive model

    This paper classifies the existing predictive models used for monitoring and improving students’ performance at schools and higher learning institutions. It analyses all the areas within the educational data mining methodology. Two databases were chosen for this study and a systematic mapping study was performed. Because this research area is still in its infancy, only 114 articles published from 2012 to 2016 were identified, of which 59 were reviewed and classified. There is increasing interest and research in educational data mining, particularly in improving students’ performance with various predictive and prescriptive models. Most of the models are ultimately devised for pedagogical improvement. There is a great scarcity of portable predictive models that fit into any educational environment. More research is needed on educational big data.
    Keywords: predictive analysis; student’s performance; big data; big data analytics; data mining; systematic mapping study

    A Literature Review on Intelligent Services Applied to Distance Learning

    Distance learning has assumed a relevant role in the educational scenario. The use of Virtual Learning Environments contributes to obtaining a substantial amount of educational data. In this sense, the analyzed data generate knowledge used by institutions to assist managers and professors in strategic planning and teaching. The discovery of students’ behaviors enables a wide variety of intelligent services for assisting in the learning process. This article presents a literature review in order to identify the intelligent services applied in distance learning. The research covers the period from January 2010 to May 2021. The initial search found 1316 articles, among which 51 were selected for further studies. Considering the selected articles, 33% (17/51) focus on learning systems, 35% (18/51) propose recommendation systems, 26% (13/51) approach predictive systems or models, and 6% (3/51) use assessment tools. This review allowed for the observation that the principal services offered are recommendation systems and learning systems. In these services, the analysis of student profiles stands out to identify patterns of behavior, detect low performance, and identify probabilities of dropouts from courses.

    Predicting Student Performance on Virtual Learning Environment

    Virtual learning has gained increased importance because of the recent pandemic. A mass shift to virtual means of education delivery has been observed over the past couple of years, forcing the community to develop efficient performance assessment tools. Prediction of students’ performance from relevant information has emerged as an efficient tool for educational institutes to improve the curriculum and teaching methodologies. Automated analysis of educational data using state-of-the-art Machine Learning (ML) and Artificial Intelligence (AI) algorithms is an active area of research. The research presented in this thesis addresses the problem of students’ performance prediction comprehensively by applying multiple machine learning models (i.e., Multilayer Perceptron (MLP), Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), CATBoost, K-Nearest Neighbour (KNN) and Support Vector Classifier (SVC)) to two benchmark VLE datasets (i.e., the Open University Learning Analytics Dataset (OULAD) and Coursera). In this context, a series of experiments is performed and important insights are reported. First, the classification performance of the machine learning models is investigated on both the OULAD and Coursera datasets. In the second experiment, the performance of the machine learning models is studied for each course of the Coursera dataset and comparative analyses are performed. Experiments 1 and 2 identify class imbalance as the main factor responsible for degraded model performance. Experiment 3 is therefore designed to address the class imbalance problem using multiple Synthetic Minority Oversampling Technique (SMOTE) variants and generative models (i.e., Generative Adversarial Networks (GANs)). Among the implemented SMOTE techniques, the SMOTE NN approach achieved the best classification performance.
Further, when SMOTE was combined with generative models, the SMOTENN-GAN generated Coursera dataset was the one on which the machine learning models performed best, reaching a classification accuracy of around 90%. Overall, MLP, XGBoost and CATBoost emerged as the best-performing models across the experiments performed in this thesis.
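
To illustrate the oversampling idea behind Experiment 3, here is a simplified sketch of basic SMOTE: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is the plain technique only, not the SMOTE NN or GAN variants used in the thesis, and the function name, the parameter `k`, and the two-feature data are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest neighbours of base among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        u = rng.random()  # position along the segment from base to target
        synthetic.append(tuple(b + u * (t - b) for b, t in zip(base, target)))
    return synthetic


# Hypothetical minority-class samples with two numeric features each.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points))  # 6
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies. In practice, oversampling of this kind is applied to the training split only, so that evaluation data contains no synthetic samples.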

    Using data mining to repurpose German language corpora. An evaluation of data-driven analysis methods for corpus linguistics

    A growing number of studies report interesting insights gained from existing data resources. Among those, there are analyses on textual data, giving reason to consider such methods for linguistics as well. However, the field of corpus linguistics usually works with purposefully collected, representative language samples that aim to answer only a limited set of research questions. This thesis aims to shed some light on the potentials of data-driven analysis based on machine learning and predictive modelling for corpus linguistic studies, investigating the possibility to repurpose existing German language corpora for linguistic inquiry by using methodologies developed for data science and computational linguistics. The study focuses on predictive modelling and machine-learning-based data mining and gives a detailed overview and evaluation of currently popular strategies and methods for analysing corpora with computational methods. After the thesis introduces strategies and methods that have already been used on language data, discusses how they can assist corpus linguistic analysis and refers to available toolkits and software as well as to state-of-the-art research and further references, the introduced methodological toolset is applied in two differently shaped corpus studies that utilize readily available corpora for German. The first study explores linguistic correlates of holistic text quality ratings on student essays, while the second deals with age-related language features in computer-mediated communication and interprets age prediction models to answer a set of research questions that are based on previous research in the field. 
While both studies give linguistic insights that integrate into the current understanding of the investigated phenomena in the German language, they also systematically test the methodological toolset introduced beforehand, allowing a detailed discussion of the added value and remaining challenges of machine-learning-based data mining methods in corpus linguistics at the end of the thesis.