
    A Multidimensional Deep Learner Model of Urgent Instructor Intervention Need in MOOC Forum Posts

    In recent years, massive open online courses (MOOCs) have become one of the most exciting innovations in e-learning environments. Thousands of learners around the world enroll on these online platforms to satisfy their learning needs (mostly) free of charge. However, despite the advantages MOOCs offer learners, dropout rates are high. Struggling learners often describe their feelings of confusion and need for help via forum posts. However, the often huge number of posts on forums makes it unlikely that instructors can respond to all learners, and many of these urgent posts are overlooked or discarded. To overcome this, mining learners' posts may provide a helpful way of classifying posts where learners require urgent intervention from instructors, to help learners and reduce the current high dropout rates. In this paper, we propose a method based on correlations between different dimensions of learners' posts to determine the need for urgent intervention. Our initial statistical analysis found some interesting significant correlations between posts expressing sentiment, confusion, opinion, questions, and answers and the need for urgent intervention. Thus, we have developed a multidimensional deep learner model combining these features with natural language processing (NLP). To illustrate our method, we used a benchmark dataset of 29,598 posts from three different academic subject areas. The findings highlight that the combined, multidimensional feature model is more effective than text-only (NLP) analysis, showing that future models need to be optimised based on all these dimensions when classifying urgent posts.
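    The abstract above describes fusing textual (NLP) features with additional per-post dimensions (sentiment, confusion, opinion, question, answer). As a rough illustration only, and not the authors' architecture, the sketch below shows one way such a multi-input model could be wired up in Keras; the vocabulary size, layer widths and the five dimension inputs are assumed values.

        # Illustrative multi-input sketch (assumed sizes/names): a text branch is
        # combined with auxiliary per-post dimension features before classification.
        from tensorflow.keras import layers, Model

        VOCAB_SIZE, MAX_LEN, N_DIMS = 20000, 200, 5   # assumed values

        text_in = layers.Input(shape=(MAX_LEN,), name="token_ids")
        dims_in = layers.Input(shape=(N_DIMS,), name="post_dimensions")  # e.g. sentiment, confusion, ...

        x = layers.Embedding(VOCAB_SIZE, 64)(text_in)
        x = layers.Bidirectional(layers.LSTM(32))(x)          # text branch

        h = layers.Concatenate()([x, dims_in])                # fuse text with dimensions
        h = layers.Dense(32, activation="relu")(h)
        urgent = layers.Dense(1, activation="sigmoid", name="urgent")(h)

        model = Model([text_in, dims_in], urgent)
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
        model.summary()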

    Solving the imbalanced data issue: automatic urgency detection for instructor assistance in MOOC discussion forums

    In MOOCs, identifying urgent comments on discussion forums is an ongoing challenge. Whilst urgent comments require immediate reactions from instructors, to improve interaction with their learners and potentially reduce drop-out rates, the task is difficult, as truly urgent comments are rare. From a data analytics perspective, this represents a highly unbalanced (sparse) dataset. Here, we aim to automate the identification of urgent comments, based on fine-grained learner modelling, to be used for automatic recommendations to instructors. To showcase and compare these models, we apply them to the first gold standard dataset for Urgent iNstructor InTErvention (UNITE), which we created by labelling FutureLearn MOOC data. We implement both benchmark shallow classifiers and deep learning. Importantly, we not only compare, for the first time for this unbalanced problem, several data balancing techniques, comprising text augmentation, text augmentation with undersampling, and undersampling alone, but also propose several new pipelines for combining different augmenters for text augmentation. Results show that models with undersampling can predict most urgent cases, and that 3X augmentation + undersampling usually attains the best performance. We additionally validate the best models on a generic benchmark dataset (Stanford). As a case study, we showcase how naïve Bayes with count vectors can adaptively support instructors in answering learner questions/comments, potentially saving time or increasing efficiency in supporting learners. Finally, we show that the errors from the classifier mirror the disagreements between annotators. Thus, our proposed algorithms perform at least as well as a 'super-diligent' human instructor (one with the time to consider all comments).
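    As a minimal sketch of the kind of baseline named above (count-vector naïve Bayes combined with undersampling of the majority, non-urgent class), the pipeline below uses toy data and invented labels; it is not the authors' exact configuration.

        # Toy sketch: count-vector naive Bayes with random undersampling of the
        # majority (non-urgent) class. Data and labels are invented for illustration.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.model_selection import train_test_split
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.metrics import classification_report
        from imblearn.under_sampling import RandomUnderSampler

        texts = ["I am completely lost on step 2, please help"] * 10 + \
                ["Great course, thank you!"] * 90
        labels = [1] * 10 + [0] * 90          # 1 = urgent, 0 = non-urgent (toy imbalance)

        X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, stratify=labels, random_state=0)

        vec = CountVectorizer()
        X_tr_cv = vec.fit_transform(X_tr)

        # Undersample the majority class so both classes are equally represented.
        X_bal, y_bal = RandomUnderSampler(random_state=0).fit_resample(X_tr_cv, y_tr)

        clf = MultinomialNB().fit(X_bal, y_bal)
        print(classification_report(y_te, clf.predict(vec.transform(X_te)), zero_division=0))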

    Intervention Prediction in MOOCs Based on Learners’ Comments: A Temporal Multi-input Approach Using Deep Learning and Transformer Models

    High learner dropout rates in MOOC-based education contexts have encouraged researchers to explore and propose different intervention models. In discussion forums, intervention is critical, not only to identify comments that require replies but also to consider learners who may require intervention in the form of staff support. There is a lack of research on the role of intervention based on learner comments in preventing learner dropout in MOOC-based settings. To fill this research gap, we propose an intervention model that detects when staff intervention is required to prevent learner dropout, using a dataset from FutureLearn. Our proposed model is based on learners' comment histories, integrating the most recent sequence of comments written by each learner to identify whether an intervention was necessary to prevent dropout. We aimed to find both a suitable classifier and the number of comments that makes up the appropriate most-recent sequence. We developed several intervention models utilising two forms of supervised multi-input machine learning (ML) classification models (deep learning and transformer). For the transformer model specifically, we propose siamese and dual temporal multi-input architectures, which we term multi-siamese BERT and multiple BERT. We further experimented with clustering learners based on their respective number of comments, to analyse whether grouping as a pre-processing step improved the results. The results show that, whilst multi-input deep learning can be useful, a better overall effect is achieved with the transformer model, which performs better at detecting learners who require intervention. Contrary to our expectations, however, clustering before prediction can have negative consequences on prediction outcomes, especially for the underrepresented class.
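    To illustrate the temporal multi-input idea in general terms (this is not the multi-siamese BERT or multiple BERT models themselves), the sketch below embeds each learner's last N comments with a small pretrained sentence encoder and concatenates the embeddings before a simple classifier; the window size, encoder choice and toy data are all assumptions.

        # Hypothetical sketch: embed the last N comments per learner separately,
        # concatenate the embeddings, and classify whether staff intervention is needed.
        import numpy as np
        from sentence_transformers import SentenceTransformer
        from sklearn.linear_model import LogisticRegression

        N_RECENT = 3                                  # assumed comment window
        encoder = SentenceTransformer("all-MiniLM-L6-v2")

        def learner_features(recent_comments):
            """Pad/trim to N_RECENT comments, embed each one, and concatenate."""
            padded = (recent_comments + [""] * N_RECENT)[:N_RECENT]
            return np.concatenate(encoder.encode(padded))

        # Toy comment histories; 1 = intervention needed, 0 = not needed.
        histories = [["I give up", "nothing works", "still no reply from staff"],
                     ["loved week 1", "interesting reading", "see you next week"]] * 10
        labels = [1, 0] * 10

        X = np.stack([learner_features(h) for h in histories])
        clf = LogisticRegression(max_iter=1000).fit(X, labels)
        print(clf.predict(X[:2]))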

    Urgency Analysis of Learners’ Comments: An Automated Intervention Priority Model for MOOC

    Recently, the growing number of learners in Massive Open Online Course (MOOC) environments has generated a vast amount of online comments via social interactions, general discussions, expressions of feelings, or requests for help. Concomitantly, learner dropout, at any time during MOOC courses, is very high, whilst the number of learners completing (completers) is low. Urgent intervention and attention may alleviate this problem. Analysing and mining learner comments is a fundamental step towards understanding their need for intervention from instructors. Here, we explore a dataset from a FutureLearn MOOC course. We find that (1) learners who write many comments that need urgent intervention tend to write many comments in general; (2) the motivation to access more steps (i.e., learning resources) is higher in learners without many comments needing intervention than in learners needing intervention; and (3) learners who have many comments that need intervention are less likely to complete the course (13%). Therefore, we propose a new priority model for the urgency of intervention, built on learner histories: past urgency, sentiment analysis and step access.
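    The three history signals named above (past urgency, sentiment, step access) could in principle be combined into a single attention-priority score. The scoring function, feature names and weights below are purely illustrative assumptions, not the paper's model.

        # Illustrative priority score only; the weights and formula are assumptions.
        from dataclasses import dataclass

        @dataclass
        class LearnerHistory:
            past_urgent_ratio: float   # fraction of earlier comments flagged urgent, 0..1
            mean_sentiment: float      # average sentiment of comments, -1..+1
            step_access_ratio: float   # fraction of course steps accessed, 0..1

        def priority(h: LearnerHistory, w_urg=0.5, w_sent=0.3, w_steps=0.2) -> float:
            """Higher score = intervene sooner."""
            negativity = (1.0 - h.mean_sentiment) / 2.0     # map [-1, +1] to [1, 0]
            disengagement = 1.0 - h.step_access_ratio
            return w_urg * h.past_urgent_ratio + w_sent * negativity + w_steps * disengagement

        print(priority(LearnerHistory(past_urgent_ratio=0.6,
                                      mean_sentiment=-0.4,
                                      step_access_ratio=0.2)))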

    Predicting Paid Certification in Massive Open Online Courses

    Massive open online courses (MOOCs) have been proliferating because of the free or low-cost offering of content for learners, attracting the attention of many stakeholders across the entire educational landscape. Since 2012, coined "the Year of the MOOCs", several platforms have gathered millions of learners in just a decade. Nevertheless, the certification rate of both free and paid courses has been low: only about 4.5–13% and 1–3%, respectively, of the total number of enrolled learners obtain a certificate at the end of their courses. Still, most research concentrates on completion, ignoring the certification problem and especially its financial aspects. Thus, the research described in the present thesis aimed to investigate paid certification in MOOCs, for the first time, in a comprehensive way, and as early as the first week of the course, by exploring its various levels. First, the latent correlation between learner activities and their paid certification decisions was examined by (1) statistically comparing the activities of non-paying learners with those of course purchasers and (2) predicting paid certification using different machine learning (ML) techniques. Our temporal (weekly) analysis showed statistical significance at various levels when comparing the activities of non-paying learners with those of certificate purchasers across the five courses analysed. Furthermore, we used the learners' activities (number of step accesses, attempts, correct and wrong answers, and time spent on learning steps) to build our paid certification predictor, which achieved promising balanced accuracies (BAs), ranging from 0.77 to 0.95. Having employed simple predictions based on a few clickstream variables, we then analysed in more depth what other information can be extracted from MOOC interaction (namely discussion forums) for paid certification prediction. However, to better explore the learners' discussion forums, we built, as an original contribution, MOOCSent, a cross-platform review-based sentiment classifier, using over 1.2 million MOOC sentiment-labelled reviews. MOOCSent addresses various limitations of current sentiment classifiers, including (1) using a single source of data (previous literature on sentiment classification in MOOCs was based on single platforms only, and hence less generalisable, with a relatively low number of instances compared to our dataset); (2) limited model outputs, where most current models are based on a 2-polar classifier (positive or negative only); (3) disregarding important sentiment indicators, such as emojis and emoticons, during text embedding; and (4) reporting average performance metrics only, preventing the evaluation of model performance at the level of each class (sentiment). Finally, and with the help of MOOCSent, we used the learners' discussion forums to predict paid certification, after annotating learners' comments and replies with sentiment using MOOCSent. This multi-input model contains raw data (learner textual inputs), sentiment classification generated by MOOCSent, computed features (number of likes received for each textual input), and several features extracted from the texts (character counts, word counts, and part-of-speech (POS) tags for each textual instance).
    This experiment adopted various deep predictive approaches, specifically those that allow a multi-input architecture, to investigate early (i.e., weekly) whether data obtained from MOOC learners' interactions in discussion forums can predict learners' purchase decisions (certification). Considering the staggeringly low rate of paid certification in MOOCs, the present thesis contributes to the knowledge and field of MOOC learner analytics by predicting paid certification, for the first time, at such a comprehensive (with data from over 200 thousand learners from 5 courses in different disciplines), actionable (analysing learners' decisions from the first week of the course) and longitudinal (with 23 runs from 2013 to 2017) scale. The present thesis contributes by (1) investigating various conventional and deep ML approaches for predicting paid certification in MOOCs using learner clickstreams (Chapter 5) and course discussion forums (Chapter 7); (2) building the largest MOOC sentiment classifier (MOOCSent), based on learners' reviews of courses from the leading MOOC platforms, namely Coursera, FutureLearn and Udemy, which handles emojis and emoticons using dedicated lexicons containing over three thousand corresponding explanatory words/phrases; and (3) proposing and developing, for the first time, a multi-input model for predicting certification based on data from discussion forums, which synchronously processes the textual (comments and replies) and numerical (number of likes posted and received, sentiments) data from the forums, adapting a suitable classifier to each type of data, as explained in detail in Chapter 7.
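    To make the clickstream-based prediction step concrete, the sketch below trains a classifier on synthetic weekly activity counts (step accesses, attempts, correct/wrong answers, time spent) and reports balanced accuracy, the metric quoted above; the data, model choice and feature values are assumptions rather than the thesis pipeline.

        # Toy sketch: week-1 clickstream features -> paid-certification label,
        # evaluated with balanced accuracy. All values are synthetic.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import balanced_accuracy_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        n = 500
        X = np.column_stack([
            rng.poisson(20, n),       # step accesses
            rng.poisson(5, n),        # quiz attempts
            rng.poisson(3, n),        # correct answers
            rng.poisson(2, n),        # wrong answers
            rng.gamma(2.0, 30.0, n),  # minutes spent on learning steps
        ])
        y = (X[:, 0] + 2 * X[:, 2] + rng.normal(0, 5, n) > 30).astype(int)  # toy label

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
        clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
        print("balanced accuracy:", balanced_accuracy_score(y_te, clf.predict(X_te)))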

    Text Mining in MOOCs: Analysis of the Thematic Relevance of Posts in Discussion Forums

    MOOCs are gradually evolving, owing to the widespread dissemination of virtual learning environments, which provide participants with means of interaction; one of these is the discussion forum, which holds a great deal of information about student engagement. However, reading all the posts is a difficult task, as MOOCs usually have a very large number of enrolled students. In this sense, text mining can help teachers obtain relevant knowledge about students' posts. Thus, in this study, graph-based text mining was performed on students' posts in the discussion forums of two MOOCs on the Lúmina platform, one in which the offering teacher interacts with the students and another in which the teacher does not, in order to verify the relevance of such posts and observe the students' commitment to the course. The Sobek tool was used to generate the graphs, and the thematic relevance coefficient (CRT) was calculated for each discussion forum. The results indicate that mediation in the discussion forums, through interactions by tutors and/or teachers, has a significant impact on the relevance of students' posts.

    Text Mining in MOOCs: Analysis of Thematic Relevance in Discussion Forums

    MOOCs are gradually evolving, owing to the widespread dissemination of virtual learning environments, which provide participants with means of interaction; one of these is the discussion forum, which holds a great deal of information about student engagement. However, reading all posts is a difficult task, as MOOCs often have a very large number of enrolled students. In this sense, text mining can help teachers gain relevant knowledge about student posts. Thus, in this study, graph-based text mining was performed on the students' posts in the discussion forums of two Lúmina platform MOOCs, one in which the offering teacher interacts with the students and another in which the teacher does not, in order to verify the relevance of such posts and observe the students' commitment to the course. For this, the Sobek graph generation tool was used and the thematic relevance coefficient (TRC) was calculated for each discussion forum. The results indicate that mediation in discussion forums, through interactions by tutors and/or teachers, has a significant impact on the relevance of students' posts.
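    As a loose illustration of graph-based forum mining in general (not the Sobek tool or the TRC formula, which the abstract does not give), the sketch below builds a word co-occurrence graph from posts and uses the overlap between its most central terms and a set of course topic terms as a stand-in relevance measure; all names and data are invented.

        # Hypothetical graph-based relevance sketch; the real TRC/Sobek computation
        # is not reproduced here.
        from itertools import combinations
        import networkx as nx

        posts = ["gene expression regulates protein synthesis",
                 "protein folding depends on amino acid sequence",
                 "does anyone know when the certificate arrives"]
        course_terms = {"gene", "protein", "expression", "sequence", "synthesis"}

        G = nx.Graph()
        for post in posts:
            words = [w for w in post.lower().split() if len(w) > 3]
            for a, b in combinations(sorted(set(words)), 2):
                weight = G.get_edge_data(a, b, default={"weight": 0})["weight"]
                G.add_edge(a, b, weight=weight + 1)    # accumulate co-occurrence counts

        scores = nx.pagerank(G, weight="weight")       # term centrality in the graph
        central = sorted(scores, key=scores.get, reverse=True)[:10]
        relevance = len(set(central) & course_terms) / len(central)
        print(central, relevance)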

    Keywords at Work: Investigating Keyword Extraction in Social Media Applications

    This dissertation examines a long-standing problem in Natural Language Processing (NLP) -- keyword extraction -- from a new angle. We investigate how keyword extraction can be formulated on social media data, such as emails, product reviews, student discussions, and student statements of purpose. We design novel graph-based features for supervised and unsupervised keyword extraction from emails, and use the resulting system successfully to uncover patterns in a new dataset -- student statements of purpose. Furthermore, the system is used with new features on the problem of usage expression extraction from product reviews, where we obtain interesting insights. When used on student discussions, the system uncovers new and exciting patterns. While each of the above problems is conceptually distinct, they share two key common elements -- keywords and social data. Social data can be messy, hard to interpret, and not easily amenable to existing NLP resources. We show that our system is robust enough in the face of such challenges to discover useful and important patterns. We also show that the problem definition of keyword extraction itself can be expanded to accommodate new and challenging research questions and datasets.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145929/1/lahiri_1.pd

    Immersive Telepresence: A framework for training and rehearsal in a postdigital age
