120 research outputs found

    Predicting Paid Certification in Massive Open Online Courses

    Massive open online courses (MOOCs) have proliferated because of their free or low-cost content, attracting the attention of many stakeholders across the entire educational landscape. Since 2012, dubbed “the Year of the MOOC”, several platforms have gathered millions of learners in just a decade. Nevertheless, certification rates for both free and paid courses have been low: only about 4.5–13% and 1–3%, respectively, of enrolled learners obtain a certificate at the end of their courses. Still, most research concentrates on completion, ignoring the certification problem and especially its financial aspects. The research described in the present thesis therefore aimed to investigate paid certification in MOOCs, for the first time, in a comprehensive way, and as early as the first week of the course, by exploring its various levels. First, the latent correlation between learner activities and paid certification decisions was examined by (1) statistically comparing the activities of non-paying learners with those of course purchasers and (2) predicting paid certification using different machine learning (ML) techniques. Our temporal (weekly) analysis showed statistical significance at various levels when comparing the activities of non-paying learners with those of certificate purchasers across the five courses analysed. Furthermore, we used learners' activities (number of step accesses, attempts, correct and wrong answers, and time spent on learning steps) to build our paid certification predictor, which achieved promising balanced accuracies (BAs) ranging from 0.77 to 0.95. Having employed simple predictions based on a few clickstream variables, we then analysed in more depth what other information can be extracted from MOOC interaction (namely discussion forums) for paid certification prediction.
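The balanced accuracy metric reported above averages recall over both classes, which matters given how few learners purchase certificates. A minimal sketch, with hypothetical labels (1 = purchased a certificate, 0 = did not), shows how it is computed:

```python
# Balanced accuracy (BA) averages per-class recall, so the rare
# "purchaser" class counts as much as the large non-paying class.
def balanced_accuracy(y_true, y_pred):
    classes = set(y_true)
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical labels: 1 = purchased a certificate, 0 = did not.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
print(balanced_accuracy(y_true, y_pred))  # (6/8 + 1/2) / 2 = 0.625
```

Because each class contributes its own recall, a classifier that simply predicts the majority class scores only 0.5 here, whereas plain accuracy would look deceptively high on such imbalanced data.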
To better explore the learners' discussion forums, we built, as an original contribution, MOOCSent, a cross-platform review-based sentiment classifier, using over 1.2 million sentiment-labelled MOOC reviews. MOOCSent addresses various limitations of current sentiment classifiers, including (1) reliance on a single source of data (previous literature on sentiment classification in MOOCs was based on single platforms only, and hence less generalisable, with relatively few instances compared to our dataset); (2) limited model outputs, where most current models are binary classifiers (positive or negative only); (3) disregard of important sentiment indicators, such as emojis and emoticons, during text embedding; and (4) reporting of average performance metrics only, preventing the evaluation of model performance at the level of class (sentiment). Finally, with the help of MOOCSent, we used the learners' discussion forums to predict paid certification after annotating learners' comments and replies with sentiment. This multi-input model combines raw data (learner textual inputs), sentiment classifications generated by MOOCSent, computed features (number of likes received for each textual input), and several features extracted from the texts (character counts, word counts, and part-of-speech (POS) tags for each textual instance). This experiment adopted various deep predictive approaches, specifically those that allow a multi-input architecture, to investigate early (i.e., weekly) whether data obtained from MOOC learners' interaction in discussion forums can predict learners' purchase decisions (certification).
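One of the limitations MOOCSent addresses is the loss of emojis and emoticons during text embedding. The sketch below illustrates the lexicon idea with a toy mapping; the real MOOCSent lexicons cover over three thousand entries, and the symbols and explanatory words here are illustrative only:

```python
# A tiny stand-in for the emoji/emoticon lexicons described above: each
# symbol maps to explanatory words so the sentiment cue survives text
# embedding instead of being dropped as an unknown token.
EMOTICON_LEXICON = {
    ":)": "happy smile",
    ":(": "sad frown",
    ":D": "laughing joy",
    "👍": "thumbs up approval",
}

def expand_emoticons(text: str) -> str:
    """Replace emojis/emoticons with their explanatory words."""
    for symbol, words in EMOTICON_LEXICON.items():
        text = text.replace(symbol, words)
    return text

print(expand_emoticons("Great course :) 👍"))
# Great course happy smile thumbs up approval
```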
Considering the staggeringly low rate of paid certification in MOOCs, the present thesis contributes to the knowledge and field of MOOC learner analytics by predicting paid certification, for the first time, at a comprehensive (with data from over 200 thousand learners from five courses in different disciplines), actionable (analysing learners' decisions from the first week of the course) and longitudinal (with 23 runs from 2013 to 2017) scale. The present thesis contributes by (1) investigating various conventional and deep ML approaches for predicting paid certification in MOOCs using learner clickstreams (Chapter 5) and course discussion forums (Chapter 7); (2) building the largest MOOC sentiment classifier (MOOCSent), based on learners' reviews of courses from the leading MOOC platforms, namely Coursera, FutureLearn and Udemy, which handles emojis and emoticons using dedicated lexicons containing over three thousand corresponding explanatory words/phrases; and (3) proposing and developing, for the first time, a multi-input model for predicting certification based on data from discussion forums, which synchronously processes the textual (comments and replies) and numerical (number of likes posted and received, sentiments) data from the forums, adapting a suitable classifier to each type of data, as explained in detail in Chapter 7.
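The multi-input idea from the contributions above can be sketched minimally: textual and numerical inputs are vectorised separately and fused into one feature vector. The toy vocabulary and feature choices are illustrative assumptions, not the thesis architecture (which uses deep multi-input networks):

```python
# Minimal multi-input sketch: a text branch and a numeric branch are
# computed separately, then concatenated for a single downstream predictor.
# VOCAB and the features below are hypothetical, for illustration only.
VOCAB = ["great", "course", "boring", "certificate"]

def text_features(comment: str):
    """Bag-of-words branch over a toy vocabulary."""
    tokens = comment.lower().split()
    return [tokens.count(w) for w in VOCAB]

def fuse(comment: str, sentiment: int, likes: int):
    # Numeric branch: a sentiment label (as MOOCSent would supply) and the
    # number of likes received, appended to the text branch's vector.
    return text_features(comment) + [sentiment, likes]

print(fuse("Great course", 1, 3))  # [1, 1, 0, 0, 1, 3]
```

In the deep-learning setting, each branch would instead feed its own sub-network (e.g. an embedding layer for text) before the fusion layer, which is the "adapting the suitable classifier for each type of data" idea in miniature.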

    Reflections on different learning analytics indicators for supporting study success

    Common factors related to study success include students' sociodemographic factors, cognitive capacity, prior academic performance, and individual attributes, as well as course-related factors such as active learning and attention, or environmental factors such as supportive academic and social embeddedness. In addition, there are various stages of a learner's journey, from commencing learning until its completion, as well as different indicators or variables that can be examined to gauge or predict how successful that journey can or will be at different points, or how successfully learners may complete their studies and thereby acquire the intended learning outcomes. The aim of this research is to gain a deeper understanding not only of whether learning analytics can support study success, but of which aspects of a learner's journey can benefit from the utilisation of learning analytics. We therefore examined different learning analytics indicators to show which aspect of the learning journey they were successfully supporting. Key indicators may include GPA, learning history, and clickstream data. Depending on the type of higher education institution and the mode of education (face-to-face and/or distance), the chosen indicators may differ, as they have different importance in predicting learning outcomes and study success.

    A Survey on Data-Driven Evaluation of Competencies and Capabilities Across Multimedia Environments

    The rapid evolution of technology directly impacts the skills and jobs needed in the next decade. Users can, intentionally or unintentionally, develop different skills by creating, interacting with, and consuming content from online environments and portals where informal learning can emerge. These environments generate large amounts of data; therefore, big data can have a significant impact on education. Moreover, the educational landscape has been shifting from a focus on content to a focus on the competencies and capabilities that will prepare our society for an unknown future in the 21st century. The main goal of this literature survey is therefore to examine diverse technology-mediated environments that can generate rich data sets through users' interaction, and where data can be used to perform, explicitly or implicitly, a data-driven evaluation of different competencies and capabilities. We thoroughly and comprehensively surveyed the state of the art to identify and analyse digital environments, the data they produce and the capabilities they can measure and/or develop. Our survey revealed four key multimedia environments that fulfilled our goal: sites for content sharing and consumption, video games, online learning and social networks. Moreover, different methods were used to measure a large array of diverse capabilities, such as expertise, language proficiency and soft skills. Our results demonstrate the potential of data from diverse digital environments to support the development of lifelong and lifewide 21st-century capabilities for the future society.

    The Application of Data Mining Techniques to Learning Analytics and Its Implications for Interventions with Small Class Sizes

    There has been significant progress in the development of techniques to deliver effective technology enhanced learning systems in education, with substantial progress in the field of learning analytics. These analyses are able to support academics in identifying students at risk of failure or withdrawal. The early identification of students at risk is critical to giving academic staff and institutions the opportunity to make timely interventions. This thesis considers established machine learning techniques, as well as a novel method, for the prediction of student outcomes and the support of interventions, including the presentation of a variety of predictive analyses and of a live experiment. It reviews the status of technology enhanced learning systems and the associated institutional obstacles to their implementation and deployment. Many courses comprise relatively small student cohorts, with institutional privacy protocols limiting the data readily available for analysis. Very little research attention appears to have been devoted to this area of analysis and prediction. I present an experiment conducted on a final-year university module, with a student cohort of 23, where the data available for prediction are limited to lecture/tutorial attendance, virtual learning environment accesses and intermediate assessments. I apply and compare a variety of machine learning analyses to assess and predict student performance, applied at appropriate points during module delivery. Despite some mixed results, I found potential for predicting student performance in small student cohorts with very limited student attributes, with accuracies comparing favourably with published results using large cohorts and significantly more attributes. I propose that the analyses will be useful in supporting module leaders to identify opportunities for timely academic interventions. Student data may include a combination of nominal and numeric data.
A large variety of techniques are available to analyse numeric data; however, there are fewer techniques applicable to nominal data. I summarise the results of what I believe to be a novel technique for analysing nominal data by making a systematic comparison of data pairs. In this thesis I have surveyed existing intelligent learning/training systems and explored the contemporary AI techniques which appear to offer the most promising contributions to the prediction of student attainment. I have researched and catalogued the organisational and non-technological challenges to be addressed for successful system development and implementation, and proposed a set of critical success criteria to apply. This dissertation is supported by published work.
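The pairwise-comparison technique for nominal data is only summarised above, so the following is one plausible reading rather than the thesis's actual method: compare every pair of student records with a simple matching similarity (the fraction of nominal attributes on which they agree). The records and attribute names are hypothetical:

```python
# One plausible sketch of systematic pairwise comparison of nominal data:
# score each pair of records by the fraction of attributes that agree.
from itertools import combinations

def matching_similarity(a, b):
    """Fraction of nominal attributes on which two records agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical nominal records: (attendance band, VLE use, assessment result)
students = {
    "s1": ("high", "frequent", "pass"),
    "s2": ("high", "rare", "pass"),
    "s3": ("low", "rare", "fail"),
}

for (n1, r1), (n2, r2) in combinations(students.items(), 2):
    print(n1, n2, round(matching_similarity(r1, r2), 2))
```

With cohorts as small as 23, the full set of pairs (n·(n-1)/2 = 253) is trivially cheap to enumerate, which is one reason a pairwise approach is attractive for small-class analysis.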

    The Big Five: Addressing Recurrent Multimodal Learning Data Challenges

    The analysis of multimodal data in learning is a growing field of research, which has led to the development of different analytics solutions. However, there is no standardised approach to handling multimodal data. In this paper, we describe and outline solutions for five recurrent challenges in the analysis of multimodal data: data collection, storing, annotation, processing and exploitation. For each of these challenges, we envision possible solutions. Prototypes for some of the proposed solutions will be discussed during the Multimodal Challenge of the fourth Learning Analytics & Knowledge Hackathon, a two-day hands-on workshop in which the authors will open up the prototypes for trials, validation and feedback.

    Multimodal Challenge: Analytics Beyond User-computer Interaction Data

    This contribution describes one of the challenges explored in the Fourth LAK Hackathon. This challenge aims at shifting the focus away from learning situations that can be easily traced through user-computer interaction data, and concentrating more on user-world interaction events, typical of co-located and practice-based learning experiences. This mission, pursued by the multimodal learning analytics (MMLA) community, seeks to bridge the gap between digital and physical learning spaces. The “multimodal” approach consists in combining learners' motoric actions with physiological responses and data about the learning contexts. These data can be collected through multiple wearable sensors and Internet of Things (IoT) devices. This Hackathon table will confront three main challenges arising from the analysis and valorisation of multimodal datasets: 1) data collection and storing, 2) data annotation, 3) data processing and exploitation. Some research questions that will be considered in this Hackathon challenge are the following: How can the raw sensor data streams be processed and relevant features extracted? Which data mining and machine learning techniques can be applied? How can we compare two action recordings? How can sensor data be combined with the Experience API (xAPI)? What are meaningful visualisations for these data?

    Measuring academic performance of students in Higher Education using data mining techniques

    Educational Data Mining (EDM) is a developing discipline, concerned with expanding classical Data Mining (DM) methods and developing new methods for exploring the data that originate from educational systems. It aims to use those methods to achieve a sound understanding of students, and of the educational environment they should have for better learning. These data are characterised by their large size and randomness, which can make it difficult for educators to extract knowledge from them. Additionally, knowledge extracted from data by counting the occurrence of certain events is not always reliable, since the counting process sometimes does not take into consideration other factors and parameters that could affect the extracted knowledge. Student attendance in Higher Education has always been dealt with in a classical way, i.e. educators rely on counting occurrences of attendance or absence, building their knowledge about students, as well as modules, on this count. This method is neither credible nor does it necessarily provide a real indication of a student's performance. On the other hand, the choice of an effective student assessment method is an issue of interest in Higher Education. Various studies (Romero et al., 2010) have shown that students tend to get higher marks when assessed through coursework-based assessment methods (which include either modules fully assessed through coursework or a mixture of coursework and examinations) than when assessed by examination alone. A large number of Educational Data Mining (EDM) studies have pre-processed data through the conventional Data Mining processes, including the data preparation process, but they use transcript data as they stand, without considering the weighting of examination and coursework results, which could affect prediction accuracy.
This thesis explores the above problems and tries to formulate the extracted knowledge in a way that guarantees accurate and credible results. Student attendance data, gathered from the educational system, were first cleaned in order to remove randomness and noise, then various attributes were studied so as to highlight the most significant ones affecting the real attendance of students. The next step was to derive an equation that measures Student Attendance's Credibility (SAC), considering the attributes chosen in the previous step. The reliability of the newly developed measure was then evaluated in order to examine its consistency. In terms of transcript data, this thesis proposes a different data preparation process, investigating more than 230,000 student records in order to prepare students' marks based on the assessment methods of the enrolled modules. The data have been processed through different stages in order to extract a categorical factor through which students' module marks are refined during the data preparation process. The results of this work show that students' final marks should not be isolated from the nature of the enrolled modules' assessment methods; rather, they must be investigated thoroughly and considered during EDM's data pre-processing phases. More generally, it is concluded that educational data should not be prepared in the same way as existing data, due to differences in the sources of data, the applications, and the types of errors in them. Therefore, an attribute, the Coursework Assessment Ratio (CAR), is proposed for use in taking the different modules' assessment methods into account while preparing student transcript data. The effect of CAR and SAC on the prediction process, using data mining classification techniques such as Random Forest, Artificial Neural Networks and k-Nearest Neighbours, has been investigated.
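As a rough illustration of the CAR attribute described above, the ratio can be computed from a module's coursework and examination weightings. The weights, and the mark-refinement rule shown, are hypothetical; the thesis instead derives a categorical refinement factor from real module records:

```python
# Coursework Assessment Ratio (CAR): the share of a module's assessment
# weighting that comes from coursework rather than examination.
def coursework_assessment_ratio(coursework_weight, exam_weight):
    return coursework_weight / (coursework_weight + exam_weight)

def refined_mark(mark, car, adjustment=0.1):
    # Illustrative only: nudge a raw mark down in proportion to CAR, to
    # offset the tendency toward higher coursework-based marks noted above.
    # The thesis's actual refinement uses a categorical factor, not this rule.
    return mark * (1 - adjustment * car)

car = coursework_assessment_ratio(60, 40)  # 0.6 for a 60/40 split
print(car, refined_mark(70, car))
```

The point of the attribute is that two identical transcript marks can mean different things when one module is 100% examination and the other is 100% coursework; CAR makes that difference visible to the downstream classifiers.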
The results were generated by applying the DM techniques to our data set and evaluated by measuring the statistical differences between the Classification Accuracy (CA) and Root Mean Square Error (RMSE) of all models. A comprehensive evaluation has been carried out for all results in the experiments to compare the results of all DM techniques, and it has been found that Random Forest (RF) has the highest CA and lowest RMSE. The importance of SAC and CAR in increasing prediction accuracy is demonstrated in Chapter 5. Finally, the results have been compared with previous studies that predicted students' final marks based on their marks at earlier stages of their study. The comparisons have taken into consideration similar data and attributes, first excluding average CAR and SAC and secondly including them, and then measuring the difference in prediction accuracy between both. The aim of this comparison is to ensure that the new preparation process stage positively affects the final results.
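The CA/RMSE model comparison above can be sketched with plain Python; the model names, predictions and probabilities below are hypothetical, not results from the thesis:

```python
# Comparing classifiers on the two metrics the thesis uses: Classification
# Accuracy (CA) over hard predictions and Root Mean Square Error (RMSE)
# over predicted probabilities for the positive class.
import math

def classification_accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_prob):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true))

y_true = [1, 0, 1, 1, 0]
# Hypothetical (hard prediction, probability) outputs per model.
models = {
    "Random Forest": ([1, 0, 1, 1, 0], [0.9, 0.2, 0.8, 0.7, 0.1]),
    "k-NN":          ([1, 0, 0, 1, 0], [0.8, 0.3, 0.4, 0.6, 0.2]),
}
for name, (y_pred, y_prob) in models.items():
    print(name, classification_accuracy(y_true, y_pred), round(rmse(y_true, y_prob), 3))
```

Reporting both metrics together is useful because CA only sees the thresholded decision, while RMSE also penalises confident wrong probabilities, so a model can win on one and lose on the other.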

    Improving serious games evaluation by applying learning analytics and data mining techniques

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Ingeniería del Software e Inteligencia Artificial, defended on 15/06/2017; European-format thesis (compendium of articles). Serious games are highly motivational resources, effective for teaching, raising awareness, or changing the perceptions of players. To foster their application in education, teachers and institutions require clear and formal evidence to assess students' learning while they are playing the games. However, traditional assessment techniques rely on external questionnaires, typically administered before and after playing, which fail to measure players' learning while it is happening. The multiple interactions carried out by players in the games can provide more precise information about how players play, and can even be used to assess them. In this regard, game learning analytics techniques propose the collection and analysis of such interactions for multiple purposes, including assessment. The potentially large amount of game learning analytics data collected can be further analysed with data mining techniques to discover unexpected patterns, and to provide measures to evaluate the effect of games on their players and assess their learning...