
    Predicting students' academic achievement using methods of educational data mining

    The tremendous growth in educational data creates a need to extract meaningful information from it. Educational Data Mining (EDM) has become an exciting research area that can reveal valuable knowledge from educational databases. This knowledge can be used for many purposes, including identifying dropouts or weak students who need special attention and discovering exceptional students who can be offered lifetime opportunities. This thesis presents the field of EDM from multiple angles, with particular emphasis on academic prediction tasks. It provides a comprehensive background for understanding EDM and discusses the different methods and applications of data mining in education. It also offers an extensive literature review on predicting students’ academic achievement, covering related works from 2007 to 2022. Furthermore, it examines the application of machine learning algorithms to predict students’ academic achievement on two diverse datasets. The first dataset was obtained from the College of Computer and Information Science at Princess Norah University (PNU) in Riyadh, Saudi Arabia. In this work, the records of 300 undergraduate students were used to predict their final academic achievement. We used the Weka software to compare the performance of eight data mining algorithms in predicting students’ academic achievement: C4.5, Simple CART, LADTree, Support Vector Machine, Naïve Bayes, K-Nearest Neighbor, Artificial Neural Networks, and Random Forest. The models were validated using 10-fold cross-validation. The empirical results show that: (i) in the College of Computer and Information Science, the most essential features for predicting student academic achievement are the student’s GPA in each semester, the number of failed courses during the first four semesters, and the grades in three core courses.
On the other hand, students’ proficiency in English and the number of registered credit hours do not play a major role in their success; (ii) Naïve Bayes performs best in predicting students’ achievement, followed by Random Forest; and (iii) students who attend an orientation year do not have a greater chance of success at that college. The second dataset contains the records of Business Informatics master’s students at the University of Mannheim in Germany. In this work, the data of more than 700 students were used to predict their final academic achievement using different machine learning libraries in Python. We compared the performance of nine data mining algorithms in predicting students’ academic achievement: Logistic Regression, Naïve Bayes, K-Nearest Neighbor, Artificial Neural Networks, Support Vector Machine, Random Forest, Gradient Boosting, Light Gradient Boosting, and Extreme Gradient Boosting. These models were also validated using 10-fold cross-validation. The empirical results show that: (i) bagging and boosting algorithms produce better predictive performance than individual classifiers, and (ii) the semester grades are the most significant features for the predictive model, followed by students’ culture and the distance from their accommodation to the university campus. The outcomes of the two studies can be used to design a recommender system that enables timely interventions for the undergraduate students of the College of Computer and Information Science and the postgraduate students of the Business Informatics program.
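Both studies validate their models with 10-fold cross-validation. The sketch below shows that protocol in plain Python under stated assumptions: it is not the authors' Weka or Python pipeline, the classifier is a 1-nearest-neighbour rule (the simplest variant of the K-Nearest Neighbor algorithm listed above), and the toy records and the names `k_fold_indices`, `one_nn_predict`, and `cross_val_accuracy` are hypothetical.

```python
import random
from typing import List, Sequence

Row = Sequence[float]

def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle row indices 0..n-1 and deal them into k folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of rows")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def one_nn_predict(train_X: List[Row], train_y: List[str], x: Row) -> str:
    """Hypothetical stand-in classifier: copy the label of the nearest training row."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[nearest]

def cross_val_accuracy(X: List[Row], y: List[str], k: int = 10, seed: int = 0) -> float:
    """Mean accuracy over k folds; each fold is held out once while the rest train."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(one_nn_predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

# Invented toy records: (semester GPA, failed courses in the first four semesters)
X = [(3.5 + 0.05 * i, 0.0) for i in range(10)] + [(1.5 + 0.05 * i, 4.0) for i in range(10)]
y = ["high"] * 10 + ["low"] * 10
print(cross_val_accuracy(X, y, k=10))  # prints 1.0 on this well-separated toy data
```

Every record is tested exactly once and used for training nine times. A real comparison would pass each candidate algorithm through the same folds; stratified folds, which preserve the class ratio in every fold, are a common refinement not shown here.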

    A Review of Emotion Recognition Methods from Keystroke, Mouse, and Touchscreen Dynamics

    Emotion can be defined as a subject’s organismic response to an external or internal stimulus event. The responses could be reflected in pattern changes in the subject’s facial expression, gesture, gait, eye movement, physiological signals, speech and voice, keystroke and mouse dynamics, etc. This suggests that, on the one hand, emotions can be measured or recognized from these responses and, on the other hand, they can be facilitated or regulated by external stimulus events, situation changes, or internal motivation changes. Emotion is well known to be closely related to both physical and mental health and usually affects individual and team work performance; emotion recognition is therefore an important prerequisite for regulating emotions toward better emotional states and work performance. The primary problem in emotion recognition is how to recognize a subject’s emotional states easily and accurately. There is currently a substantial body of research on emotion recognition from facial expression, gesture, gait, eye tracking, speech and voice, and physiological signals, but these modalities are all intrusive and obtrusive to some extent. In contrast, keystroke, mouse, and touchscreen (KMT) dynamics data can be collected non-intrusively and unobtrusively as secondary data generated by primary physical actions. This paper therefore reviews the state-of-the-art research on emotion recognition from KMT dynamics and identifies key research challenges, opportunities, and a future research roadmap for reference. In addition, it answers the following six research questions (RQs): (1) What are the commonly used emotion elicitation methods and databases for emotion recognition? (2) Which emotions could be recognized from KMT dynamics? (3) Which key features are most appropriate for recognizing specific emotions? (4) Which classification methods are most effective for specific emotions? (5) What are the application trends of emotion recognition from KMT dynamics? (6) Which application contexts are of greatest concern?
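RQ (3) asks which KMT features suit which emotions. Two timing features commonly derived from keystroke logs are dwell time (how long a key is held) and flight time (the gap between releasing one key and pressing the next). The following is a minimal sketch assuming a hypothetical event format of (key, press time, release time) tuples; the review does not prescribe this code, and the function name is illustrative.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical event format: (key, press_time_ms, release_time_ms),
# one tuple per keystroke, ordered by press time.
KeyEvent = Tuple[str, float, float]

def keystroke_timing_features(events: List[KeyEvent]) -> Dict[str, float]:
    """Summarise a typing sample by its mean dwell time and mean flight time."""
    if len(events) < 2:
        raise ValueError("need at least two keystrokes to compute a flight time")
    # Dwell (hold) time: how long each key stays pressed.
    dwell = [release - press for _, press, release in events]
    # Flight time: gap from releasing one key to pressing the next one;
    # it is negative when the next key is pressed before the previous is released.
    flight = [nxt[1] - cur[2] for cur, nxt in zip(events, events[1:])]
    return {"mean_dwell_ms": mean(dwell), "mean_flight_ms": mean(flight)}

sample = [("h", 0, 100), ("i", 150, 230), ("!", 300, 420)]
print(keystroke_timing_features(sample))  # mean dwell = 100 ms, mean flight = 60 ms
```

Per-sample summaries like these would then serve as inputs to whichever classifier is being evaluated for a given emotion.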

    The relationship between online tutorials and academic performance in distance education: a predictive framework for Open University, Indonesia

    This study aimed to find new patterns and meaningful innovations by applying different machine learning approaches to predict students’ performance, analysing and identifying the e-learning features that most strongly affect it. Moreover, it proposes a new kind of prediction model that can be implemented in many fields.

    Predicting Paid Certification in Massive Open Online Courses

    Massive open online courses (MOOCs) have proliferated because they offer content to learners for free or at low cost, attracting the attention of many stakeholders across the entire educational landscape. Since 2012, dubbed “the Year of the MOOCs”, several platforms have gathered millions of learners in just a decade. Nevertheless, certification rates for both free and paid courses have been low: only about 4.5–13% and 1–3%, respectively, of enrolled learners obtain a certificate at the end of their courses. Still, most research concentrates on completion, ignoring the certification problem and especially its financial aspects. Thus, the research described in this thesis aimed to investigate paid certification in MOOCs for the first time in a comprehensive way, as early as the first week of the course, by exploring it at various levels. First, the latent correlation between learner activities and paid certification decisions was examined by (1) statistically comparing the activities of non-paying learners with those of course purchasers and (2) predicting paid certification using different machine learning (ML) techniques. Our temporal (weekly) analysis showed statistically significant differences at various levels between the activities of non-paying learners and those of certificate purchasers across the five courses analysed. Furthermore, we used learners’ activities (number of step accesses, attempts, correct and wrong answers, and time spent on learning steps) to build our paid certification predictor, which achieved promising balanced accuracies (BAs) ranging from 0.77 to 0.95. Having employed simple predictions based on a few clickstream variables, we then analysed in more depth what other information could be extracted from MOOC interactions (namely discussion forums) for paid certification prediction.
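Balanced accuracy is the mean of per-class recall. It matters here because purchasers are a small minority: a model that predicts "no purchase" for everyone earns high plain accuracy but only 0.5 balanced accuracy. A minimal sketch follows; the function name and labels are hypothetical, and this is not the thesis's own evaluation code.

```python
from collections import defaultdict
from typing import Hashable, Sequence

def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall, so each class counts equally regardless of size."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = defaultdict(int)    # correctly predicted examples per true class
    totals = defaultdict(int)  # all examples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Invented example: 98 non-payers and 2 purchasers, everyone predicted as "no".
y_true = ["no"] * 98 + ["paid"] * 2
print(balanced_accuracy(y_true, ["no"] * 100))  # 0.5, although plain accuracy is 0.98
```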
However, to better explore learners’ discussion forums, we built, as an original contribution, MOOCSent, a cross-platform, review-based sentiment classifier built using over 1.2 million sentiment-labelled MOOC reviews. MOOCSent addresses several limitations of current sentiment classifiers, including (1) reliance on a single data source (previous literature on sentiment classification in MOOCs was based on single platforms only, and hence less generalisable, with relatively few instances compared to our dataset); (2) limited model outputs, as most current models are binary classifiers (positive or negative only); (3) disregarding important sentiment indicators, such as emojis and emoticons, during text embedding; and (4) reporting only average performance metrics, which prevents evaluating model performance at the level of each class (sentiment). Finally, with the help of MOOCSent, we used learners’ discussion forums to predict paid certification after annotating learners’ comments and replies with their sentiment. This multi-input model takes raw data (learners’ textual inputs), the sentiment classification generated by MOOCSent, computed features (number of likes received for each textual input), and several features extracted from the texts (character counts, word counts, and part-of-speech (POS) tags for each textual instance). This experiment adopted various deep predictive approaches, specifically those that allow a multi-input architecture, to investigate early (i.e., weekly) whether data obtained from MOOC learners’ interactions in discussion forums can predict learners’ purchase decisions (certification).
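Limitation (3) concerns emojis and emoticons being lost during text embedding. One way to keep them, in the spirit of the lexicon approach described for MOOCSent, is to replace each symbol with explanatory words before embedding. This sketch assumes a tiny, hypothetical lexicon and function name; the thesis's actual lexicons and preprocessing are not reproduced.

```python
import re
from typing import Dict

# Hypothetical miniature lexicon; MOOCSent's own lexicons (over three thousand
# explanatory words/phrases) are not reproduced here.
EMOTICON_LEXICON: Dict[str, str] = {
    ":)": "happy face",
    ":))": "very happy face",
    ":(": "sad face",
    "😀": "grinning face",
    "👍": "thumbs up",
}

def expand_emoticons(text: str, lexicon: Dict[str, str] = EMOTICON_LEXICON) -> str:
    """Replace emojis/emoticons with explanatory words before text embedding."""
    # Longer keys are tried first, so ':))' wins over its prefix ':)'.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    expanded = pattern.sub(lambda m: f" {lexicon[m.group(0)]} ", text)
    return " ".join(expanded.split())  # collapse the extra spaces

print(expand_emoticons("Great course :)) 👍"))
```

The output, "Great course very happy face thumbs up", is plain text that any word or sentence embedder can represent, so the sentiment signal carried by the symbols is not discarded.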
Considering the staggeringly low rate of paid certification in MOOCs, this thesis contributes to the knowledge and field of MOOC learner analytics by predicting paid certification, for the first time, at a comprehensive (data from over 200 thousand learners in 5 courses from different disciplines), actionable (analysing learners’ decisions from the first week of the course), and longitudinal (23 runs from 2013 to 2017) scale. The thesis contributes by (1) investigating various conventional and deep ML approaches for predicting paid certification in MOOCs using learner clickstreams (Chapter 5) and course discussion forums (Chapter 7); (2) building the largest MOOC sentiment classifier (MOOCSent), based on learners’ reviews of courses from the leading MOOC platforms, namely Coursera, FutureLearn, and Udemy, which handles emojis and emoticons using dedicated lexicons containing over three thousand corresponding explanatory words/phrases; and (3) proposing and developing, for the first time, a multi-input model for predicting certification from discussion-forum data, which synchronously processes the textual (comments and replies) and numerical (number of likes posted and received, sentiments) data from the forums, adapting a suitable classifier to each type of data, as explained in detail in Chapter 7.
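The multi-input model processes the textual and numerical forum data in separate branches. The sketch below covers only input preparation, under assumptions: the `ForumPost` record, the three-label sentiment set, and the function name are hypothetical, POS-tag features (which need a tagger) are omitted, and the network branches themselves are not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical label set; the abstract does not list MOOCSent's exact classes.
SENTIMENT_LABELS = ["negative", "neutral", "positive"]

@dataclass
class ForumPost:
    text: str            # raw comment or reply, fed to the text branch
    likes_received: int  # computed feature
    sentiment: str       # label assigned by a sentiment classifier such as MOOCSent

def to_model_inputs(post: ForumPost) -> Tuple[str, List[float]]:
    """Split one post into the text input and the numerical input of a two-branch model."""
    numeric = [
        float(len(post.text)),          # character count
        float(len(post.text.split())),  # word count
        float(post.likes_received),
    ]
    # One-hot encode the sentiment label so it can join the numerical branch.
    numeric += [1.0 if post.sentiment == s else 0.0 for s in SENTIMENT_LABELS]
    return post.text, numeric

text, numeric = to_model_inputs(ForumPost("Really enjoyed week one", 3, "positive"))
print(numeric)  # [23.0, 4.0, 3.0, 0.0, 0.0, 1.0]
```

Keeping the raw text separate lets each branch use a model suited to its data type, which is the design idea the abstract describes.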

    An Automatic Modern Standard Arabic Text Simplification System: A Corpus-Based Approach

    This thesis brings together an overview of Text Readability (TR) in relation to Text Simplification (TS), with an application of both to Modern Standard Arabic (MSA). It presents our findings on using automatic TR and TS tools to teach MSA, along with challenges, limitations, and recommendations for enhancing the TR and TS models. Reading is one of the most vital tasks that provide language input for communication and comprehension skills. It has been shown that long sentences, connected sentences, embedded phrases, passive voice, non-standard word orders, and infrequent words can increase text difficulty for people with low literacy levels, as well as for second language learners. The thesis compares different types of sentence embeddings (fastText, mBERT, XLM-R, and Arabic-BERT), as well as traditional language features such as POS tags, dependency trees, readability scores, and frequency lists for language learners. The 3-way CEFR (Common European Framework of Reference for Languages proficiency levels) classification reaches F-1 scores of 0.80 and 0.75 with Arabic-BERT and XLM-R, respectively, and a Spearman correlation of 0.71 for the regression task. Meanwhile, the binary difficulty classifier reaches an F-1 of 0.94, and the sentence-pair semantic similarity classifier an F-1 of 0.98. TS is an NLP task that aims to reduce the linguistic complexity of a text while maintaining its meaning and original information (Siddharthan, 2002; Camacho Collados, 2013; Saggion, 2017). The simplification study experimented with two approaches: (i) a classification approach and (ii) a generative approach. It then evaluated the effectiveness of these methods using the BERTScore (Zhang et al., 2020) evaluation metric. The simple sentences produced by the mT5 model achieved P 0.72, R 0.68, and F-1 0.70 via BERTScore, while combining Arabic-BERT and fastText achieved P 0.97, R 0.97, and F-1 0.97.
To reiterate, this research demonstrated the effectiveness of a corpus-based method combined with the extraction of extensive linguistic features via the latest NLP techniques. It provides insights that can be of use in various Arabic corpus studies and NLP tasks, such as translation for educational purposes.
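BERTScore compares a candidate sentence with a reference through token-embedding similarity: recall matches each reference token to its most similar candidate token, precision does the reverse, and F-1 is their harmonic mean (for the mT5 figures, 2 × 0.72 × 0.68 / (0.72 + 0.68) ≈ 0.70). The sketch below assumes toy 2-d vectors in place of real contextual embeddings and omits the optional idf weighting and baseline rescaling of Zhang et al. (2020); the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def bertscore_prf(reference: List[Vector], candidate: List[Vector]) -> Tuple[float, float, float]:
    """Greedy-matching precision, recall and F1 over token embeddings.

    Simplified: no idf weighting and no baseline rescaling.
    """
    sim = [[cosine(r, c) for c in candidate] for r in reference]
    # Recall: match every reference token to its most similar candidate token.
    recall = sum(max(row) for row in sim) / len(reference)
    # Precision: match every candidate token to its most similar reference token.
    precision = sum(
        max(sim[i][j] for i in range(len(reference))) for j in range(len(candidate))
    ) / len(candidate)
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy "embeddings": the candidate covers only the first of two reference tokens,
# so precision is 1.0, recall is 0.5 and F1 is 2/3.
print(bertscore_prf([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]))
```

A simplified output that drops content lowers recall while keeping precision high, which is why the three numbers are reported together.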

    Trialing project-based learning in a new EAP ESP course: A collaborative reflective practice of three college English teachers

    Currently, in many Chinese universities, the traditional College English course faces the risk of being ‘marginalized’, replaced, or even removed, and many hours previously allocated to it are now being taken by EAP or ESP. At X University in northern China, such a curriculum reform is taking place, resulting in a new course called ‘xue ke’ English. Although ‘xue ke’ literally means ‘subject’, the course designer has made it clear that subject content is not the target, nor is the course the same as EAP or ESP. While this curriculum initiative may well have been justified with some rationale (e.g. to meet the changing social and/or academic needs of students and/or institutions), it is posing a great challenge for, and placing considerable pressure on, a number of College English teachers who have taught this single course for almost their entire teaching careers. In this context, three teachers formed a peer support group in Semester One this year to tackle the challenge collaboratively, and they chose Project-Based Learning (PBL) for the new course. This presentation reports on the implementation of this project, including the overall design, operational procedure, and the teachers’ reflections. Through discussion, the teachers agreed in advance that the purpose and manner of their collaboration would be to offer peer support for more effective teaching and learning and for fulfilling and pleasant professional development. A WeChat group was set up as the chief platform for messaging, idea-sharing, and resource exchange. Physical meetings were supplementary, with clear agendas but flexible times and venues. A Mosoteach cloud class (lan mo yun ban ke) was established as a tool for virtual learning, used both in and after class.
Discussions held at the beginning of the semester determined only brief outlines for PBL implementation, leaving space for everyone to explore autonomously in their own way. Further discussions followed throughout the semester, generating many opportunities for peer learning and lesson plan modifications. Each teacher also kept a reflective journal, in greater or lesser detail, to record the journey of the collaboration. At the end of the semester, the teachers agreed that, although challenges existed, the collaboration had been a success overall, and they were all willing to continue it and refine it into a more professional and productive approach.

    Modelling students' behaviour and affect in ILE through educational data mining

    Big data-driven multimodal traffic management : trends and challenges

    Computerised diagnosis of malaria
