A review on data fusion in multimodal learning analytics and educational data mining
New educational models such as smart learning environments make use of digital, context-aware devices to facilitate the learning process. In this new educational scenario, a huge quantity of multimodal student data from a variety of sources can be captured, fused, and analyzed. This offers researchers and educators a unique opportunity to discover new knowledge, better understand the learning process, and intervene if necessary. However, data fusion approaches and techniques must be applied correctly in order to combine the various sources of multimodal learning analytics (MLA). These sources or modalities in MLA include audio, video, electrodermal activity data, eye tracking, user logs, and click-stream data, but also learning artifacts and more natural human signals such as gestures, gaze, speech, and writing. This survey introduces data fusion in learning analytics (LA) and educational data mining (EDM) and shows how these data fusion techniques have been applied in smart learning. It presents the current state of the art by reviewing the main publications, the main types of fused educational data, and the data fusion approaches and techniques used in EDM/LA, as well as the main open problems, trends, and challenges in this research area.
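The fusion levels such a survey distinguishes, combining raw modality features (feature-level fusion) versus combining per-modality classifier outputs (decision-level fusion), can be illustrated with a minimal sketch. All feature values and scores below are invented for illustration; the modality names are hypothetical.

```python
import numpy as np

# Hypothetical per-modality feature vectors for one student (invented values).
gaze_feats = np.array([0.42, 0.10])   # e.g. fixation rate, saccade rate
log_feats = np.array([3.0, 0.75])     # e.g. clicks/min, time-on-task ratio

# Early (feature-level) fusion: concatenate modality features into one vector
# that a single downstream model consumes.
fused_features = np.concatenate([gaze_feats, log_feats])

# Late (decision-level) fusion: combine per-modality classifier scores,
# here a plain average of two hypothetical engagement probabilities.
gaze_score, log_score = 0.8, 0.6
fused_score = (gaze_score + log_score) / 2
```

The choice between the two levels trades off richer cross-modal interactions (early fusion) against robustness to a missing or failed modality (late fusion).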
Measuring attention using Microsoft Kinect
The transfer of knowledge between individuals is increasingly achieved with the aid of interfaces or computerized training applications. However, computer-based training currently lacks the ability to monitor human behavioral changes and respond to them accordingly. This study examines the ability to predict user attention using features of body posture and head pose. Predictive ability is assessed by analyzing the relationship between the measured posture features and common objective measures of attention, such as reaction time and reaction time variance. Subjects were asked to participate in a series of sustained attention tasks while aspects of body movement and positioning were recorded using a Microsoft Kinect. The results show support for identifiable patterns of behavior associated with attention, while also suggesting a complex inter-relationship among the measured features and their susceptibility to environmental conditions.
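The kind of analysis described, relating a posture feature to an objective attention measure, can be sketched as a simple correlation. The per-trial numbers below are hypothetical and do not come from the study:

```python
import numpy as np

# Hypothetical per-trial measurements: a head-movement feature (e.g. pitch
# variance within the trial) and the reaction time in seconds.
head_movement = np.array([0.1, 0.3, 0.2, 0.5, 0.4, 0.6])
reaction_time = np.array([0.30, 0.42, 0.35, 0.55, 0.50, 0.60])

# Pearson correlation links the posture feature to the attention measure;
# a strong positive r would suggest more head movement accompanies slower
# (less attentive) responses.
r = np.corrcoef(head_movement, reaction_time)[0, 1]
```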
Machine Learning applied to student attentiveness detection: Using emotional and non-emotional measures
Elbawab, M., & Henriques, R. (2023). Machine Learning applied to student attentiveness detection: Using emotional and non-emotional measures. Education and Information Technologies, 1-21. https://doi.org/10.1007/s10639-023-11814-5 --- Open access funding provided by FCT|FCCN (b-on). This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia) under project UIDB/04152/2020, Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS. Electronic learning (e-learning) is considered the new norm of learning. One significant drawback of e-learning compared with the traditional classroom is that teachers cannot monitor students' attentiveness. Previous literature used physical facial features or emotional states to detect attentiveness. Other studies proposed combining physical and emotional facial features; however, a mixed model using only a webcam had not been tested. The objective of this study is to develop a machine learning (ML) model that automatically estimates students' attentiveness during e-learning classes using only a webcam, which would help in evaluating teaching methods for e-learning. The study collected videos from seven students. The webcam of a personal computer is used to obtain video, from which we build a feature set that characterizes a student's physical and emotional state based on their face. This characterization includes the eye aspect ratio (EAR), yawn aspect ratio (YAR), head pose, and emotional states. A total of eleven variables are used in training and validating the model. ML algorithms are used to estimate individual students' attention levels; the models tested are decision trees, random forests, support vector machines (SVM), and extreme gradient boosting (XGBoost). Human observers' estimates of attention level are used as a reference.
Our best attention classifier is XGBoost, which achieved an average accuracy of 80.52% with an AUROC OVR (one-vs-rest) of 92.12%. The results indicate that a combination of emotional and non-emotional measures can produce a classifier with accuracy comparable to other attentiveness studies. The study also helps assess e-learning lectures through students' attentiveness, and will hence assist in improving e-learning lectures by generating an attentiveness report for the tested lecture.
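The eye aspect ratio used as a feature above is a standard landmark-based measure computed from the six eye contour points. A minimal sketch, with hypothetical landmark coordinates:

```python
import math

def eye_aspect_ratio(pts):
    """EAR from the six standard eye landmarks p1..p6, given as (x, y) tuples:
    EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|).
    The vertical distances shrink as the eye closes, so EAR drops toward zero
    during a blink or drowsy eye closure."""
    d = math.dist
    p1, p2, p3, p4, p5, p6 = pts
    return (d(p2, p6) + d(p3, p5)) / (2.0 * d(p1, p4))

# Hypothetical open-eye landmarks in pixel coordinates (not real data).
open_eye = [(0, 0), (2, -2), (4, -2), (6, 0), (4, 2), (2, 2)]
ear = eye_aspect_ratio(open_eye)
```

Thresholding EAR over consecutive frames is a common way to turn this ratio into blink or drowsiness events.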
Modeling engagement with multimodal multisensor data: the continuous performance test as an objective tool to track flow
Engagement is one of the most important factors in determining successful outcomes and deep learning in students. Existing approaches to detecting student engagement involve periodic human observations that are subject to inter-rater reliability. Our solution uses real-time multimodal multisensor data labeled by objective performance outcomes to infer the engagement of students. The study involves four students with a combined diagnosis of cerebral palsy and a learning disability who took part in a 3-month trial over 59 sessions. Multimodal multisensor data were collected while they participated in a continuous performance test. Eye gaze, electroencephalogram, body pose, and interaction data were used to create a model of student engagement through objective labeling from the continuous performance test outcomes. To achieve this, a new type of continuous performance test is introduced, the Seek-X type. Nine features were extracted, including high-level handpicked compound features. Using leave-one-out cross-validation, a series of different machine learning approaches were evaluated. Overall, the random forest classification approach achieved the best results: 93.3% accuracy for engagement and 42.9% accuracy for disengagement. We compared these results to outcomes from different models: AdaBoost, decision tree, k-Nearest Neighbor, naïve Bayes, neural network, and support vector machine. We showed that the multisensor approach achieved higher accuracy than using features from any reduced set of sensors, and that high-level handpicked features improve the classification accuracy in every sensor mode. Our approach is robust to both sensor fallout and occlusions. The single most important sensor feature for classifying engagement and distraction was shown to be eye gaze.
We have shown that we can accurately predict the level of engagement of students with learning disabilities in real time, in a way that does not depend on human observation and its inter-rater reliability, nor on a single mode of sensor input. This will help teachers design interventions for a heterogeneous group of students, where teachers cannot possibly attend to each student's individual needs. Our approach can be used to identify those with the greatest learning challenges so that all students are supported to reach their full potential.
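The evaluation protocol described (a random forest evaluated with leave-one-out cross-validation over nine features) can be sketched with scikit-learn. The dataset below is synthetic and invented; it does not reproduce the paper's sensors, features, or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in: 40 samples x 9 features (mirroring the nine extracted
# features), with an invented separable signal in feature 0.
# Labels: 1 = engaged, 0 = disengaged.
X = rng.normal(size=(40, 9))
y = (X[:, 0] > 0).astype(int)

# Leave-one-out: each of the 40 samples is held out once as the test set,
# matching the protocol used for small-N studies like this one.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
accuracy = scores.mean()
```

Leave-one-out is a natural choice when only a handful of participants and sessions are available, since it wastes no training data.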
Facial Expressions of Sentence Comprehension
Understanding facial expressions allows access to one's intentional and affective states. Building on findings in psychology and neuroscience that link physical behaviors of the face to emotional states, this paper studies sentence comprehension as shown by facial expressions. In our experiments, participants took part in a roughly 30-minute computer-mediated task in which they were asked to answer either "true" or "false" to knowledge-based questions and were immediately given feedback of "correct" or "incorrect". Their faces, recorded during the task using the Kinect v2 device, were later used to identify the level of comprehension shown by their expressions. To achieve this, SVM and Random Forest classifiers were employed on facial appearance information extracted using a spatiotemporal local descriptor named LPQ-TOP. Results on online sentence comprehension show that facial dynamics are promising for understanding cognitive states of the mind.
MATT: Multimodal Attention Level Estimation for e-learning Platforms
This work presents a new multimodal system for remote attention level estimation based on multimodal face analysis. Our multimodal approach uses different parameters and signals obtained from behavioral and physiological processes that have been related to modeling cognitive load, such as facial gestures (e.g., blink rate, facial action units) and user actions (e.g., head pose, distance to the camera). The multimodal system uses the following modules based on Convolutional Neural Networks (CNNs): eye blink detection, head pose estimation, facial landmark detection, and facial expression features. First, we individually evaluate the proposed modules on the task of estimating the student's attention level captured during online e-learning sessions. For that, we trained binary classifiers (high or low attention) based on Support Vector Machines (SVM) for each module. Second, we find out to what extent multimodal score-level fusion improves the attention level estimation. The experimental framework uses the mEBAL database, a public multimodal database for attention level estimation obtained in an e-learning environment, which contains data from 38 users conducting several e-learning tasks of variable difficulty (creating changes in student cognitive load).
Comment: Preprint of the paper presented to the Workshop on Artificial Intelligence for Education (AI4EDU) of AAAI 202
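The score-level fusion step described above (combining the outputs of per-module binary classifiers rather than their raw features) can be sketched as follows. The module names and scores are hypothetical and invented for illustration:

```python
# Hypothetical per-module attention scores in [0, 1] for one video window,
# as per-module binary classifiers might produce them.
module_scores = {
    "blink": 0.62,
    "head_pose": 0.55,
    "landmarks": 0.70,
    "expression": 0.48,
}

# Score-level fusion: weighted mean of module scores, then thresholding.
# Uniform weights are the simplest baseline; weights could instead be tuned
# on validation data to favor more reliable modules.
weights = {m: 1.0 for m in module_scores}
fused = sum(w * module_scores[m] for m, w in weights.items()) / sum(weights.values())
label = "high attention" if fused >= 0.5 else "low attention"
```

Because fusion happens on scores, a module that fails (e.g., landmarks lost when the face is occluded) can simply be dropped from the sum without retraining the others.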
Unobtrusive Assessment Of Student Engagement Levels In Online Classroom Environment Using Emotion Analysis
Measuring student engagement has emerged as a significant factor in the process of learning and a good indicator of a student's knowledge retention capacity. As synchronous online classes have become more prevalent in recent years, gauging a student's attention level is more critical in validating the progress of every student in an online classroom environment. This paper details a study profiling student attentiveness across different gradients of engagement level using multiple machine learning models. Results from the highest-accuracy model and the confidence scores obtained from the cloud-based computer vision platform Amazon Rekognition were then used to statistically validate any correlation between student attentiveness and emotions. This statistical analysis helps to identify the significant emotions that are essential in gauging various engagement levels. The study identified that emotions such as calm, happiness, surprise, and fear are critical in gauging a student's attention level. These findings help in the earlier detection of students with lower attention levels, consequently helping instructors focus their support and guidance on the students in need, leading to a better online learning environment.
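The statistical validation step described, testing whether an emotion's confidence score correlates with attentiveness, can be sketched with a point-biserial correlation. The values below are invented and are not Amazon Rekognition output:

```python
import numpy as np

# Hypothetical per-frame data: a confidence score for one emotion (e.g.
# "calm") as a vision API might report it, and a binary attentive label
# from an engagement classifier. Invented values for illustration only.
calm_conf = np.array([90, 85, 40, 95, 30, 88, 35, 92], dtype=float)
attentive = np.array([1, 1, 0, 1, 0, 1, 0, 1], dtype=float)

# With one binary variable, the Pearson coefficient equals the point-biserial
# correlation; a large positive r suggests this emotion tracks attention.
r = np.corrcoef(attentive, calm_conf)[0, 1]
```

Repeating this per emotion, with a significance test, would surface which emotions are informative for gauging engagement, as the study set out to do.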
The Multimodal Tutor: Adaptive Feedback from Multimodal Experiences
This doctoral thesis describes the journey of ideation, prototyping, and empirical testing of the Multimodal Tutor, a system designed to provide digital feedback that supports psychomotor skill acquisition using captured learning and multimodal data. The feedback is given in real time with machine-driven assessment of the learner's task execution. The predictions are made by supervised machine learning models trained on human-annotated samples. The main contributions of this thesis are: a literature survey on multimodal data for learning, a conceptual model (the Multimodal Learning Analytics Model), a technological framework (the Multimodal Pipeline), a data annotation tool (the Visual Inspection Tool), and a case study in cardiopulmonary resuscitation training (the CPR Tutor). The CPR Tutor generates real-time, adaptive feedback using kinematic and myographic data and neural networks.
Recognizing Multidimensional Engagement of E-learners Based on Multi-channel Data in E-learning Environment
With recent advances in MOOCs, current e-learning systems have the advantage of alleviating the barriers posed by time differences and the geographical separation between teachers and students. However, they suffer from a 'lack of supervision' problem: an e-learner's learning unit state (LUS) cannot be supervised automatically. In this paper, we present a fusion framework considering three channels of data sources: 1) videos/images from a camera, 2) eye movement information tracked by a low-resolution eye tracker, and 3) mouse movement. Based on these data modalities, we propose a novel multi-channel data fusion approach to learning unit state recognition. We also propose a method for building a learning state recognition model that avoids manually labeling image data. The experiments were carried out on our online learning prototype system, and we chose CART, Random Forest, and GBDT regression models to predict the e-learner's learning state. The results show that the multi-channel data fusion model has better recognition performance than single-channel models. In addition, the best recognition performance is reached when image, eye movement, and mouse movement features are fused.
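The single-channel versus fused-channel comparison reported above can be sketched with a GBDT regressor on synthetic data. The channels, feature counts, and target below are all invented; they only mimic the shape of the experiment:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 200
# Synthetic stand-ins for the three channels: image features, eye movement
# features, and a mouse movement feature (invented dimensionalities).
image = rng.normal(size=(n, 2))
eye = rng.normal(size=(n, 2))
mouse = rng.normal(size=(n, 1))
# Synthetic learning-state score that genuinely depends on all three channels.
y = image[:, 0] + 0.5 * eye[:, 0] + 0.3 * mouse[:, 0] + 0.1 * rng.normal(size=n)

def fit_score(X):
    """Train a GBDT regressor on a feature matrix and return held-out R^2."""
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    model = GradientBoostingRegressor(random_state=0).fit(Xtr, ytr)
    return r2_score(yte, model.predict(Xte))

r2_image_only = fit_score(image)                       # single channel
r2_fused = fit_score(np.hstack([image, eye, mouse]))   # all channels fused
```

When the target truly depends on several channels, the fused model recovers variance the single-channel model cannot, which is the pattern the paper reports.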
Modelling collaborative problem-solving competence with transparent learning analytics: is video data enough?
In this study, we describe the results of our research on modeling collaborative problem-solving (CPS) competence based on analytics generated from video data. We collected ~500 minutes of video data from 15 groups of 3 students working to solve design problems collaboratively. Initially, with the help of OpenPose, we automatically generated frequency metrics, such as the number of faces in the screen, and distance metrics, such as the distance between bodies. Based on these metrics, we built decision trees to predict students' listening, watching, making, and speaking behaviours, as well as the students' CPS competence. Our results provide useful decision rules mined from video analytics which can be used to inform teacher dashboards. Although the accuracy and recall values of the models are inferior to previous machine learning work that utilizes multimodal data, the transparent nature of the decision trees provides opportunities for explainable analytics for teachers and learners. This can lead to more agency for teachers and learners and can therefore ease adoption. We conclude the paper with a discussion of the value and limitations of our approach.
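The transparency argument above rests on decision trees yielding human-readable rules. A minimal sketch with scikit-learn, using synthetic metrics and hypothetical feature names loosely modeled on the paper's frequency and distance metrics:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
# Synthetic per-window metrics (invented, not the study's data):
# faces_on_screen: how many faces OpenPose detected (0-3),
# body_distance: a normalized mean distance between bodies.
X = np.column_stack([
    rng.integers(0, 4, 120),
    rng.uniform(0.0, 1.0, 120),
])
# Synthetic behaviour label: active collaboration when most faces are
# visible and the bodies are close together.
y = ((X[:, 0] >= 2) & (X[:, 1] < 0.5)).astype(int)

# A shallow tree keeps the rule set small enough for a teacher dashboard.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=["faces_on_screen", "body_distance"])
```

`export_text` prints threshold rules (e.g. splits on `faces_on_screen` and `body_distance`) that can be shown verbatim to teachers, which is exactly the explainability trade-off the study argues for against black-box multimodal models.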