
    Engagement Detection with Multi-Task Training in E-Learning Environments

    Recognition of user interaction, in particular engagement detection, has become crucial for online working and learning environments, especially since the COVID-19 outbreak. Such recognition and detection systems significantly improve user experience and efficiency by providing valuable feedback. In this paper, we propose a novel Engagement Detection with Multi-Task Training (ED-MTT) system, which jointly minimizes mean squared error and triplet loss to determine the engagement level of students in an e-learning environment. The performance of this system is evaluated and compared against the state of the art on a publicly available dataset as well as on videos collected from real-life scenarios. The results show that ED-MTT achieves 6% lower MSE than the best state-of-the-art method, with acceptable training time and lightweight feature extraction.
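    A minimal sketch of such a joint objective, assuming a simple weighted sum of the two terms. The function names, the weight, the margin of 1.0, and the Euclidean embedding distance are illustrative assumptions, not details taken from the ED-MTT paper. The MSE term regresses the engagement level, while the triplet term pulls an anchor embedding toward a sample of the same level and pushes it away from a sample of a different level.

```python
import math


def mse(preds, targets):
    """Mean squared error between predicted and ground-truth engagement levels."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: zero once the negative is `margin` farther than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def multi_task_loss(preds, targets, triplets, weight=1.0, margin=1.0):
    """Hypothetical joint objective: MSE + weight * mean triplet loss over a batch."""
    regression = mse(preds, targets)
    metric = sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)
    return regression + weight * metric


# Usage with toy numbers (illustrative only).
triplets = [([0, 0], [0, 1], [3, 4]), ([0, 0], [3, 4], [0, 1])]
total = multi_task_loss([0.5, 1.0], [0.0, 1.0], triplets)  # 0.125 + 2.5 = 2.625
```

    The first triplet already satisfies the margin and contributes zero; the second is violated and dominates the metric term, which is the behavior that shapes the embedding space alongside the regression target.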

    Do I Have Your Attention: A Large Scale Engagement Prediction Dataset and Baselines

    The degree of concentration, enthusiasm, optimism, and passion displayed by individuals while interacting with a machine is referred to as 'user engagement'. Engagement comprises behavioral, cognitive, and affect-related cues. To create engagement prediction systems that work in real-world conditions, it is essential to learn from rich, diverse datasets. To this end, a large-scale, multi-faceted engagement-in-the-wild dataset, EngageNet, is proposed. It contains 31 hours of data from 127 participants, recorded under different illumination conditions. Thorough experiments explore the applicability of different features: action units, eye gaze, head pose, and MARLIN. Data from user interactions (question answering) are analyzed to understand the relationship between effective learning and user engagement. To further validate the rich nature of the dataset, evaluation is also performed on the EngageWild dataset. The experiments show the usefulness of the proposed dataset. The code, models, and dataset link are publicly available at https://github.com/engagenet/engagenet_baselines

    Facial Expression Recognition from World Wild Web

    Recognizing facial expressions in the wild has remained a challenging task in computer vision. The World Wide Web is a good source of facial images, most of which are captured in uncontrolled conditions. In fact, the Internet is a World Wild Web of facial images with expressions. This paper presents the results of a new study on collecting, annotating, and analyzing wild facial expressions from the web. Three search engines were queried using 1250 emotion-related keywords in six different languages, and two annotators mapped the retrieved images to the six basic expressions and neutral. Deep neural networks and noise modeling were used in three different training scenarios to determine how accurately facial expressions can be recognized when models are trained on noisy images collected from the web using query terms (e.g., happy face, laughing man). The results of our experiments show that deep neural networks can recognize wild facial expressions with an accuracy of 82.12%.
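    The abstract does not specify the noise model. One common approach, sketched here under that assumption, is a noise-adaptation layer: the network's clean-class probabilities are multiplied by a transition matrix T, where T[i][j] is the assumed probability that a face of true class i received web label j, and the loss is computed against the observed noisy label. All names are hypothetical; with the six basic expressions plus neutral, T would be 7 by 7, but a 2-class toy keeps the arithmetic readable.

```python
import math


def softmax(logits):
    """Numerically stable softmax over clean-class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def noisy_label_probs(clean_probs, transition):
    """q_j = sum_i p_i * T[i][j], with T[i][j] = P(observed label j | true class i)."""
    k = len(clean_probs)
    return [sum(clean_probs[i] * transition[i][j] for i in range(k)) for j in range(k)]


def noisy_cross_entropy(logits, noisy_label, transition):
    """Cross-entropy of the observed (possibly wrong) label under the noisy distribution."""
    q = noisy_label_probs(softmax(logits), transition)
    return -math.log(q[noisy_label])


# Toy 2-class example; each row of T must sum to 1.
T = [[0.9, 0.1], [0.2, 0.8]]
q = noisy_label_probs([0.3, 0.7], T)  # approximately [0.41, 0.59]
```

    Training against q rather than the raw softmax lets the classifier's own output stay an estimate of the clean expression while the transition matrix absorbs systematic keyword-to-expression mislabeling.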

    Multimodal sentiment analysis in real-life videos

    This thesis extends the emerging field of multimodal sentiment analysis of real-life videos, taking two components into consideration: the emotion and the emotion's target. The emotion component of media is traditionally represented as a segment-based intensity model of emotion classes. This representation is replaced here by a value- and time-continuous view. Adjacent research fields, such as affective computing, have largely neglected the linguistic information available from automatic transcripts of audio-video material. As is demonstrated here, this text modality is well suited for time- and value-continuous prediction. Moreover, source-specific problems, such as trustworthiness, have been largely unexplored so far. This work examines the perceived trustworthiness of the source, and its quantification, in user-generated video data and presents a possible modelling path. Furthermore, the transfer between the continuous and discrete emotion representations is explored in order to summarise the emotional context at segment level. The other component deals with the target of the emotion, for example, the topic the speaker is addressing. Emotion targets in a video dataset can, as is shown here, be coherently extracted from automatic transcripts without limiting a priori parameters, such as the expected number of targets. Furthermore, alternatives to purely linguistic investigation in predicting targets, such as knowledge bases and multimodal systems, are investigated. A new dataset is designed for this investigation, and, in conjunction with proposed novel deep neural networks, extensive experiments are conducted to explore the components described above. The developed systems show robust prediction results and demonstrate the strengths of the respective modalities, feature sets, and modelling techniques. Finally, foundations are laid for cross-modal information prediction systems with applications to the correction of corrupted in-the-wild signals from real-life videos.
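    As a hedged illustration of summarising a time-continuous emotion signal at segment level, the sketch below averages per-frame valence inside a segment and maps the mean to a discrete label. The thresholds, the three-label set, and the function name are hypothetical; the thesis does not state that this rule is its transfer method.

```python
from statistics import mean


def summarise_segment(valence, start, end, low=-0.1, high=0.1):
    """Collapse per-frame valence values in [start, end) into one discrete label.

    The thresholds `low`/`high` are hypothetical; means between them count as neutral.
    """
    segment = valence[start:end]
    if not segment:
        raise ValueError("empty segment")
    avg = mean(segment)
    if avg < low:
        return "negative"
    if avg > high:
        return "positive"
    return "neutral"


# Toy valence trace sampled at a fixed frame rate (illustrative values).
trace = [0.5, 0.4, 0.05, -0.05, -0.9, -0.8]
labels = [summarise_segment(trace, s, s + 2) for s in (0, 2, 4)]
# labels == ["positive", "neutral", "negative"]
```

    A plain mean is the simplest choice; it discards within-segment dynamics, so a segment that swings strongly between positive and negative would still be summarised as neutral.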

    Unobtrusive Assessment Of Student Engagement Levels In Online Classroom Environment Using Emotion Analysis

    Measuring student engagement has emerged as a significant factor in the learning process and a good indicator of a student's knowledge-retention capacity. As synchronous online classes have become more prevalent in recent years, gauging a student's attention level has become more critical for validating the progress of every student in an online classroom environment. This paper details a study that profiles student attentiveness across different gradients of engagement level using multiple machine learning models. Results from the high-accuracy model and the confidence scores obtained from the cloud-based computer vision platform Amazon Rekognition were then used to statistically validate any correlation between student attentiveness and emotions. This statistical analysis helps identify the emotions that are essential in gauging various engagement levels. The study identified emotions such as calm, happy, surprise, and fear as critical in gauging a student's attention level. These findings support earlier detection of students with lower attention levels, helping instructors focus their support and guidance on the students in need and leading to a better online learning environment.
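    The abstract does not name the statistical test. A minimal sketch, assuming Pearson correlation between per-student attentiveness scores and per-emotion confidence scores (such as those a facial-analysis service returns), ranks emotions by the strength of their association. The function names, data layout, and values are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return cov / (sx * sy)


def rank_emotions(attention, emotion_scores):
    """Sort emotions by |r| with the attention level, strongest association first."""
    r = {name: pearson(attention, scores) for name, scores in emotion_scores.items()}
    return sorted(r.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical data: attention level and per-emotion confidence for four students.
attention = [1, 2, 3, 4]
scores = {"calm": [10, 20, 30, 40], "fear": [40, 10, 30, 20]}
ranking = rank_emotions(attention, scores)  # calm (r = 1.0) before fear (r = -0.4)
```

    Ranking by absolute value keeps strongly negative associations visible; a real analysis would also report significance (for example, p-values), which this sketch omits.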

    Graph-based Facial Affect Analysis: A Review of Methods, Applications and Challenges

    Facial affect analysis (FAA) using visual signals is important in human-computer interaction. Early methods focus on extracting appearance and geometry features associated with human affect while ignoring the latent semantic relationships among individual facial changes, which limits their performance and generalization. Recent work attempts to establish graph-based representations that model these semantic relationships and develops frameworks that leverage them for various FAA tasks. In this paper, we provide a comprehensive review of graph-based FAA, including the evolution of algorithms and their applications. First, background knowledge on FAA is introduced, with emphasis on the role of graphs. We then discuss approaches widely used for graph-based affective representation in the literature and show a trend in graph construction. For relational reasoning in graph-based FAA, existing studies are categorized according to their use of traditional methods or deep models, with special emphasis on the latest graph neural networks. Performance comparisons of state-of-the-art graph-based FAA methods are also summarized. Finally, we discuss the challenges and potential directions. As far as we know, this is the first survey of graph-based FAA methods. Our findings can serve as a reference for future research in this field.
    Comment: 20 pages, 12 figures, 5 tables
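    To make "graph-based representation" concrete, here is one widely used relational-reasoning step, the graph convolution propagation rule of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W), applied to a graph whose nodes could stand for facial action units. The adjacency matrix, features, and weights are hypothetical inputs, and this is only one of the many graph constructions and models the review covers.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n symmetric adjacency (e.g. hypothetical AU co-occurrence links)
    features: n x f node features H; weight: f x f_out learnable matrix W
    """
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization keeps aggregated values on a stable scale.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Two connected AU nodes with one scalar feature each (toy values).
h = gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]])  # [[2.0], [2.0]]
```

    Each node's output mixes its own features with its neighbors', which is how such models encode the relationships among facial changes that appearance-only features ignore.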

    FACE READERS: The Frontier of Computer Vision and Math Learning

    The future of AI-assisted individualized learning includes computer vision that informs intelligent tutors and teachers about student affect, motivation, and performance. Facial expression recognition is essential for recognizing subtle differences when students ask for hints or fail to solve problems. Facial features and classification labels enable intelligent tutors to predict students' performance and recommend activities. Videos can capture students' faces and model their effort and progress, and machine learning classifiers can help intelligent tutors provide interventions. One goal of this research is to support deep dives by teachers to identify students' individual needs through facial expression and to provide immediate feedback. Another goal is to develop data-directed education that gauges students' pre-existing knowledge and analyzes real-time data, engaging both teachers and students in more individualized, precision teaching and learning. This paper identifies three phases in the process of recognizing and predicting student progress based on analyzing facial features: Phase I: collecting datasets and identifying salient labels for facial features and student attention/engagement; Phase II: building and training deep learning models of facial features; and Phase III: predicting student problem-solving outcomes. © 2023 Copyright for this paper by its authors