
    Machine Learning applied to student attentiveness detection: Using emotional and non-emotional measures

    Elbawab, M., & Henriques, R. (2023). Machine Learning applied to student attentiveness detection: Using emotional and non-emotional measures. Education and Information Technologies, 1-21. https://doi.org/10.1007/s10639-023-11814-5 --- Open access funding provided by FCT|FCCN (b-on). This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under the project UIDB/04152/2020—Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS. Electronic learning (e-learning) is considered the new norm of learning. One of the significant drawbacks of e-learning compared to the traditional classroom is that teachers cannot monitor students' attentiveness. Previous literature used physical facial features or emotional states to detect attentiveness. Other studies proposed combining physical and emotional facial features; however, a mixed model relying only on a webcam had not been tested. The objective of this study is to develop a machine learning (ML) model that automatically estimates students' attentiveness during e-learning classes using only a webcam; such a model would help in evaluating teaching methods for e-learning. This study collected videos from seven students. The webcam of a personal computer is used to obtain video, from which we build a feature set that characterizes a student's physical and emotional state based on their face. This characterization includes eye aspect ratio (EAR), yawn aspect ratio (YAR), head pose, and emotional states. A total of eleven variables are used in training and validating the model. ML algorithms are used to estimate individual students' attention levels. The ML models tested are decision trees, random forests, support vector machines (SVM), and extreme gradient boosting (XGBoost). Human observers' estimates of attention level are used as the reference. Our best attention classifier is XGBoost, which achieved an average accuracy of 80.52% and an AUROC OVR of 92.12%. The results indicate that a combination of emotional and non-emotional measures can produce a classifier with accuracy comparable to other attentiveness studies. The study also helps assess e-learning lectures through students' attentiveness, and hence assists in improving e-learning lectures by generating an attentiveness report for the tested lecture.
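    The abstract names eye aspect ratio (EAR) and yawn aspect ratio (YAR) among the eleven input variables but does not spell out how they are computed or how the classifier is assembled. The sketch below is a minimal illustration, not the authors' implementation: it assumes the common six-landmark EAR definition from the facial-landmark literature and a hypothetical eleven-column feature table fed to a multi-class XGBoost classifier.

```python
# Minimal sketch (assumed details, not the paper's code): EAR from six eye
# landmarks, plus a multi-class XGBoost attention classifier on 11 features.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """eye: (6, 2) array of landmarks p1..p6 around one eye.
    EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|) -- a common definition,
    assumed here; the paper may use a different formulation."""
    vert = np.linalg.norm(eye[1] - eye[5]) + np.linalg.norm(eye[2] - eye[4])
    horiz = np.linalg.norm(eye[0] - eye[3])
    return vert / (2.0 * horiz)

# Hypothetical feature matrix: one row per video segment, 11 columns
# (EAR, YAR, head-pose angles, emotion scores, ...) with an attention
# label per segment (e.g. low / medium / high). Placeholder data only.
rng = np.random.default_rng(0)
X = rng.random((500, 11))
y = rng.integers(0, 3, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                    objective="multi:softprob")
clf.fit(X_tr, y_tr)

pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)
print("accuracy:", accuracy_score(y_te, pred))
print("AUROC (OVR):", roc_auc_score(y_te, proba, multi_class="ovr"))
```

    The one-vs-rest AUROC is computed the same way the abstract reports it (AUROC OVR), which is why the classifier is trained with a probabilistic multi-class objective here.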

    Leveraging 2D pose estimators for American Sign Language Recognition

    Most deaf children born to hearing parents do not have continuous access to language, leading to weaker short-term memory compared to deaf children born to deaf parents. This lack of short-term memory has serious consequences for their mental health and employment rate. To this end, prior work has explored CopyCat, a game where children interact with virtual actors using sign language. While CopyCat has been shown to improve language generation, reception, and repetition, it uses expensive hardware for sign language recognition. This thesis explores the feasibility of using 2D off-the-shelf camera-based pose estimators such as MediaPipe for complementing sign language recognition and moving towards a ubiquitous recognition framework. We compare MediaPipe with 3D pose estimators such as Azure Kinect to determine the feasibility of using off-the-shelf cameras. Furthermore, we develop and compare Hidden Markov Models (HMMs) with state-of-the-art recognition models such as Transformers to determine which model is best suited for American Sign Language recognition in a constrained environment. We find that MediaPipe marginally outperforms Kinect in various experimental settings. Additionally, HMMs outperform Transformers by 17.0% on average in recognition accuracy. Given these results, we believe that a widely deployable game using only a 2D camera is feasible.
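    The listing describes the comparison (MediaPipe vs. Kinect features, HMMs vs. Transformers) but gives no implementation detail. The following is a hedged sketch of the general pipeline only: MediaPipe Hands extracts 2D landmarks per frame, and one Gaussian HMM per sign (via hmmlearn, an assumed stand-in library) is scored by log-likelihood. The thesis' actual CopyCat recognizer is certainly more elaborate.

```python
# Hedged sketch: MediaPipe 2D hand landmarks as frame features, one
# GaussianHMM per sign, classification by maximum log-likelihood.
# Illustrative only; not the thesis' recognizer.
import cv2
import mediapipe as mp
import numpy as np
from hmmlearn import hmm

def video_to_features(path: str) -> np.ndarray:
    """Return a (frames, 42) array of x,y coordinates for 21 hand landmarks."""
    hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1)
    cap = cv2.VideoCapture(path)
    feats = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        res = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if res.multi_hand_landmarks:
            lm = res.multi_hand_landmarks[0].landmark
            feats.append([c for p in lm for c in (p.x, p.y)])
    cap.release()
    hands.close()
    return np.array(feats)

def train_sign_model(sequences: list[np.ndarray]) -> hmm.GaussianHMM:
    """Fit one HMM on all example landmark sequences of a single sign."""
    model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=50)
    model.fit(np.concatenate(sequences), [len(s) for s in sequences])
    return model

def recognize(models: dict[str, hmm.GaussianHMM], seq: np.ndarray) -> str:
    """Pick the sign whose HMM assigns the sequence the highest log-likelihood."""
    return max(models, key=lambda sign: models[sign].score(seq))
```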

    A Multimodal Learning System for Individuals with Sensorial, Neuropsychological, and Relational Impairments

    This paper presents a system for an interactive multimodal environment able (i) to train listening comprehension in various populations of pupils, both Italian and immigrant, with different disabilities, and (ii) to assess speech production and discrimination. The proposed system is the result of a research project focused on pupils with sensorial, neuropsychological, and relational impairments. The project involves innovative technological systems that the users (speech therapists, psychologists, and preprimary and primary school teachers) could adopt for training and assessment of language and speech. Because the system is used in a real scenario (Italian schools are often affected by poor funding for education and by teachers without informatics skills), the guidelines adopted are low-cost technology, usability, a customizable system, and robustness.

    Automated Semantic Content Extraction from Images

    In this study, an automatic semantic segmentation and object recognition methodology is implemented which bridges the semantic gap between low-level features of image content and high-level conceptual meaning. Semantically understanding an image is essential in modeling autonomous robots, targeting customers in marketing, or reverse engineering of building information modeling in the construction industry. To achieve an understanding of a room from a single image, we proposed a new object recognition framework which has four major components: segmentation, scene detection, conceptual cueing, and object recognition. The new segmentation methodology developed in this research extends Felzenszwalb's cost function to include new surface index and depth features as well as color, texture, and normal features to overcome issues of occlusion and shadowing commonly found in images. Adding depth allows capturing new features for the object recognition stage, achieving high accuracy compared to the current state of the art. The goal was to develop an approach to capture and label perceptually important regions which often reflect a global representation and understanding of the image. We developed a system that uses contextual and common-sense information to improve object recognition and scene detection, and fused the information from scenes and objects to reduce the level of uncertainty. This study, in addition to improving segmentation, scene detection, and object recognition, can be used in applications that require physical parsing of the image into objects, surfaces, and their relations. The applications include robotics, social networking, intelligence and anti-terrorism efforts, criminal investigations and security, marketing, and building information modeling in the construction industry. In this dissertation a structural framework (ontology) is developed that generates text descriptions based on an understanding of the objects, structures, and attributes of an image.
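    The abstract states that the segmentation stage extends Felzenszwalb's graph-based cost function with depth, surface, and normal features, but gives no formulation. As a rough, assumed illustration of the general idea only, the sketch below runs the standard Felzenszwalb algorithm from scikit-image on an RGB image augmented with a depth channel, so that depth differences also influence region merging; the dissertation's actual extended cost function is more elaborate.

```python
# Hedged sketch: graph-based (Felzenszwalb) segmentation on an RGB image
# augmented with a synthetic depth channel. Not the dissertation's
# actual extended cost function.
import numpy as np
from skimage import data, segmentation

rgb = data.astronaut().astype(float) / 255.0          # stand-in RGB image
depth = np.linspace(0.0, 1.0, rgb.shape[0])[:, None]  # synthetic depth gradient
depth = np.broadcast_to(depth, rgb.shape[:2])

# Stack depth as a fourth channel so pixel dissimilarity also reflects
# depth differences (scikit-image treats the last axis as channels).
rgbd = np.dstack([rgb, depth])

labels = segmentation.felzenszwalb(rgbd, scale=100, sigma=0.8, min_size=50)
print("number of segments:", labels.max() + 1)
```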