Subjective Annotations for Vision-Based Attention Level Estimation
Attention level estimation systems have high potential in many use cases, such as human-robot interaction, driver modeling, and smart home systems, since being able to measure a person's attention level opens the possibility of natural interaction between humans and computers. The topic of estimating a human's visual focus of attention has recently been addressed actively in the field of HCI. However, most of these previous works do not consider attention as a subjective, cognitive attentive state. New research in the field also faces the lack of datasets annotated with attention levels in a given context. The novelty of our work is two-fold: first, we introduce a new annotation framework that tackles the subjective nature of attention level and use it to annotate more than 100,000 images with three attention levels; second, we introduce a novel method to estimate attention levels, relying purely on geometric features extracted from RGB and depth images, and evaluate it with a deep learning fusion framework. The system achieves an overall accuracy of 80.02%. Our framework and attention level annotations are made publicly available.
Comment: 14th International Conference on Computer Vision Theory and Applications
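As a hedged illustration of the pipeline this abstract describes (geometric features extracted separately from RGB and depth images, combined in a deep learning fusion framework for three attention levels), the following PyTorch sketch shows a minimal two-branch late-fusion classifier. The feature dimensions, layer sizes, and tensor names are assumptions for illustration, not the authors' architecture.

    import torch
    import torch.nn as nn

    class GeometricFusionNet(nn.Module):
        # Two small branches (RGB-derived and depth-derived geometric
        # features) fused by concatenation before a 3-way classifier.
        def __init__(self, rgb_dim=32, depth_dim=32, hidden=64, n_levels=3):
            super().__init__()
            self.rgb_branch = nn.Sequential(nn.Linear(rgb_dim, hidden), nn.ReLU())
            self.depth_branch = nn.Sequential(nn.Linear(depth_dim, hidden), nn.ReLU())
            self.classifier = nn.Linear(2 * hidden, n_levels)

        def forward(self, rgb_feats, depth_feats):
            fused = torch.cat([self.rgb_branch(rgb_feats),
                               self.depth_branch(depth_feats)], dim=1)
            return self.classifier(fused)  # logits over {low, medium, high}

    model = GeometricFusionNet()
    logits = model(torch.randn(8, 32), torch.randn(8, 32))  # batch of 8 samples
    predicted_level = logits.argmax(dim=1)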
MATT: Multimodal Attention Level Estimation for e-learning Platforms
This work presents a new multimodal system for remote attention level estimation based on multimodal face analysis. Our approach uses different parameters and signals obtained from behavioral and physiological processes that have been related to the modeling of cognitive load, such as face gestures (e.g., blink rate, facial action units) and user actions (e.g., head pose, distance to the camera). The multimodal system uses the following modules based on Convolutional Neural Networks (CNNs): eye blink detection, head pose estimation, facial landmark detection, and facial expression features. First, we individually evaluate the proposed modules on the task of estimating the student's attention level captured during online e-learning sessions. For that, we train binary classifiers (high or low attention) based on Support Vector Machines (SVMs) for each module. Second, we analyze to what extent multimodal score-level fusion improves attention level estimation. The experimental framework uses the mEBAL database, a public multimodal database for attention level estimation acquired in an e-learning environment, which contains data from 38 users conducting several e-learning tasks of variable difficulty (inducing changes in the students' cognitive load).
Comment: Preprint of the paper presented at the Workshop on Artificial Intelligence for Education (AI4EDU) of AAAI 2023
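A minimal sketch of the score-level fusion step described above: one binary SVM per module, with the per-module decision scores averaged into a single attention score. The module names, feature dimensions, and the use of scikit-learn's SVC decision_function are illustrative assumptions, not the paper's exact setup.

    import numpy as np
    from sklearn.svm import SVC

    # Stand-in per-module feature matrices for 200 labeled windows
    # (0 = low attention, 1 = high attention).
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 200)
    modules = {name: rng.normal(size=(200, 16))
               for name in ["blink", "head_pose", "landmarks", "expression"]}

    # One binary SVM (high vs. low attention) per module.
    svms = {name: SVC().fit(X, y) for name, X in modules.items()}

    def fused_score(per_module_features):
        # Score-level fusion: average the signed SVM decision scores.
        scores = [svms[name].decision_function(feats.reshape(1, -1))[0]
                  for name, feats in per_module_features.items()]
        return float(np.mean(scores))  # > 0 suggests high attention

    sample = {name: X[0] for name, X in modules.items()}
    print("fused attention score:", fused_score(sample))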
mEBAL: A Multimodal Database for Eye Blink Detection and Attention Level Estimation
This work presents mEBAL, a multimodal database for eye blink detection and attention level estimation. Eye blink frequency is related to cognitive activity, and automatic eye blink detectors have been proposed for many tasks, including attention level estimation, analysis of neurodegenerative diseases, deception recognition, driver fatigue detection, and face anti-spoofing. However, most existing databases and algorithms in this area are limited to experiments involving only a few hundred samples and individual sensors such as face cameras. The proposed mEBAL improves on previous databases in terms of acquisition sensors and number of samples. In particular, three different sensors are considered simultaneously: Near-Infrared (NIR) and RGB cameras to capture face gestures, and an Electroencephalography (EEG) band to capture the cognitive activity of the user and blinking events. Regarding its size, mEBAL comprises 6,000 samples and the corresponding attention levels from 38 different students conducting a number of e-learning tasks of varying difficulty. In addition to presenting mEBAL, we also include preliminary experiments on: i) eye blink detection using Convolutional Neural Networks (CNNs) with the facial images, and ii) attention level estimation of the students based on their eye blink frequency.
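A hedged sketch of preliminary experiment ii) above: given detected blink timestamps, the blink rate over a time window is thresholded into an attention level. The window length, the threshold, and the assumed inverse relation between blink rate and attention are illustrative choices, not the paper's calibrated model.

    import numpy as np

    def blink_rate(blink_times, window_s=60.0):
        # blink_times: timestamps (in seconds) of blinks detected by the
        # CNN within one analysis window.
        return len(blink_times) * 60.0 / window_s  # blinks per minute

    def attention_from_blinks(blink_times, low_rate=10.0):
        # Assumed heuristic: sparse blinking during a visual task is read
        # as high attention, frequent blinking as low attention.
        return "high" if blink_rate(blink_times) <= low_rate else "low"

    blinks = np.array([2.1, 9.8, 31.4, 55.0])  # 4 blinks in a 60 s window
    print(attention_from_blinks(blinks))  # -> "high" (4 blinks/min <= 10)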
mEBAL2 Database and Benchmark: Image-based Multispectral Eyeblink Detection
This work introduces a new multispectral database and novel approaches for eyeblink detection in RGB and Near-Infrared (NIR) individual images. Our contributed dataset (mEBAL2, multimodal Eye Blink and Attention Level estimation, Version 2) is the largest existing eyeblink database, representing a great opportunity to improve data-driven multispectral approaches for blink detection and related applications (e.g., attention level estimation and presentation attack detection in face biometrics). mEBAL2 includes 21,100 image sequences from 180 different students (more than 2 million labeled images in total), captured while they conducted a number of e-learning tasks of varying difficulty or took a real course on HTML initiation through the edX MOOC platform. mEBAL2 uses multiple sensors, including two Near-Infrared (NIR) cameras and one RGB camera to capture facial gestures during the execution of the tasks, as well as an Electroencephalogram (EEG) band to record the cognitive activity of the user and blinking events. Furthermore, this work proposes a Convolutional Neural Network architecture as a benchmark for blink detection on mEBAL2, with performance up to 97%. Different training methodologies are implemented using the RGB spectrum, the NIR spectrum, and the combination of both to enhance the performance of existing eyeblink detectors. We demonstrate that combining NIR and RGB images during training improves the performance of RGB eyeblink detectors (i.e., detection based only on an RGB image). Finally, the generalization capacity of the proposed eyeblink detectors is validated in wilder and more challenging environments such as the HUST-LEBW dataset, showing the usefulness of mEBAL2 for training a new generation of data-driven approaches to eyeblink detection.
Comment: This paper is under consideration at Pattern Recognition Letters
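The cross-spectral finding above (training on NIR plus RGB improves an RGB-only detector) can be illustrated with a minimal sketch: one blink classifier trained on a pooled dataset drawn from both spectra, then deployed on RGB alone. Rendering both spectra as single-channel eye crops so they share one input layer, along with all shapes and hyperparameters, is an assumption for illustration, not the paper's benchmark.

    import torch
    from torch import nn
    from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

    # Stand-in tensors: single-channel 64x64 eye crops, label 1 = blink.
    rgb_ds = TensorDataset(torch.randn(256, 1, 64, 64), torch.randint(0, 2, (256,)))
    nir_ds = TensorDataset(torch.randn(256, 1, 64, 64), torch.randint(0, 2, (256,)))

    # Mixed-spectrum training set: RGB and NIR samples pooled together.
    loader = DataLoader(ConcatDataset([rgb_ds, nir_ds]), batch_size=32, shuffle=True)

    model = nn.Sequential(
        nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
        nn.Flatten(), nn.Linear(8 * 16 * 16, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for x, y in loader:  # one epoch over the pooled spectra
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    # At test time the detector sees RGB crops only.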
M2LADS: A System for Generating MultiModal Learning Analytics Dashboards
In this article, we present a Web-based system called M2LADS, which supports the integration and visualization of multimodal data recorded during learning sessions in a MOOC in the form of Web-based dashboards. Based on the edBB platform, the multimodal data gathered contain biometric and behavioral signals, including electroencephalogram data to measure learners' cognitive attention, heart rate for affective measures, and visual attention from the video recordings. Additionally, learners' static background data and their learning performance measures are tracked using LOGCE and MOOC tracking logs, respectively, and both are included in the Web-based system. M2LADS provides opportunities to capture learners' holistic experience during their interactions with the MOOC, which can in turn be used to improve their learning outcomes through feedback visualizations and interventions, as well as to enhance learning analytics models and improve the open content of the MOOC.
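A minimal sketch of the kind of multimodal integration a dashboard like this requires: aligning independently sampled signals (e.g., EEG attention and heart rate) onto one timeline before visualization. The use of pandas merge_asof and the column names are illustrative assumptions, not M2LADS internals.

    import pandas as pd

    # Stand-in streams with different sampling instants (seconds).
    eeg = pd.DataFrame({"t": [0.0, 1.0, 2.0, 3.0], "attention": [55, 60, 58, 70]})
    hr = pd.DataFrame({"t": [0.5, 2.5], "heart_rate": [72, 75]})

    # Align heart rate to the EEG timeline via the most recent sample.
    timeline = pd.merge_asof(eeg, hr, on="t")
    print(timeline)
    #      t  attention  heart_rate
    # 0  0.0         55         NaN
    # 1  1.0         60        72.0
    # 2  2.0         58        72.0
    # 3  3.0         70        75.0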
Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed Video
Real-time eyeblink detection in the wild can serve a wide range of applications, such as fatigue detection, face anti-spoofing, and emotion analysis. Existing research efforts generally focus on single-person cases in trimmed videos. However, the multi-person scenario in untrimmed videos is also important for practical applications, and it has not yet been well addressed. To address this, we shed light on this research field for the first time with essential contributions to dataset, theory, and practice. In particular, a large-scale dataset termed MPEblink, comprising 686 untrimmed videos with 8,748 eyeblink events, is proposed under multi-person conditions. The samples are captured from unconstrained films to reflect "in the wild" characteristics. Meanwhile, a real-time multi-person eyeblink detection method is also proposed. Unlike existing counterparts, our method runs in a one-stage spatio-temporal manner with end-to-end learning capacity. Specifically, it simultaneously addresses the sub-tasks of face detection, face tracking, and human instance-level eyeblink detection. This paradigm holds two main advantages: (1) eyeblink features can be enriched via the face's global context (e.g., head pose and illumination conditions) with joint optimization and interaction, and (2) addressing these sub-tasks in parallel rather than sequentially saves considerable time, meeting the real-time running requirement. Experiments on MPEblink verify the essential challenges of real-time multi-person eyeblink detection in the wild for untrimmed video. Our method also outperforms existing approaches by large margins while maintaining a high inference speed.
Comment: Accepted by CVPR 2023
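As a hedged sketch of what instance-level eyeblink detection involves (distinct from the paper's one-stage, end-to-end architecture), the following associates per-frame face detections to identities by IoU matching and accumulates per-person blink scores. The IoU threshold and the score bookkeeping are illustrative assumptions.

    def iou(a, b):
        # Boxes as (x1, y1, x2, y2).
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-9)

    tracks = {}   # identity -> {"box": latest box, "blink_scores": [...]}
    next_id = 0

    def update(frame_detections, iou_thr=0.5):
        # frame_detections: list of (face_box, blink_score) for one frame.
        global next_id
        for box, score in frame_detections:
            best, best_iou = None, iou_thr
            for tid, tr in tracks.items():
                o = iou(box, tr["box"])
                if o > best_iou:
                    best, best_iou = tid, o
            if best is None:  # unmatched detection starts a new identity
                best, next_id = next_id, next_id + 1
                tracks[best] = {"box": box, "blink_scores": []}
            tracks[best]["box"] = box
            tracks[best]["blink_scores"].append(score)

    update([((10, 10, 50, 50), 0.9), ((200, 30, 240, 70), 0.1)])
    update([((12, 11, 52, 51), 0.8)])  # matched back to identity 0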
Can ADAS Distract Driver's Attention? An RGB-D Camera and Deep Learning-Based Analysis
Driver inattention is the primary cause of vehicle accidents; hence, manufacturers have introduced systems to support the driver and improve safety. Nonetheless, advanced driver assistance systems (ADAS) must be properly designed so that the feedback they provide does not become a potential source of distraction for the driver. In the present study, an experiment involving auditory and haptic ADAS has been conducted with 11 participants, whose attention has been monitored during their driving experience. An RGB-D camera has been used to acquire the drivers' face data. Subsequently, these images have been analyzed using a deep learning-based approach, i.e., a convolutional neural network (CNN) specifically trained to perform facial expression recognition (FER). Analyses have been carried out to assess possible relationships between these results and both ADAS activations and event occurrences, i.e., accidents. A correlation between attention and accidents emerged, whereas facial expressions and ADAS activations turned out to be uncorrelated; thus, no evidence has been found that the designed ADAS are a potential source of distraction. In addition to the experimental results, the proposed approach has proved to be an effective tool for monitoring the driver through non-invasive techniques.
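A minimal sketch of the kind of relationship analysis mentioned above: a windowed attention signal correlated against accident occurrences. The windowing, the use of scipy's pearsonr, and the stand-in data are assumptions about the analysis, not the authors' exact procedure.

    import numpy as np
    from scipy.stats import pearsonr

    # Per-window signals over a driving session (stand-in data):
    # attention[i] is the estimated attention in window i (0..1),
    # accidents[i] the number of accident events in that window.
    attention = np.array([0.9, 0.8, 0.4, 0.3, 0.7, 0.2])
    accidents = np.array([0, 0, 1, 2, 0, 1])

    r, p = pearsonr(attention, accidents)
    print(f"Pearson r = {r:.2f}, p = {p:.3f}")
    # A negative r indicates more accidents in low-attention windows.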
Face Image Quality Assessment: A Literature Survey
The performance of face analysis and recognition systems depends on the
quality of the acquired face data, which is influenced by numerous factors.
Automatically assessing the quality of face data in terms of biometric utility
can thus be useful to detect low-quality data and make decisions accordingly.
This survey provides an overview of the face image quality assessment literature, which predominantly focuses on visible-wavelength face image input. A trend towards deep learning-based methods is observed, including notable conceptual differences among the recent approaches, such as the integration of quality assessment into face recognition models. Besides image selection, face image quality assessment can also be used in a variety of other application scenarios, which are discussed herein. Open issues and challenges are pointed out, among others highlighting the importance of comparability for algorithm evaluations, and the challenge for future work to create deep learning approaches that are interpretable in addition to providing accurate utility predictions.
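A hedged sketch of the image-selection use case discussed above: gate face images on a scalar utility score before recognition. The sharpness-based quality_score below is a crude stand-in for a learned FIQA model from the survey, and the threshold is an assumption.

    import numpy as np

    def quality_score(face_image: np.ndarray) -> float:
        # Crude utility proxy: variance of a discrete Laplacian (sharpness),
        # squashed to (0, 1). Surveyed methods predict biometric utility
        # directly instead.
        lap = (np.roll(face_image, 1, 0) + np.roll(face_image, -1, 0)
               + np.roll(face_image, 1, 1) + np.roll(face_image, -1, 1)
               - 4 * face_image)
        return float(1.0 - np.exp(-lap.var()))

    def select_for_recognition(images, threshold=0.6):
        # Reject low-quality captures before enrollment or comparison.
        return [img for img in images if quality_score(img) >= threshold]

    sharp = np.random.default_rng(0).normal(size=(64, 64))
    flat = np.ones((64, 64))  # featureless image, near-zero sharpness
    print(len(select_for_recognition([sharp, flat])))  # -> 1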
Measuring Brain Activation Patterns from Raw Single-Channel EEG during Exergaming: A Pilot Study
Physical and cognitive rehabilitation is deemed crucial to attenuate symptoms and improve the quality of life of people with neurodegenerative disorders, such as Parkinson's Disease. Among rehabilitation strategies, a novel and popular approach relies on exergaming: the patient performs a motor or cognitive task within an interactive videogame in a virtual environment. These strategies may widely benefit from being tailored to the patient's needs and engagement patterns. In this pilot study, we investigated the ability of a low-cost BCI based on single-channel EEG to measure the user's engagement during an exergame. As a first step, healthy subjects were recruited to assess the system's capability to distinguish between (1) rest and gaming conditions and (2) gaming at different complexity levels, through supervised Machine Learning models. Both EEG and eye-blink features were employed. The results indicate the ability of the exergame to stimulate engagement and the capability of the supervised classification models to distinguish the resting stage from game-play (accuracy > 95%). Finally, different clusters of subject responses throughout the game were identified, which could help define models of engagement trends. This result is a starting point in developing an effectively subject-tailored exergaming system.
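A minimal sketch of the rest-versus-gaming classification step described above, assuming band-power features from single-channel EEG; the sampling rate, frequency bands, window length, and choice of SVM are illustrative assumptions, not the study's exact pipeline.

    import numpy as np
    from scipy.signal import welch
    from sklearn.svm import SVC

    FS = 256  # assumed EEG sampling rate (Hz)

    def band_power(x, lo, hi):
        # Mean spectral power of one EEG window in [lo, hi) Hz.
        f, pxx = welch(x, fs=FS, nperseg=FS)
        return pxx[(f >= lo) & (f < hi)].mean()

    def features(window):
        # Engagement-related bands: theta, alpha, beta.
        return [band_power(window, 4, 8),
                band_power(window, 8, 13),
                band_power(window, 13, 30)]

    # Stand-in data: 2-second windows labeled 0 = rest, 1 = gaming.
    rng = np.random.default_rng(0)
    windows = rng.normal(size=(100, 2 * FS))
    labels = rng.integers(0, 2, 100)

    X = np.array([features(w) for w in windows])
    clf = SVC().fit(X, labels)
    print("rest/gaming prediction:", clf.predict(X[:1]))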