This work presents a new multimodal system for remote attention level
estimation based on multimodal face analysis. Our multimodal approach uses
different parameters and signals obtained from the behavior and physiological
processes that have been related to modeling cognitive load such as faces
gestures (e.g., blink rate, facial actions units) and user actions (e.g., head
pose, distance to the camera). The multimodal system uses the following modules
based on Convolutional Neural Networks (CNNs): Eye blink detection, head pose
estimation, facial landmark detection, and facial expression features. First,
we individually evaluate the proposed modules in the task of estimating the
student's attention level captured during online e-learning sessions. For that
we trained binary classifiers (high or low attention) based on Support Vector
Machines (SVM) for each module. Secondly, we find out to what extent multimodal
score level fusion improves the attention level estimation. The mEBAL database
is used in the experimental framework, a public multi-modal database for
attention level estimation obtained in an e-learning environment that contains
data from 38 users while conducting several e-learning tasks of variable
difficulty (creating changes in student cognitive loads).Comment: Preprint of the paper presented to the Workshop on Artificial
Intelligence for Education (AI4EDU) of AAAI 202