Attention level estimation systems hold high potential in many use cases,
such as human-robot interaction, driver modeling, and smart home systems, since
being able to measure a person's attention level opens the possibility of
natural interaction between humans and computers. Estimating a human's visual
focus of attention has recently been actively addressed in the field of HCI.
However, most of these previous works do not consider attention as a
subjective, cognitive attentive state. New research in the field also faces
the lack of datasets annotated with attention levels in a given context. The
novelty of our work is two-fold: first, we introduce
a new annotation framework that tackles the subjective nature of attention
level and use it to annotate more than 100,000 images with three attention
levels; second, we introduce a novel method to estimate attention levels,
relying purely on geometric features extracted from RGB and depth images, and
evaluate it with a deep learning fusion framework. The system achieves an
overall accuracy of 80.02%. Our framework and attention level annotations are
made publicly available.

Comment: 14th International Conference on Computer Vision Theory and
Application