Benefits of temporal information for appearance-based gaze estimation
State-of-the-art appearance-based gaze estimation methods, usually based on
deep learning techniques, rely mainly on static features. However, the temporal
trace of eye gaze contains useful information for estimating a given gaze
point. For example, approaches that leverage sequential eye gaze information
have shown promising results when applied to remote or low-resolution imaging
scenarios with off-the-shelf cameras. The magnitude of the contribution of the
temporal gaze trace remains unclear for higher-resolution, higher-frame-rate
imaging systems, in which
more detailed information about an eye is captured. In this paper, we
investigate whether temporal sequences of eye images, captured using a
high-resolution, high-frame rate head-mounted virtual reality system, can be
leveraged to enhance the accuracy of an end-to-end appearance-based
deep-learning model for gaze estimation. Performance is compared against a
static-only version of the model. Results demonstrate statistically
significant benefits of temporal information, particularly for the vertical
component of gaze.
Comment: In ACM Symposium on Eye Tracking Research & Applications (ETRA), 202
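To make the static-vs-temporal comparison concrete, here is a minimal PyTorch sketch of the two model families. It is an illustrative assumption, not the paper's architecture: the encoder, the 128-dimensional feature size, and the GRU head are all placeholders.

```python
# Illustrative sketch only: a static gaze regressor vs. a temporal variant
# that adds a GRU over per-frame eye-image features. Names and sizes are
# assumptions, not taken from the paper.
import torch
import torch.nn as nn

class EyeEncoder(nn.Module):
    """Maps a single eye image to a feature vector."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x):                      # x: (B, 1, H, W)
        return self.fc(self.conv(x).flatten(1))

class StaticGazeNet(nn.Module):
    """Static baseline: one frame in, (yaw, pitch) out."""
    def __init__(self):
        super().__init__()
        self.enc = EyeEncoder()
        self.head = nn.Linear(128, 2)

    def forward(self, x):
        return self.head(self.enc(x))

class TemporalGazeNet(nn.Module):
    """Temporal variant: a GRU aggregates features over T frames."""
    def __init__(self):
        super().__init__()
        self.enc = EyeEncoder()
        self.gru = nn.GRU(128, 128, batch_first=True)
        self.head = nn.Linear(128, 2)

    def forward(self, x):                      # x: (B, T, 1, H, W)
        b, t = x.shape[:2]
        feats = self.enc(x.flatten(0, 1)).view(b, t, -1)
        out, _ = self.gru(feats)
        return self.head(out[:, -1])           # gaze for the last frame
```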
Generative Video Face Reenactment by AUs and Gaze Regularization
In this work, we propose an encoder-decoder-like architecture to perform face reenactment in image sequences. Our goal is to transfer the training subject's identity to a given test subject, regularizing the generation with Action Units (AUs) and gaze vectors to produce more lifelike results.
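A hedged sketch of the regularization idea follows: a reconstruction loss augmented with AU- and gaze-consistency terms. The module names (`au_net`, `gaze_net`) and the loss weights are hypothetical, not the paper's actual components.

```python
# Sketch of a reenactment objective with AU and gaze regularization.
# au_net and gaze_net stand in for pretrained AU and gaze estimators;
# they are assumptions, not modules from the paper.
import torch
import torch.nn.functional as F

def reenactment_loss(generated, target, au_net, gaze_net,
                     lambda_au=0.1, lambda_gaze=0.1):
    """Reconstruction loss plus AU- and gaze-consistency regularizers."""
    rec = F.l1_loss(generated, target)                        # pixel fidelity
    au = F.mse_loss(au_net(generated), au_net(target))        # match Action Units
    gaze = F.mse_loss(gaze_net(generated), gaze_net(target))  # match gaze vectors
    return rec + lambda_au * au + lambda_gaze * gaze
```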
Towards End-to-end Video-based Eye-Tracking
Estimating eye gaze from images alone is a challenging task, in large part
due to unobservable person-specific factors. Achieving high accuracy typically
requires labeled data from test users which may not be attainable in real
applications. We observe that there exists a strong relationship between what
users are looking at and the appearance of the user's eyes. In response to this
understanding, we propose a novel dataset and accompanying method which aims to
explicitly learn these semantic and temporal relationships. Our video dataset
consists of time-synchronized screen recordings, user-facing camera views, and
eye gaze data, which allows for new benchmarks in temporal gaze tracking as
well as label-free refinement of gaze. Importantly, we demonstrate that the
fusion of information from visual stimuli as well as eye images can lead
towards achieving performance similar to literature-reported figures acquired
through supervised personalization. Our final method yields significant
performance improvements on our proposed EVE dataset, with up to a 28 percent
improvement in Point-of-Gaze estimates (resulting in 2.49 degrees of angular
error), paving the way towards high-accuracy screen-based eye tracking purely
from webcam sensors. The dataset and reference source code are available at
https://ait.ethz.ch/projects/2020/EVE
Comment: Accepted at ECCV 202
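As a point of reference for the 2.49-degree figure, gaze accuracy is commonly reported as the angle between predicted and ground-truth 3D gaze vectors. The helper below is a generic illustration of that metric, not code from the EVE repository.

```python
# Generic angular-error metric for gaze evaluation (illustrative, not
# taken from the EVE codebase).
import numpy as np

def angular_error_deg(pred, gt):
    """Angle in degrees between two batches of 3D gaze vectors of shape (N, 3)."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))
```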
Dialogue Management and Language Generation for a Robust Conversational Virtual Coach: Validation and User Study
Designing human–machine interactive systems requires cooperation between different disciplines. In this work, we present a Dialogue Manager and a Language Generator that are the core modules of a voice-based Spoken Dialogue System (SDS) capable of carrying out challenging, long, and complex coaching conversations. We also develop an efficient integration procedure for the whole system, which acts as an intelligent and robust Virtual Coach. The coaching task differs significantly from the classical applications of SDSs, resulting in a much higher degree of complexity and difficulty. The Virtual Coach has been successfully tested and validated in a user study with independent elderly users in three countries with three different languages and cultures: Spain, France, and Norway.
The research presented in this paper was conducted as part of the EMPATHIC project, which received funding from the European Union's Horizon 2020 research and innovation programme under Grant No. 769872. Additionally, this work was partially funded by the BEWORD and AMIC-PC projects of the Ministry of Science and Technology, under Grant Nos. PID2021-126061OB-C42 and PDC2021-120846-C43, respectively. Vázquez and López Zorrilla received PhD scholarships from the Basque Government, with Grant Nos. PRE 2020 1 0274 and PRE 2017 1 0357, respectively.
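To make the Dialogue Manager / Language Generator split concrete, here is a deliberately minimal, hypothetical sketch of the two modules' interface; none of the dialogue acts, templates, or state keys come from the EMPATHIC system.

```python
# Hypothetical DM/NLG split: the dialogue manager picks the next system act
# from the dialogue state; the language generator realizes it as text.
def dialogue_manager(state):
    """Choose the next coaching act from a simple dialogue state."""
    if not state.get("greeted"):
        return "greet"
    if state.get("goal") is None:
        return "elicit_goal"
    return "give_advice"

def language_generator(act, state):
    """Realize a dialogue act as a surface utterance (template-based)."""
    templates = {
        "greet": "Hello! How are you feeling today?",
        "elicit_goal": "What would you like to work on in this session?",
        "give_advice": f"Let's make a small plan for '{state.get('goal')}'.",
    }
    return templates[act]

state = {"greeted": True, "goal": "sleeping better"}
print(language_generator(dialogue_manager(state), state))
```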