Learning to Personalize in Appearance-Based Gaze Tracking
Personal variations severely limit the performance of appearance-based gaze
tracking. Adapting to these variations using standard neural network model
adaptation methods is difficult. The problems range from overfitting, due to
small amounts of training data, to underfitting, due to restrictive model
architectures. We tackle these problems by introducing the SPatial Adaptive
GaZe Estimator (SPAZE). By modeling personal variations as a low-dimensional
latent parameter space, SPAZE provides just enough adaptability to capture the
range of personal variations without being prone to overfitting. Calibrating
SPAZE for a new person reduces to solving a small optimization problem. SPAZE
achieves an error of 2.70 degrees with 9 calibration samples on MPIIGaze,
improving on the state of the art by 14%. We contribute to gaze tracking
research by empirically showing that personal variations are well-modeled as a
3-dimensional latent parameter space for each eye. We show that this
low-dimensionality is expected by examining model-based approaches to gaze
tracking. We also show that accurate head pose-free gaze tracking is possible
Gaze estimation problem tackled through synthetic images
In this paper, we evaluate a synthetic framework to be used in the field of
gaze estimation employing deep learning techniques. The lack of sufficient
annotated data could be overcome by the utilization of a synthetic evaluation
framework as far as it resembles the behavior of a real scenario. In this work,
we use the U2Eyes synthetic environment, employing the I2Head dataset as a real
benchmark for comparison based on alternative training and testing strategies.
The results obtained show comparable average behavior between both frameworks
although significantly more robust and stable performance is retrieved by the
synthetic images. Additionally, synthetically pretrained models show strong
potential for user-specific calibration strategies, delivering outstanding
performance.
Comment: https://dl.acm.org/doi/abs/10.1145/3379156.339136
Low-cost eye tracking calibration: a knowledge-based study
Subject calibration has been demonstrated to improve the accuracy of high-performance eye trackers. However, the true weight of calibration in off-the-shelf eye tracking solutions has still not been addressed. In this work, a theoretical framework to measure the effects of calibration in deep learning-based gaze estimation is proposed for low-resolution systems. To this end, features extracted from the synthetic U2Eyes dataset are used in a fully connected network in order to isolate the effect of user-specific features, such as kappa angles. Then, the impact of system calibration in a real setup employing I2Head dataset images is studied. The obtained results show accuracy improvements of over 50%, proving that calibration is a key process also in low-resolution gaze estimation scenarios. Furthermore, we show that after calibration, accuracy values close to those obtained by high-resolution systems, in the range of 0.7°, could theoretically be obtained if a careful selection of image features was performed, demonstrating significant room for improvement for off-the-shelf eye tracking system
Towards End-to-end Video-based Eye-Tracking
Estimating eye gaze from images alone is a challenging task, in large part
due to unobservable person-specific factors. Achieving high accuracy typically
requires labeled data from test users which may not be attainable in real
applications. We observe that there exists a strong relationship between what
users are looking at and the appearance of the user's eyes. In response to this
understanding, we propose a novel dataset and accompanying method which aims to
explicitly learn these semantic and temporal relationships. Our video dataset
consists of time-synchronized screen recordings, user-facing camera views, and
eye gaze data, which allows for new benchmarks in temporal gaze tracking as
well as label-free refinement of gaze. Importantly, we demonstrate that the
fusion of information from visual stimuli as well as eye images can lead
towards achieving performance similar to literature-reported figures acquired
through supervised personalization. Our final method yields significant
performance improvements on our proposed EVE dataset, with up to a 28 percent
improvement in Point-of-Gaze estimates (resulting in 2.49 degrees in angular
error), paving the path towards high-accuracy screen-based eye tracking purely
from webcam sensors. The dataset and reference source code are available at
https://ait.ethz.ch/projects/2020/EVE
Comment: Accepted at ECCV 202
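The angular error figures quoted in these abstracts are commonly computed as the angle between a predicted and a ground-truth 3D gaze direction. The sketch below illustrates that common definition only; the function name `angular_error_deg` and the vector convention are assumptions for illustration, and the individual papers may define or average the metric differently.

```python
import math


def angular_error_deg(g_pred, g_true):
    """Angle in degrees between two 3D gaze direction vectors.

    Hypothetical helper: both inputs are (x, y, z) tuples of any
    non-zero length; they are normalized implicitly via the cosine.
    """
    dot = sum(p * t for p, t in zip(g_pred, g_true))
    norm_p = math.sqrt(sum(p * p for p in g_pred))
    norm_t = math.sqrt(sum(t * t for t in g_true))
    cos_angle = dot / (norm_p * norm_t)
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


if __name__ == "__main__":
    # Two directions that differ slightly in the horizontal component.
    print(angular_error_deg((0.0, 0.0, -1.0), (0.05, 0.0, -1.0)))
```

A dataset-level figure would then typically be the mean of this value over all test samples.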
An integrated framework for multi-state driver monitoring using heterogeneous loss and attention-based feature decoupling
Multi-state driver monitoring is a key technique in building human-centric intelligent driving systems. This paper presents an integrated vision-based multi-state driver monitoring framework that incorporates head rotation, gaze, blinking, and yawning. To solve the challenge of head pose and gaze estimation, this paper proposes a unified network architecture that tackles these estimations as soft classification tasks. A feature decoupling module was developed to decouple the extracted features from different axis domains. Furthermore, a cascade cross-entropy was designed to restrict large deviations during the training phase, which was combined with the other features to form a heterogeneous loss function. In addition, gaze consistency was used to optimize its estimation, which also informed the model architecture design of the gaze estimation task. Finally, the proposed method was verified on several widely used benchmark datasets. Comprehensive experiments were conducted to evaluate the proposed method, and the experimental results showed that the proposed method could achieve state-of-the-art performance compared to other methods
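One common way to treat angle estimation as a soft classification task is to predict a probability distribution over angle bins and take its expectation as the continuous estimate. The sketch below shows that general idea only and is not the paper's architecture or loss; the bin layout (`BIN_CENTERS_DEG`, 3-degree bins from -42 to 42) and the function names are assumptions for illustration.

```python
import math

# Hypothetical bin layout: 29 bin centers from -42 to 42 degrees, 3 degrees apart.
BIN_CENTERS_DEG = [-42 + 3 * i for i in range(29)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_angle_deg(logits, bin_centers=BIN_CENTERS_DEG):
    """Continuous angle as the probability-weighted mean of the bin centers."""
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, bin_centers))


if __name__ == "__main__":
    # Logits that strongly favor the bin centered at 6 degrees (index 16).
    logits = [0.0] * 29
    logits[16] = 10.0
    print(expected_angle_deg(logits))
```

In a training setup of this kind, a cross-entropy term on the bin probabilities is often combined with a regression term on the expected angle; how the paper combines its terms is not specified in the abstract.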