ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation
Gaze estimation is a fundamental task in many applications of computer
vision, human-computer interaction, and robotics. Many state-of-the-art methods
are trained and tested on custom datasets, making comparison across methods
challenging. Furthermore, existing gaze estimation datasets have limited head
pose and gaze variations, and the evaluations are conducted using different
protocols and metrics. In this paper, we propose a new gaze estimation dataset
called ETH-XGaze, consisting of over one million high-resolution images of
varying gaze under extreme head poses. We collect this dataset from 110
participants with a custom hardware setup including 18 digital SLR cameras and
adjustable illumination conditions, and a calibrated system to record ground
truth gaze targets. We show that our dataset can significantly improve the
robustness of gaze estimation methods across different head poses and gaze
angles. Additionally, we define a standardized experimental protocol and
evaluation metric on ETH-XGaze, to better unify gaze estimation research going
forward. The dataset and benchmark website are available at
https://ait.ethz.ch/projects/2020/ETH-XGaze
Comment: Accepted at ECCV 2020 (Spotlight)
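For context on the standardized metric: gaze estimation benchmarks conventionally report the angular error between predicted and ground-truth gaze directions. Below is a minimal sketch of that metric, assuming gaze labels stored as (pitch, yaw) angles in radians, a common but here-assumed convention; the helper names are illustrative, not the benchmark's API:

```python
import numpy as np

def pitchyaw_to_vector(pitchyaw: np.ndarray) -> np.ndarray:
    """Convert an (N, 2) array of (pitch, yaw) angles in radians
    to an (N, 3) array of unit gaze direction vectors."""
    pitch, yaw = pitchyaw[:, 0], pitchyaw[:, 1]
    return np.stack([-np.cos(pitch) * np.sin(yaw),
                     -np.sin(pitch),
                     -np.cos(pitch) * np.cos(yaw)], axis=1)

def angular_error_deg(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Per-sample angular error in degrees between predicted and
    ground-truth gaze directions, computed on the unit sphere."""
    a, b = pitchyaw_to_vector(pred), pitchyaw_to_vector(gt)
    cos_sim = np.clip(np.sum(a * b, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos_sim))
```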
CUDA-GR: Controllable Unsupervised Domain Adaptation for Gaze Redirection
The aim of gaze redirection is to manipulate the gaze in an image to the
desired direction. However, existing methods are inadequate in generating
perceptually plausible images. Advances in generative adversarial networks
have yielded excellent results in generating photo-realistic images, yet such
models still lack the ability to provide fine-grained control over individual
image attributes. Enabling this control requires ground-truth annotations for
the training data, which can be very expensive to obtain. In this paper,
we propose an unsupervised domain adaptation framework, called CUDA-GR, that
learns to disentangle gaze representations from the labeled source domain and
transfers them to an unlabeled target domain. Our method enables fine-grained
control over gaze directions while preserving the appearance information of the
person. We show that the generated image-label pairs in the target domain are
effective in knowledge transfer and can boost the performance of the downstream
tasks. Extensive experiments on benchmark datasets show that the proposed
method outperforms state-of-the-art techniques in both quantitative and
qualitative evaluations.
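One way to make the downstream claim concrete: the synthesized image-label pairs can directly supervise a target-domain gaze estimator. The following is a hedged sketch of that fine-tuning step; the model, data loader, loss, and hyperparameters are assumptions, not the paper's setup:

```python
import torch
from torch import nn

def finetune_on_generated(model: nn.Module, generated_loader, epochs: int = 5):
    """Fine-tune a downstream gaze estimator on (image, gaze) pairs
    synthesized in the target domain; all choices below are illustrative."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.L1Loss()  # L1 on gaze angles is a common choice
    model.train()
    for _ in range(epochs):
        for images, gaze_labels in generated_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), gaze_labels)
            loss.backward()
            optimizer.step()
```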
Semi-Synthetic Dataset Augmentation for Application-Specific Gaze Estimation
Although the number of gaze estimation datasets is growing, the application
of appearance-based gaze estimation methods is mostly limited to estimating the
point of gaze on a screen. This is in part because most datasets are generated
in a similar fashion, with the gaze target on a screen close to the camera's
origin. In other applications such as assistive robotics or marketing research,
the 3D point of gaze might not be close to the camera's origin, meaning models
trained on current datasets do not generalize well to these tasks. We therefore
suggest generating a textured three-dimensional mesh of the face and rendering
the training images from a virtual camera at a position and orientation
relevant to the application, as a means of augmenting existing datasets. In
our tests, this led to an average 47% decrease in gaze estimation angular
error.
Comment: 5 pages, 5 figures
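The label side of this augmentation reduces to expressing the gaze direction in the repositioned virtual camera's frame; since directions are translation-invariant, only the relative rotation between cameras matters. A sketch under the assumption that gaze is stored as a 3D vector in the original camera's coordinates (names here are hypothetical, not the authors' code):

```python
import numpy as np

def gaze_in_virtual_camera(gaze_cam: np.ndarray,
                           R_cam_to_virtual: np.ndarray) -> np.ndarray:
    """Rotate a 3D gaze direction from the original camera frame into
    the frame of the repositioned virtual camera."""
    g = R_cam_to_virtual @ gaze_cam
    return g / np.linalg.norm(g)

# Example: virtual camera yawed 30 degrees about the vertical axis
theta = np.radians(30.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
print(gaze_in_virtual_camera(np.array([0.0, 0.0, -1.0]), R))
```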
Towards End-to-end Video-based Eye-Tracking
Estimating eye gaze from images alone is a challenging task, in large part
due to unobservable person-specific factors. Achieving high accuracy typically
requires labeled data from test users which may not be attainable in real
applications. We observe that there exists a strong relationship between what
users are looking at and the appearance of the user's eyes. In response to this
understanding, we propose a novel dataset and accompanying method which aims to
explicitly learn these semantic and temporal relationships. Our video dataset
consists of time-synchronized screen recordings, user-facing camera views, and
eye gaze data, which allows for new benchmarks in temporal gaze tracking as
well as label-free refinement of gaze. Importantly, we demonstrate that the
fusion of information from visual stimuli and eye images can achieve
performance similar to literature-reported figures obtained through supervised
personalization. Our final method yields significant
performance improvements on our proposed EVE dataset, with up to a 28 percent
improvement in Point-of-Gaze estimates (resulting in 2.49 degrees in angular
error), paving the path towards high-accuracy screen-based eye tracking purely
from webcam sensors. The dataset and reference source code are available at
https://ait.ethz.ch/projects/2020/EVE
Comment: Accepted at ECCV 2020
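As background for the Point-of-Gaze figures above: PoG on a screen is conventionally obtained by intersecting the estimated gaze ray with the calibrated screen plane. The sketch below illustrates that geometry under assumed frames and parameter names; it is not the EVE reference implementation:

```python
import numpy as np

def gaze_ray_to_pog(origin: np.ndarray, direction: np.ndarray,
                    plane_point: np.ndarray,
                    plane_normal: np.ndarray) -> np.ndarray:
    """Intersect a gaze ray (eye origin plus direction) with the screen
    plane, all expressed in one common 3D coordinate frame."""
    d = direction / np.linalg.norm(direction)
    denom = float(d @ plane_normal)
    if abs(denom) < 1e-8:
        raise ValueError("gaze ray is parallel to the screen plane")
    t = float((plane_point - origin) @ plane_normal) / denom
    return origin + t * d
```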
Angle Range and Identity Similarity Enhanced Gaze and Head Redirection based on Synthetic data
In this paper, we propose a method for improving the angular accuracy and
photo-realism of gaze and head redirection in full-face images. Current
models cannot handle redirection at large angles, a limitation that stems
mainly from a lack of training data. To resolve this problem, we perform data
augmentation via monocular 3D face reconstruction to extend the head pose and
gaze range of the real data, which allows the model to
handle a wider redirection range. Beyond this main focus on data
augmentation, we also propose a framework that achieves better image quality
and identity preservation for unseen subjects, even when trained with
synthetic data.
Experiments show that our method significantly improves redirection performance
in terms of redirection angular accuracy while maintaining high image quality,
especially when redirecting to large angles.
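The augmentation idea can be illustrated as a rigid rotation of the reconstructed face mesh together with its labels, synthesizing samples at head poses and gaze angles absent from the real data. This is a hypothetical sketch of that transformation, not the authors' pipeline:

```python
import numpy as np

def rotate_sample(vertices: np.ndarray, gaze: np.ndarray, R: np.ndarray):
    """Rigidly rotate reconstructed mesh vertices (N, 3) and the 3D gaze
    direction by R (3x3), yielding a pose/gaze-consistent augmented sample."""
    v_rot = vertices @ R.T        # rotate every vertex about the origin
    g_rot = R @ gaze              # the gaze label rotates with the head
    return v_rot, g_rot / np.linalg.norm(g_rot)
```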