Developing gaze estimation models that generalize well to unseen domains and
in-the-wild conditions remains a challenge with no known best solution. This is
mostly due to the difficulty of acquiring ground truth data that cover the
distribution of possible faces, head poses and environmental conditions that
exist in the real world. In this work, we propose to train general gaze
estimation models on 3D geometry-aware gaze pseudo-annotations that we extract
from arbitrary unlabelled face images, which are abundantly available on the
internet. Additionally, we leverage the observation that head, body and hand
pose estimation benefit from being reformulated as dense 3D coordinate
prediction, and similarly express gaze estimation as the regression of dense 3D
eye meshes. We overcome the absence of compatible ground truth by fitting rigid
3D eyeballs to existing gaze datasets, and we design a multi-view supervision
framework to balance the effect of pseudo-labels during training. We evaluate
our method on the task of gaze generalization, where we demonstrate
improvements of up to 30% over the state of the art when no ground truth data
are available, and up to 10% when they are. The project material will be made
available for research purposes.
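To make the rigid-eyeball formulation concrete, the sketch below (our own illustration, not the paper's released code) shows how a unit gaze direction can be recovered once a rigid eyeball has been fitted or a dense eye mesh has been regressed: the gaze is the ray from the eyeball centre through the pupil apex. All function and variable names here are hypothetical.

```python
import numpy as np

def gaze_from_eyeball(eye_vertices: np.ndarray, eyeball_center: np.ndarray,
                      pupil_apex_idx: int) -> np.ndarray:
    """Return a unit gaze direction from a dense 3D eye mesh.

    eye_vertices:   (N, 3) array of 3D eye-mesh vertex coordinates.
    eyeball_center: (3,) centre of the fitted rigid eyeball.
    pupil_apex_idx: index of the vertex at the pupil apex (assumed known).
    """
    # Gaze is taken as the ray from the eyeball centre through the pupil apex.
    direction = eye_vertices[pupil_apex_idx] - eyeball_center
    return direction / np.linalg.norm(direction)

# Toy usage: a unit-sphere "eyeball" centred at the origin whose pupil apex
# lies on the +z axis yields a gaze direction of (0, 0, 1).
vertices = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(gaze_from_eyeball(vertices, np.zeros(3), pupil_apex_idx=0))  # [0. 0. 1.]
```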