18 research outputs found
Self-supervised learning of a facial attribute embedding from video
We propose a self-supervised framework for learning facial attributes by
simply watching videos of a human face speaking, laughing, and moving over
time. To perform this task, we introduce a network, Facial Attributes-Net
(FAb-Net), that is trained to embed multiple frames from the same video
face-track into a common low-dimensional space. With this approach, we make
three contributions: first, we show that the network can leverage information
from multiple source frames by predicting confidence/attention masks for each
frame; second, we demonstrate that using a curriculum learning regime improves
the learned embedding; finally, we demonstrate that the network learns a
meaningful face embedding that encodes information about head pose, facial
landmarks and facial expression, i.e. facial attributes, without having been
supervised with any labelled data. We are comparable or superior to
state-of-the-art self-supervised methods on these tasks and approach the
performance of supervised methods.Comment: To appear in BMVC 2018. Supplementary material can be found at
http://www.robots.ox.ac.uk/~vgg/research/unsup_learn_watch_faces/fabnet.htm
Unsupervised learning of object landmarks by factorized spatial embeddings
Learning automatically the structure of object categories remains an
important open problem in computer vision. In this paper, we propose a novel
unsupervised approach that can discover and learn landmarks in object
categories, thus characterizing their structure. Our approach is based on
factorizing image deformations, as induced by a viewpoint change or an object
deformation, by learning a deep neural network that detects landmarks
consistently with such visual effects. Furthermore, we show that the learned
landmarks establish meaningful correspondences between different object
instances in a category without having to impose this requirement explicitly.
We assess the method qualitatively on a variety of object types, natural and
man-made. We also show that our unsupervised landmarks are highly predictive of
manually-annotated landmarks in face benchmark datasets, and can be used to
regress these with a high degree of accuracy.Comment: To be published in ICCV 201