Unsupervised learning of object landmarks by factorized spatial embeddings
Automatically learning the structure of object categories remains an
important open problem in computer vision. In this paper, we propose a novel
unsupervised approach that can discover and learn landmarks in object
categories, thus characterizing their structure. Our approach is based on
factorizing image deformations, as induced by a viewpoint change or an object
deformation, by learning a deep neural network that detects landmarks
consistently with such visual effects. Furthermore, we show that the learned
landmarks establish meaningful correspondences between different object
instances in a category without having to impose this requirement explicitly.
We assess the method qualitatively on a variety of object types, natural and
man-made. We also show that our unsupervised landmarks are highly predictive of
manually-annotated landmarks in face benchmark datasets, and can be used to
regress these with a high degree of accuracy. Comment: To be published in ICCV 2017
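The core idea above, a detector whose landmarks move consistently with image deformations, can be sketched as an equivariance penalty. The following is a minimal NumPy sketch, not the paper's network: `detect` is a hypothetical stand-in for a learned landmark detector, and the deformation is restricted to a known rotation-plus-translation for illustration.

```python
import numpy as np

def warp_points(points, angle, shift):
    """Apply a known 2-D rotation-plus-translation g to landmark coordinates."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return points @ R.T + shift

def equivariance_loss(detect, image, warp_image, angle, shift):
    """Penalize detectors whose landmarks do not follow the deformation:
    landmarks detected on the warped image should equal the warped
    landmarks detected on the original image."""
    p = detect(image)                     # K x 2 landmarks on the original
    p_warped = detect(warp_image(image))  # K x 2 landmarks on the deformation
    return np.mean((p_warped - warp_points(p, angle, shift)) ** 2)
```

For a perfectly equivariant detector the loss is zero; training drives a real network toward this behavior without any landmark annotation.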
Self-supervised learning of a facial attribute embedding from video
We propose a self-supervised framework for learning facial attributes by
simply watching videos of a human face speaking, laughing, and moving over
time. To perform this task, we introduce a network, Facial Attributes-Net
(FAb-Net), that is trained to embed multiple frames from the same video
face-track into a common low-dimensional space. With this approach, we make
three contributions: first, we show that the network can leverage information
from multiple source frames by predicting confidence/attention masks for each
frame; second, we demonstrate that using a curriculum learning regime improves
the learned embedding; finally, we demonstrate that the network learns a
meaningful face embedding that encodes information about head pose, facial
landmarks and facial expression, i.e. facial attributes, without having been
supervised with any labelled data. Our results are comparable or superior to
state-of-the-art self-supervised methods on these tasks and approach the
performance of supervised methods. Comment: To appear in BMVC 2018. Supplementary material can be found at
http://www.robots.ox.ac.uk/~vgg/research/unsup_learn_watch_faces/fabnet.htm
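The first contribution above, leveraging several source frames via per-frame confidence/attention masks, amounts to a confidence-weighted fusion of frame embeddings. A minimal NumPy sketch follows; the embedding dimensions and the softmax normalization of scalar per-frame confidences are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_source_frames(embeddings, confidences):
    """Combine per-frame embeddings with softmax-normalized confidence
    weights, so that frames the network trusts dominate the fused code.
    embeddings: (N, D) array, one D-dim embedding per source frame
    confidences: (N,) array of scalar confidence scores"""
    w = softmax(confidences)              # weights sum to 1 over frames
    return (w[:, None] * embeddings).sum(axis=0)
```

With very unequal confidences the fused embedding collapses onto the trusted frame; with equal confidences it is a plain average.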
Multi-Image Semantic Matching by Mining Consistent Features
This work proposes a multi-image matching method to estimate semantic
correspondences across multiple images. In contrast to the previous methods
that optimize all pairwise correspondences, the proposed method identifies and
matches only a sparse set of reliable features in the image collection. In this
way, the proposed method is able to prune non-repeatable features and is
highly scalable to handle thousands of images. We additionally propose a
low-rank constraint to ensure the geometric consistency of feature
correspondences over the whole image collection. Besides the competitive
performance on multi-graph matching and semantic flow benchmarks, we also
demonstrate the applicability of the proposed method for reconstructing
object-class models and discovering object-class landmarks from images without
using any annotation. Comment: CVPR 2018
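The low-rank constraint mentioned above is what makes pairwise correspondences globally consistent: if each image's features are assigned to a shared "universe" of points, every pairwise match factors through those assignments, and cycle consistency follows. A small NumPy sketch of this factorization (the universe construction here is illustrative, not the paper's optimization):

```python
import numpy as np

def pairwise_from_universe(assignments):
    """Given per-image assignment matrices A_i mapping each image's
    features to a shared universe of points, consistent pairwise
    correspondences factor as X_ij = A_i @ A_j.T. This low-rank
    factorization enforces cycle consistency: X_ij @ X_jk = X_ik
    whenever each universe point is used at most once per image."""
    n = len(assignments)
    return {(i, j): assignments[i] @ assignments[j].T
            for i in range(n) for j in range(n)}
```

Any matching produced this way satisfies the composition rule across arbitrary cycles, which pairwise-only optimization does not guarantee.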
BRULÈ: Barycenter-Regularized Unsupervised Landmark Extraction
Unsupervised retrieval of image features is vital for many computer vision
tasks where the annotation is missing or scarce. In this work, we propose a new
unsupervised approach to detect the landmarks in images, validating it on the
popular task of human face key-points extraction. The method is based on the
idea of auto-encoding the desired landmarks in the latent space while
discarding non-essential information (thus preserving interpretability). The
interpretable latent-space representation (a bottleneck containing nothing
but the desired key-points) is achieved by a new
two-step regularization approach. The first regularization step evaluates
the transport distance from a given set of landmarks to an average
configuration (the Wasserstein barycenter). The second regularization step controls
deviations from the barycenter by applying random geometric deformations
synchronously to the initial image and to the encoded landmarks. We demonstrate
the effectiveness of the approach both in unsupervised and semi-supervised
training scenarios using 300-W, CelebA, and MAFL datasets. The proposed
regularization paradigm is shown to prevent overfitting, and the detection
quality is shown to improve beyond the state-of-the-art face models. Comment: 10 main pages with 6 figures and 1 table, 14 pages total with 6
supplementary figures. I.B. and N.B. contributed equally. D.V.D. is
corresponding author
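The first regularization step above, a transport distance from a predicted landmark set to the barycenter, can be illustrated with a brute-force minimal-cost matching. This is a toy sketch under a squared-Euclidean ground cost, exact only for small landmark counts, and stands in for the Wasserstein machinery used by the paper:

```python
import itertools
import numpy as np

def transport_distance(landmarks, barycenter):
    """Minimal-cost one-to-one matching between a predicted landmark set
    and the barycenter configuration, both K x 2 arrays. Brute force over
    permutations (O(K!)), so only usable for small K; a stand-in for the
    Wasserstein distance the regularizer is based on."""
    K = len(landmarks)
    def cost(perm):
        return sum(np.sum((landmarks[i] - barycenter[j]) ** 2)
                   for i, j in enumerate(perm))
    return min(cost(p) for p in itertools.permutations(range(K)))
```

A landmark set that is merely a re-ordering of the barycenter has zero cost, while a rigid shift of all points is charged the shift's energy, which is exactly the behavior the regularizer exploits to keep predictions near the average configuration.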
- …