We present a method for training a regression network from image pixels to 3D
morphable model coordinates using only unlabeled photographs. The training loss
is based on features from a facial recognition network, computed on-the-fly by
rendering the predicted faces with a differentiable renderer. To make training
from features feasible and avoid network fooling effects, we introduce three
objectives: a batch distribution loss that encourages the output distribution
to match the distribution of the morphable model, a loopback loss that ensures
the network can correctly reinterpret its own output, and a multi-view identity
loss that compares the features of the predicted 3D face and the input
photograph from multiple viewing angles. We train a regression network using
these objectives, a set of unlabeled photographs, and the morphable model
itself, and demonstrate state-of-the-art results.

Comment: CVPR 2018 version with supplemental material
(http://openaccess.thecvf.com/content_cvpr_2018/html/Genova_Unsupervised_Training_for_CVPR_2018_paper.html)
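
As a rough illustration of two of the objectives named in the abstract, the batch distribution loss and the loopback loss might be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes the morphable model coefficients follow a zero-mean, unit-variance Gaussian prior, and the function names and the exact penalty forms are hypothetical choices made for illustration.

```python
import numpy as np


def batch_distribution_loss(params: np.ndarray) -> float:
    """Penalize a batch of predicted coefficients for drifting from the prior.

    params has shape (batch, dim). ASSUMPTION: the morphable model prior is
    N(0, I), so the per-dimension batch mean should be 0 and the variance 1.
    """
    mean = params.mean(axis=0)  # per-dimension batch mean
    var = params.var(axis=0)    # per-dimension batch variance (population)
    return float(np.sum(mean ** 2) + np.sum((var - 1.0) ** 2))


def loopback_loss(repredicted: np.ndarray, original: np.ndarray) -> float:
    """Mean squared distance between coefficients and their re-prediction.

    ASSUMPTION: `original` holds coefficients that were rendered to an image,
    and `repredicted` holds what the network regressed back from that image.
    """
    return float(np.mean(np.sum((repredicted - original) ** 2, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.standard_normal((64, 8))
    print("batch distribution loss:", batch_distribution_loss(batch))
    print("loopback loss (identical):", loopback_loss(batch, batch))
```

In this sketch the batch distribution term is zero only when the batch statistics match the assumed prior exactly, and the loopback term is zero only when the network recovers the coefficients it was given.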