Monocular 3D face reconstruction is a widely studied problem, and existing
approaches tackle it either through fast neural network inference or offline
iterative reconstruction of face geometry. In either case, carefully designed
energy functions are minimized, commonly including loss terms such as a
photometric loss and a landmark reprojection loss, among others. In
this work we propose a new loss function for monocular face capture, inspired
by how humans would perceive the quality of a 3D face reconstruction given a
particular image. It is well established that shading is a strong cue for 3D
shape in the human visual system. As such, our new 'perceptual' shape
loss aims to judge the quality of a 3D face estimate using only shading cues.
Our loss is implemented as a discriminator-style neural network that takes an
input face image and a shaded render of the geometry estimate, and then
predicts a score that perceptually evaluates how well the shaded render matches
the given image. This 'critic' network operates on the RGB image and geometry
render alone, without requiring an estimate of the albedo or illumination in
the scene. Furthermore, our loss operates entirely in image space and is thus
agnostic to mesh topology.
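To make the critic's interface concrete, the following is a minimal sketch of
such a discriminator-style network in PyTorch. The PatchGAN-like architecture,
channel widths, and pooling head are illustrative assumptions; the text above
fixes only the inputs (an RGB image and a shaded geometry render) and the
scalar score output.

```python
# Sketch of a discriminator-style "critic" that scores how well a shaded
# geometry render matches an input RGB face image. The architecture and
# hyper-parameters are illustrative assumptions, not the paper's exact model.
import torch
import torch.nn as nn

class PerceptualShapeCritic(nn.Module):
    def __init__(self, base_channels: int = 64):
        super().__init__()
        layers = []
        in_ch = 6  # RGB image (3 ch) concatenated with shaded render (3 ch)
        for out_ch in (base_channels, base_channels * 2, base_channels * 4):
            layers += [
                nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        # Pool spatial features down to a single perceptual score per sample.
        self.score_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, 1)
        )

    def forward(self, image: torch.Tensor, render: torch.Tensor) -> torch.Tensor:
        # image, render: (B, 3, H, W) -> (B, 1) match scores.
        return self.score_head(self.features(torch.cat([image, render], dim=1)))

# Usage: a higher score means the shaded render is judged a better
# perceptual match for the input image.
critic = PerceptualShapeCritic()
score = critic(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))
```

Concatenating the image and render lets the network reason jointly about
shading consistency, rather than scoring the render in isolation.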
We show how our new perceptual shape loss can be combined with traditional
energy terms for monocular 3D face optimization and deep neural network
regression, improving upon current state-of-the-art results.

Accepted to PG 2023. Project page:
https://studios.disneyresearch.com/2023/10/09/a-perceptual-shape-loss-for-monocular-3d-face-reconstruction/
Video: https://www.youtube.com/watch?v=RYdyoIZEuU
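For illustration, one plausible way the critic score could be folded into a
traditional fitting energy as described above (the term names and weights are
assumptions, not the paper's exact formulation):

\[
E(\theta) = \lambda_{\text{photo}}\, E_{\text{photo}}(\theta)
          + \lambda_{\text{lmk}}\, E_{\text{lmk}}(\theta)
          - \lambda_{\text{perc}}\, D\big(I, \mathcal{R}(\theta)\big)
\]

where $I$ is the input image, $\mathcal{R}(\theta)$ is the shaded render of the
current geometry estimate $\theta$, $D$ is the critic score, and the negative
sign rewards geometry whose render the critic judges a good perceptual match.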