This paper presents the first significant work on directly predicting 3D face
landmarks on neural radiance fields (NeRFs), without using intermediate
representations such as 2D images, depth maps, or point clouds. Our 3D
coarse-to-fine Face Landmarks NeRF (FLNeRF) model efficiently samples from the
NeRF, first over the whole face and then at individual facial features, to
predict accurate landmarks.
To mitigate the limited number of facial expressions in the available data,
a local, non-linear NeRF warp is applied to facial features at fine scale to
simulate a wide range of emotions, including exaggerated facial expressions
(e.g., cheek blowing, wide mouth opening, eye blinking), for training FLNeRF. With
such expression augmentation, our model can predict 3D landmarks not limited to
the 20 discrete expressions given in the data. Robust 3D NeRF facial landmarks
contribute to many downstream tasks. As an example, we modify MoFaNeRF to
enable high-quality face editing and swapping using face landmarks on NeRF,
allowing more direct control and a wider range of complex expressions.
Experiments show that the improved model using landmarks achieves comparable
or better results.

Comment: Hao Zhang and Tianyuan Dai contributed equally. Project website:
https://github.com/ZHANG1023/FLNeR
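
As a rough illustration of what a local, non-linear warp of sample coordinates can look like, the sketch below displaces 3D points near a chosen center with a Gaussian falloff, so that distant points are left essentially unchanged. This is not the warp used by FLNeRF: the function name `local_warp`, its parameters, and the Gaussian weighting are all assumptions made for illustration only.

```python
import numpy as np

def local_warp(points, center, displacement, radius):
    """Hypothetical local warp: shift points near `center` by `displacement`.

    The shift is scaled by a Gaussian weight of the distance to `center`,
    so the deformation is non-linear and fades out beyond roughly `radius`.
    """
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    displacement = np.asarray(displacement, dtype=float)
    # Squared distance of every point to the warp center, shape (N, 1).
    sq_dist = np.sum((points - center) ** 2, axis=-1, keepdims=True)
    # Weight is 1 at the center and decays smoothly with distance.
    weight = np.exp(-sq_dist / (2.0 * radius ** 2))
    return points + weight * displacement

# Two sample points: one at the warp center, one far away from it.
pts = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
warped = local_warp(pts, center=[0.0, 0.0, 0.0],
                    displacement=[0.0, 0.1, 0.0], radius=0.5)
```

In this sketch the point at the center moves by the full displacement, while the distant point stays practically where it was. Warping a radiance field in practice would more likely apply such a mapping (or its inverse) to query coordinates before evaluating the field, which is again an assumption rather than a description of the paper's method.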