We address the problem of photorealistic 3D face avatar synthesis from sparse
images. Existing parametric models for face avatar reconstruction struggle to
reproduce details present in the input images. Meanwhile, although current
NeRF-based avatar methods provide promising results for novel view synthesis,
they fail to generalize well to unseen expressions. We build on NeRF and
propose a novel framework that, by leveraging parametric 3DMM models, can
reconstruct a high-fidelity drivable face avatar and successfully handle
unseen expressions. At the core of our implementation are a structured
displacement feature and a semantic-aware learning module. Our structured
displacement feature introduces the motion prior as an additional constraint
and, by constructing a displacement volume, helps the model perform better on
unseen expressions. In addition, the semantic-aware learning module
incorporates multi-level priors, e.g., semantic embeddings and a learnable
latent code, to further improve performance. Thorough quantitative and
qualitative experiments validate the design of our framework, and our method
achieves substantially better results than current state-of-the-art methods.
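The abstract does not specify how the displacement volume or the semantic-aware features are built, so the following is only a minimal NumPy sketch under loudly stated assumptions, not the authors' implementation. It assumes that (a) the motion prior is the per-vertex offset between a 3DMM mesh in the target expression and the neutral mesh with the same topology, (b) each voxel of the displacement volume copies the offset of its nearest neutral vertex, (c) query points read the volume by trilinear interpolation, and (d) the resulting displacement feature is simply concatenated with a semantic-embedding lookup and a learnable latent code. The names `build_displacement_volume`, `sample_volume`, and `compose_point_features` are hypothetical.

```python
import numpy as np


def build_displacement_volume(neutral_verts, expr_verts, grid_res=16, bound=1.0):
    """Hypothetical: store 3DMM per-vertex displacements in a dense (R, R, R, 3) grid.

    Assumes both meshes share topology (same vertex order) and lie inside
    [-bound, bound]^3. Each voxel copies the displacement of the nearest neutral vertex.
    """
    disp = expr_verts - neutral_verts                          # (V, 3) motion prior
    axis = np.linspace(-bound, bound, grid_res)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    centers = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)  # (R^3, 3) voxel centers
    # Brute-force nearest-vertex search: fine for a toy example, too slow for real meshes.
    d2 = ((centers[:, None, :] - neutral_verts[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)
    return disp[nearest].reshape(grid_res, grid_res, grid_res, 3)


def sample_volume(volume, points, bound=1.0):
    """Trilinearly interpolate an (R, R, R, C) volume at (N, 3) points in [-bound, bound]."""
    res = volume.shape[0]
    # Map world coordinates to continuous voxel indices in [0, res - 1].
    idx = np.clip((points + bound) / (2 * bound) * (res - 1), 0, res - 1)
    i0 = np.minimum(np.floor(idx).astype(int), res - 2)       # lower corner of the cell
    t = idx - i0                                               # fractional offsets, (N, 3)
    out = np.zeros((points.shape[0], volume.shape[-1]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((t[:, 0] if dx else 1 - t[:, 0])
                     * (t[:, 1] if dy else 1 - t[:, 1])
                     * (t[:, 2] if dz else 1 - t[:, 2]))
                out += w[:, None] * volume[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
    return out


def compose_point_features(points, volume, labels, semantic_table, latent_code, bound=1.0):
    """Hypothetical per-point input: xyz + displacement + semantic embedding + latent code."""
    disp_feat = sample_volume(volume, points, bound)            # (N, 3) motion prior
    sem_feat = semantic_table[labels]                           # (N, E) embedding lookup
    lat_feat = np.broadcast_to(latent_code, (points.shape[0], latent_code.shape[0]))
    return np.concatenate([points, disp_feat, sem_feat, lat_feat], axis=1)


# Toy usage: a uniform upward shift stands in for an "expression".
rng = np.random.default_rng(0)
neutral = rng.uniform(-0.8, 0.8, size=(200, 3))
expr = neutral + np.array([0.0, 0.05, 0.0])
vol = build_displacement_volume(neutral, expr, grid_res=8)
pts = rng.uniform(-1.0, 1.0, size=(5, 3))
labels = rng.integers(0, 4, size=5)        # e.g. 4 hypothetical face regions
table = rng.normal(size=(4, 8))            # semantic embedding table, E = 8
z = np.zeros(16)                           # learnable latent code, here zero-initialized
feats = compose_point_features(pts, vol, labels, table, z)
print(feats.shape)  # (5, 30): 3 xyz + 3 displacement + 8 semantic + 16 latent
```

Because the toy displacement is uniform, every sampled displacement feature equals `[0, 0.05, 0]`, which is a quick sanity check that the trilinear weights sum to one. In a real system the concatenated features would feed a NeRF-style MLP, and the embedding table and latent code would be optimized jointly with it; those training details are outside this sketch.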