Facial expression generation is one of the most challenging and long-sought
aspects of character animation, with many interesting applications. The
challenging task, traditionally having relied heavily on digital craftspersons,
remains yet to be explored. In this paper, we introduce a generative framework
for generating 3D facial expression sequences (i.e. 4D faces) that can be
conditioned on different inputs to animate an arbitrary 3D face mesh. It is
composed of two tasks: (1) Learning the generative model that is trained over a
set of 3D landmark sequences, and (2) Generating 3D mesh sequences of an input
facial mesh driven by the generated landmark sequences. The generative model is
based on a Denoising Diffusion Probabilistic Model (DDPM), which has achieved
remarkable success in generative tasks of other domains. While it can be
trained unconditionally, its reverse process can still be conditioned by
various condition signals. This allows us to efficiently develop several
downstream tasks involving various conditional generation, by using expression
labels, text, partial sequences, or simply a facial geometry. To obtain the
full mesh deformation, we then develop a landmark-guided encoder-decoder to
apply the geometrical deformation embedded in landmarks on a given facial mesh.
Experiments show that our model has learned to generate realistic, quality
expressions solely from the dataset of relatively small size, improving over
the state-of-the-art methods. Videos and qualitative comparisons with other
methods can be found at https://github.com/ZOUKaifeng/4DFM. Code and models
will be made available upon acceptance