Synthesizing high-quality 3D face models from natural language descriptions
is valuable for many applications, including avatar creation, virtual
reality, and telepresence. However, little research has addressed this task.
We argue that the major obstacles are 1) the lack of high-quality 3D face data
with descriptive text annotations, and 2) the complex mapping between the
descriptive language space and the shape/appearance space. To solve these
problems, we build the Describe3D dataset, the first large-scale dataset with
fine-grained text descriptions for the text-to-3D face generation task. We then
propose a two-stage framework that first generates a 3D face matching the
concrete descriptions, then refines the 3D face model by optimizing the
parameters in the 3D shape and texture space against the abstract descriptions. Extensive
experimental results show that our method can produce a faithful 3D face that
conforms to the input descriptions with higher accuracy and quality than
previous methods. The code and Describe3D dataset are released at
https://github.com/zhuhao-nju/describe3d .
Comment: Accepted to CVPR 202