Reconstructing personalized, animatable head avatars has significant
implications for AR/VR. Existing methods that achieve explicit face control
with 3D Morphable Models (3DMMs) typically rely on multi-view images
or videos of a single subject, making the reconstruction process complex.
Additionally, the traditional rendering pipeline is time-consuming, limiting
real-time animation possibilities. In this paper, we introduce CVTHead, a novel
approach that generates controllable neural head avatars from a single
reference image using point-based neural rendering. CVTHead treats the
sparse vertices of the mesh as the point set and employs the proposed
Vertex-feature Transformer to learn a local feature descriptor for each vertex,
which also enables the modeling of long-range dependencies among all the vertices.
Experimental results on the VoxCeleb dataset demonstrate that CVTHead achieves
comparable performance to state-of-the-art graphics-based methods. Moreover, it
enables efficient rendering of novel human heads with various expressions, head
poses, and camera views. These attributes can be explicitly controlled using
the coefficients of 3DMMs, facilitating versatile and realistic animation in
real-time scenarios.

Comment: WACV202
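
As an illustration of how attention over mesh vertices can capture long-range dependencies, the following is a minimal sketch of single-head self-attention applied to per-vertex features. It is not the Vertex-feature Transformer itself, whose architecture the abstract does not specify; the function name `vertex_self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the toy feature values are all assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def vertex_self_attention(features, w_q, w_k, w_v):
    """Hypothetical single-head self-attention over per-vertex features.

    features: N x D list of rows, one feature vector per mesh vertex.
    w_q, w_k, w_v: D x D projection matrices (assumed, for illustration).
    Returns (updated_features, attention_weights).
    """
    q = matmul(features, w_q)
    k = matmul(features, w_k)
    v = matmul(features, w_v)
    scale = math.sqrt(len(q[0]))
    # Every vertex scores every other vertex, regardless of mesh distance.
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
              for qi in q]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v), weights


# Toy usage: three vertices with 2-dimensional features, identity projections.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
out, attn = vertex_self_attention(feats, eye, eye, eye)
```

Each row of `attn` is a distribution over all vertices, so the updated descriptor of one vertex can draw on any other vertex in the point set.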