Co-speech gesture generation is crucial for automatic digital avatar
animation. However, existing methods suffer from issues such as unstable
training and temporal inconsistency, particularly in generating high-fidelity
and comprehensive gestures. Additionally, these methods lack effective control
over speaker identity and temporal editing of the generated gestures. Focusing
on capturing temporal latent information and enabling practical control, we
propose a Controllable Co-speech Gesture Generation framework, named C2G2.
Specifically, we propose a two-stage temporal dependency enhancement strategy
motivated by latent diffusion models. We further introduce two key features to
C2G2, namely a speaker-specific decoder to generate speaker-related real-length
skeletons and a repainting strategy for flexible gesture generation/editing.
Extensive experiments on benchmark gesture datasets verify the effectiveness of
our proposed C2G2 compared with several state-of-the-art baselines. The link of
the project demo page can be found at https://c2g2-gesture.github.io/c2_gestureComment: 12 pages, 6 figures, 7 table