Human motion generation aims to generate natural human pose sequences and
shows immense potential for real-world applications. Substantial progress has
been made recently in motion data collection technologies and generation
methods, laying the foundation for increasing interest in human motion
generation. Most research within this field focuses on generating human motions
based on conditional signals, such as text, audio, and scene contexts. Despite
this progress, the task remains challenging because of the intricate nature of
human motion and its implicit relationship with conditional signals. In this
survey, we present a
comprehensive literature review of human motion generation, which, to the best
of our knowledge, is the first of its kind in this field. We begin by
introducing the background of human motion and generative models, followed by
an examination of representative methods for three mainstream sub-tasks:
text-conditioned, audio-conditioned, and scene-conditioned human motion
generation. Additionally, we provide an overview of common datasets and
evaluation metrics. Lastly, we discuss open problems and outline potential
future research directions. We hope that this survey provides the
community with a comprehensive overview of this rapidly evolving field and
inspires novel ideas that address the outstanding challenges.

Comment: 20 pages, 5 figures