Multi-person pose forecasting remains a challenging problem, especially in
modeling fine-grained human body interaction in complex crowd scenarios.
Existing methods typically represent the whole pose sequence as a temporal
series, yet overlook the interactive influences among people at the level of
skeletal body parts. In this paper, we propose a novel Trajectory-Aware Body
Interaction Transformer (TBIFormer) for multi-person pose forecasting that
effectively models body-part interactions. Specifically, we construct a
Temporal Body Partition Module that transforms all pose sequences into a
Multi-Person Body-Part sequence, retaining spatial and temporal information
based on body semantics. Then, we devise a Social Body Interaction Multi-Head
Self-Attention (SBI-MSA) module that uses the transformed sequence to learn
body-part dynamics for both inter- and intra-individual interactions.
Furthermore, unlike prior
Euclidean distance-based spatial encodings, we present a novel and efficient
Trajectory-Aware Relative Position Encoding for SBI-MSA to offer discriminative
spatial information and additional interaction cues. We empirically evaluate
our framework on CMU-Mocap, MuPoTS-3D, and synthesized datasets (6 to 10
persons) over both short- and long-term horizons, and demonstrate that our
method substantially outperforms state-of-the-art methods. Code will be made
publicly available upon acceptance.

Comment: Accepted by CVPR 2023, 8 pages, 6 figures. arXiv admin note: text
overlap with arXiv:2208.0922
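The partition-then-attend idea described above can be illustrated with a toy sketch. It is not the TBIFormer implementation. The five-part joint grouping in `BODY_PARTS`, the use of joint 0 as a root trajectory, the mean-distance `trajectory_distance`, and the additive bias of minus that distance in `sbi_attention_weights` are all hypothetical stand-ins chosen for illustration. The attention is also simplified: a single head that uses raw features as queries and keys, where a Transformer layer would normally use learned projections and several heads.

```python
import math
from typing import Dict, List, Tuple

Joint = Tuple[float, float, float]
Frame = List[Joint]                   # one pose: J joints
Person = List[Frame]                  # one pose sequence: T frames
Token = Tuple[int, str, List[float]]  # (person index, part name, features)

# HYPOTHETICAL five-part grouping of a 15-joint skeleton; the paper's
# actual joint-to-part assignment may differ.
BODY_PARTS: Dict[str, List[int]] = {
    "torso": [0, 1, 2],
    "left_arm": [3, 4, 5],
    "right_arm": [6, 7, 8],
    "left_leg": [9, 10, 11],
    "right_leg": [12, 13, 14],
}


def partition_body_parts(people: List[Person]) -> List[Token]:
    """Turn N pose sequences into one sequence of N * P body-part tokens.

    Each token concatenates one part's joint coordinates over all T frames,
    so it keeps both which joints (spatial) and which frames (temporal).
    """
    tokens: List[Token] = []
    for n, seq in enumerate(people):
        for part, joints in BODY_PARTS.items():
            feat = [c for frame in seq for j in joints for c in frame[j]]
            tokens.append((n, part, feat))
    return tokens


def root_trajectory(seq: Person, root: int = 0) -> List[Joint]:
    """Track one (assumed) root joint over time as the person's trajectory."""
    return [frame[root] for frame in seq]


def trajectory_distance(a: List[Joint], b: List[Joint]) -> float:
    """ASSUMED trajectory-aware measure: mean Euclidean distance over all
    frames, instead of a distance taken at a single frame."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def sbi_attention_weights(tokens: List[Token], people: List[Person],
                          bias_scale: float = 1.0) -> List[List[float]]:
    """Single-head softmax attention over all body-part tokens.

    Every token attends to parts of the same person (intra) and of other
    people (inter). HYPOTHETICAL relative bias: -bias_scale times the
    trajectory distance between the two tokens' owners (0 for the same
    person). Raw features serve as queries and keys; no learned projections.
    """
    trajs = [root_trajectory(seq) for seq in people]
    d = len(tokens[0][2])
    weights: List[List[float]] = []
    for qn, _, q in tokens:
        logits = []
        for kn, _, k in tokens:
            score = sum(x * y for x, y in zip(q, k)) / math.sqrt(d)
            bias = -bias_scale * trajectory_distance(trajs[qn], trajs[kn])
            logits.append(score + bias)
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    def make_person(offset: float, frames: int = 2) -> Person:
        # Toy pose: all 15 joints sit at (offset + 0.1 * t, 0, 0) in frame t.
        return [[(offset + 0.1 * t, 0.0, 0.0) for _ in range(15)]
                for t in range(frames)]

    people = [make_person(0.0), make_person(3.0)]
    tokens = partition_body_parts(people)
    w = sbi_attention_weights(tokens, people)
    print(len(tokens), len(tokens[0][2]))  # 2 people x 5 parts; 2 x 3 x 3 coords
    print(sum(w[0][:5]) > 0.5)  # person 0's first token attends mostly to person 0
```

With these toy inputs the script prints `10 18` and then `True`: two people with five parts each give ten tokens of 2 frames x 3 joints x 3 coordinates, and a token of the first person places most of its attention on that person's own parts because the other person's trajectory stays 3 units away in every frame. Measuring distance over the whole trajectory, rather than at one frame, is the kind of extra interaction cue the abstract contrasts with purely Euclidean encodings, though the exact form used in the paper may differ.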