Existing volumetric methods for 3D human pose estimation are accurate
but computationally expensive and optimized for single-time-step
prediction. We present TEMPO, an efficient multi-view pose estimation model
that learns a robust spatiotemporal representation, improving pose accuracy
while also tracking and forecasting human pose. We significantly reduce
computation compared to the state-of-the-art by recurrently computing
per-person 2D pose features, fusing both spatial and temporal information into
a single representation. In doing so, our model is able to use spatiotemporal
context to predict more accurate human poses without sacrificing efficiency. We
further use this representation to track human poses over time as well as
predict future poses. Finally, we demonstrate that our model is able to
generalize across datasets without scene-specific fine-tuning. TEMPO achieves
10% lower MPJPE with a 33× improvement in FPS compared to TesseTrack
on the challenging CMU Panoptic Studio dataset.

Comment: Accepted at ICCV 2023
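
To make the recurrent spatiotemporal fusion concrete, the sketch below shows a ConvGRU-style update over per-person 2D pose feature maps: each time step's features are merged into a single running representation at constant per-frame cost. This is a minimal illustration, not the paper's implementation; the ConvGRU cell, channel counts, and feature shapes are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Convolutional GRU cell: fuses the current frame's 2D pose
    features with a running spatiotemporal state (illustrative only)."""
    def __init__(self, channels: int):
        super().__init__()
        # update (z) and reset (r) gates, computed jointly
        self.gates = nn.Conv2d(2 * channels, 2 * channels, 3, padding=1)
        # candidate state
        self.cand = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde  # new fused representation

# One hidden state per tracked person, updated every time step, so the
# per-frame cost stays constant instead of growing with a temporal window.
cell = ConvGRUCell(channels=32)
h = torch.zeros(1, 32, 64, 64)                 # running representation
for t in range(5):
    feats_2d = torch.randn(1, 32, 64, 64)      # per-person 2D features at t
    h = cell(feats_2d, h)                      # fuse spatial + temporal info
```

A full system would decode poses, tracks, and forecasts from the fused state; the sketch only illustrates the constant-cost recurrent fusion the abstract describes.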