Monocular 3D human pose estimation is quite challenging due to the inherent
ambiguity and occlusion, which often lead to high uncertainty and
indeterminacy. On the other hand, diffusion models have recently emerged as an
effective tool for generating high-quality images from noise. Inspired by their
capability, we explore a novel pose estimation framework (DiffPose) that
formulates 3D pose estimation as a reverse diffusion process. We incorporate
novel designs into our DiffPose to facilitate the diffusion process for 3D pose
estimation: a pose-specific initialization of pose uncertainty distributions, a
Gaussian Mixture Model-based forward diffusion process, and a
context-conditioned reverse diffusion process. Our proposed DiffPose
significantly outperforms existing methods on the widely used pose estimation
benchmarks Human3.6M and MPI-INF-3DHP. Project page:
https://gongjia0208.github.io/Diffpose/.Comment: Accepted to CVPR 202