In this paper, we propose a novel monocular ray-based 3D (Ray3D) absolute
human pose estimation with calibrated camera. Accurate and generalizable
absolute 3D human pose estimation from monocular 2D pose input is an ill-posed
problem. To address this challenge, we convert the input from pixel space to 3D
normalized rays. This conversion makes our approach robust to camera intrinsic
parameter changes. To deal with the in-the-wild camera extrinsic parameter
variations, Ray3D explicitly takes the camera extrinsic parameters as an input
and jointly models the distribution between the 3D pose rays and camera
extrinsic parameters. This novel network design is the key to the outstanding
generalizability of Ray3D approach. To have a comprehensive understanding of
how the camera intrinsic and extrinsic parameter variations affect the accuracy
of absolute 3D key-point localization, we conduct in-depth systematic
experiments on three single person 3D benchmarks as well as one synthetic
benchmark. These experiments demonstrate that our method significantly
outperforms existing state-of-the-art models. Our code and the synthetic
dataset are available at https://github.com/YxZhxn/Ray3D .Comment: Accepted by CVPR 202