While voxel-based methods have achieved promising results for
multi-person 3D pose estimation from multiple cameras, they suffer from a heavy
computational burden, especially in large scenes. We present Faster VoxelPose
to address this challenge by re-projecting the feature volume onto the three
two-dimensional coordinate planes and estimating the X, Y, and Z coordinates from them
separately. To that end, we first localize each person with a 3D bounding box by
estimating a 2D box and its height based on the volume features projected to
the xy-plane and the z-axis, respectively. Then, for each person, we estimate
partial joint coordinates from each of the three coordinate planes, which are
then fused to obtain the final 3D pose. The method avoids costly 3D-CNNs,
runs ten times faster than VoxelPose, and achieves accuracy
competitive with state-of-the-art methods, demonstrating its potential for
real-time applications.

Comment: 22 pages, 7 figures, submitted to ECCV 202
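The localization step (a 2D box on the xy-plane plus a height from the z-axis) can be illustrated with a short Python sketch. This is not the authors' implementation and rests on our own assumptions: the volume is a nested list indexed as `volume[x][y][z]` holding one score per voxel, the projection onto the xy-plane uses max-pooling over z, and only a box center is recovered (no box-size regression). The z-axis profile is simplified to the single voxel column under the xy peak. The names `project_to_xy`, `argmax_2d`, and `localize_person` are hypothetical.

```python
from typing import List, Tuple

# Assumed layout (hypothetical): volume[x][y][z] is a per-voxel person score.
Volume = List[List[List[float]]]


def project_to_xy(volume: Volume) -> List[List[float]]:
    """Collapse the z-axis by max-pooling, giving an X-by-Y plane."""
    return [[max(column) for column in row] for row in volume]


def argmax_2d(plane: List[List[float]]) -> Tuple[int, int]:
    """Return the (row, col) index of the largest value in a 2D grid."""
    cells = ((i, j) for i in range(len(plane)) for j in range(len(plane[0])))
    return max(cells, key=lambda ij: plane[ij[0]][ij[1]])


def localize_person(volume: Volume) -> Tuple[int, int, int]:
    """Locate one person's center voxel from the xy-plane and a 1D z-profile."""
    # Step 1: 2D localization on the bird's-eye xy-plane.
    x, y = argmax_2d(project_to_xy(volume))
    # Step 2 (our simplification): use the z-column under the xy peak as the
    # 1D z-axis profile and take its peak as the height index.
    z_profile = volume[x][y]
    z = max(range(len(z_profile)), key=lambda k: z_profile[k])
    return x, y, z


# Toy 4 x 3 x 5 volume with one strong peak and one weaker distractor.
X, Y, Z = 4, 3, 5
vol = [[[0.0] * Z for _ in range(Y)] for _ in range(X)]
vol[2][1][3] = 0.9
vol[0][0][0] = 0.4
print(localize_person(vol))  # (2, 1, 3)
```

The point of the decomposition is that the 2D search runs over an X-by-Y plane and the height search over a length-Z vector, rather than convolving over the full X-by-Y-by-Z grid.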
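The fusion of partial joint coordinates can also be sketched. Each plane yields two of the three coordinates, so every axis is estimated twice: x from the xy- and xz-planes, y from the xy- and yz-planes, and z from the xz- and yz-planes. Below, the two estimates are merged with a confidence-weighted average. This fusion rule is our assumption for illustration; the abstract does not specify how the fusion is done, and it may be learned. `fuse_joint` and `weighted_mean` are hypothetical names.

```python
from typing import Tuple

Pair = Tuple[float, float]


def weighted_mean(a: float, wa: float, b: float, wb: float) -> float:
    """Average two estimates of the same coordinate, weighted by confidence."""
    return (wa * a + wb * b) / (wa + wb)


def fuse_joint(
    xy: Pair, xz: Pair, yz: Pair,
    w_xy: float = 1.0, w_xz: float = 1.0, w_yz: float = 1.0,
) -> Tuple[float, float, float]:
    """Fuse per-plane 2D estimates of one joint into a 3D coordinate.

    xy = (x, y) from the xy-plane, xz = (x, z) from the xz-plane,
    yz = (y, z) from the yz-plane. The weights are assumed per-plane
    confidences (a hypothetical choice, not taken from the paper).
    """
    x = weighted_mean(xy[0], w_xy, xz[0], w_xz)
    y = weighted_mean(xy[1], w_xy, yz[0], w_yz)
    z = weighted_mean(xz[1], w_xz, yz[1], w_yz)
    return x, y, z


print(fuse_joint((1.0, 2.0), (3.0, 4.0), (6.0, 8.0)))  # (2.0, 4.0, 6.0)
```

With equal weights this reduces to a plain average of the two estimates per axis; raising one plane's weight pulls both of its coordinates toward that plane's prediction.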