We propose a CNN-based approach for multi-camera markerless motion capture of
the human body. Unlike existing methods that first estimate pose in each
camera independently and generate 3D models only as post-processing, our
approach applies 3D reasoning at every stage of a multi-stage pipeline. This
allows us to use provisional 3D models of human pose to rethink where the
joints should be located in the image and to recover from past mistakes. Our
principled refinement of 3D human poses lets us make use of image cues, even
from images where we previously misdetected joints, to refine our estimates as
part of an end-to-end approach. Finally, we demonstrate how the high-quality
output of our multi-camera setup can be used as an additional training source
to improve the accuracy of existing single-camera models.

Comment: International Conference on 3DVision (3dv