Harvesting Multiple Views for Marker-less 3D Human Pose Annotations
Recent advances with Convolutional Networks (ConvNets) have shifted the
bottleneck for many computer vision tasks to annotated data collection. In this
paper, we present a geometry-driven approach to automatically collect
annotations for human pose prediction tasks. Starting from a generic ConvNet
for 2D human pose, and assuming a multi-view setup, we describe an automatic
way to collect accurate 3D human pose annotations. We capitalize on constraints
offered by the 3D geometry of the camera setup and the 3D structure of the
human body to probabilistically combine per-view 2D ConvNet predictions into a
globally optimal 3D pose. This 3D pose is used as the basis for harvesting
annotations. The benefit of the annotations produced automatically with our
approach is demonstrated in two challenging settings: (i) fine-tuning a generic
ConvNet-based 2D pose predictor to capture the discriminative aspects of a
subject's appearance (i.e., "personalization"), and (ii) training a ConvNet from
scratch for single-view 3D human pose prediction without leveraging 3D pose
ground truth. The proposed multi-view pose estimator achieves state-of-the-art
results on standard benchmarks, demonstrating the effectiveness of our method
in exploiting the available multi-view information. Comment: CVPR 2017 Camera Ready
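The core geometric step described above (combining per-view 2D ConvNet predictions into a single 3D pose using the known camera setup) can be illustrated with a simple confidence-weighted linear triangulation. This is only a sketch of the multi-view geometry involved; the paper's actual method additionally uses a probabilistic model of the 3D structure of the human body, and the function name and weighting scheme here are illustrative assumptions:

```python
import numpy as np

def triangulate_joint(proj_mats, points_2d, confidences):
    """Confidence-weighted linear (DLT) triangulation of one joint.

    proj_mats:   list of 3x4 camera projection matrices, one per view
    points_2d:   list of (x, y) 2D keypoint predictions, one per view
    confidences: per-view prediction confidences, used as row weights
                 (a simplification of a full probabilistic combination)
    """
    rows = []
    for P, (x, y), w in zip(proj_mats, points_2d, confidences):
        # Each view contributes two linear constraints on the homogeneous
        # 3D point X:  x * (P[2] @ X) = P[0] @ X  and
        #              y * (P[2] @ X) = P[1] @ X
        rows.append(w * (x * P[2] - P[0]))
        rows.append(w * (y * P[2] - P[1]))
    A = np.stack(rows)
    # Homogeneous least squares: X is the right singular vector of A
    # associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize to a 3D point
```

With two or more calibrated views, this recovers the 3D joint position that best agrees with all 2D predictions; such a reconstructed pose can then serve as the automatically harvested annotation.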
Learning Monocular 3D Human Pose Estimation from Multi-view Images
Accurate 3D human pose estimation from single images is possible with
sophisticated deep-net architectures that have been trained on very large
datasets. However, this still leaves open the problem of capturing motions for
which no such database exists. Manual annotation is tedious, slow, and
error-prone. In this paper, we propose to replace most of the annotations by
the use of multiple views, at training time only. Specifically, we train the
system to predict the same pose in all views. Such a consistency constraint is
necessary but not sufficient to predict accurate poses. We therefore complement
it with a supervised loss aiming to predict the correct pose in a small set of
labeled images, and with a regularization term that penalizes drift from
initial predictions. Furthermore, we propose a method to estimate camera pose
jointly with human pose, which lets us utilize multi-view footage where
calibration is difficult, e.g., for pan-tilt or moving handheld cameras. We
demonstrate the effectiveness of our approach on established benchmarks, as
well as on a new Ski dataset with rotating cameras and expert ski motion, for
which annotations are truly hard to obtain. Comment: CVPR 2018, Ski-Pose PTZ-Camera Dataset available
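The three-term training objective described above (a multi-view consistency constraint, a supervised loss on a small labeled set, and a regularizer penalizing drift from initial predictions) can be sketched as follows. All names, array shapes, and loss weights are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def total_loss(preds_common_frame, labeled_preds, labels,
               preds_now, preds_init,
               w_cons=1.0, w_sup=1.0, w_reg=0.1):
    """Sketch of a three-term weakly supervised training objective.

    preds_common_frame: (V, J, 3) poses predicted from V views, expressed
                        in a shared coordinate frame
    labeled_preds, labels: predictions and ground truth for the small
                           labeled subset
    preds_now, preds_init: current vs. initial network predictions, for
                           the drift regularizer
    """
    # Multi-view consistency: every view should predict the same pose,
    # so penalize deviation from the mean pose across views.
    mean_pose = preds_common_frame.mean(axis=0, keepdims=True)
    l_cons = np.mean((preds_common_frame - mean_pose) ** 2)
    # Supervised loss on the few labeled images.
    l_sup = np.mean((labeled_preds - labels) ** 2)
    # Regularizer penalizing drift away from the initial predictions.
    l_reg = np.mean((preds_now - preds_init) ** 2)
    return w_cons * l_cons + w_sup * l_sup + w_reg * l_reg
```

As the abstract notes, the consistency term alone is necessary but not sufficient (a network predicting any constant pose in all views would minimize it), which is why the supervised and regularization terms are needed to anchor the solution.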