VIBE: Video Inference for Human Body Pose and Shape Estimation
Human motion is fundamental to understanding behavior. Despite progress on
single-image 3D pose and shape estimation, existing video-based
state-of-the-art methods fail to produce accurate and natural motion sequences
due to a lack of ground-truth 3D motion data for training. To address this
problem, we propose Video Inference for Body Pose and Shape Estimation (VIBE),
which makes use of an existing large-scale motion capture dataset (AMASS)
together with unpaired, in-the-wild, 2D keypoint annotations. Our key novelty
is an adversarial learning framework that leverages AMASS to discriminate
between real human motions and those produced by our temporal pose and shape
regression networks. We define a temporal network architecture and show that
adversarial training, at the sequence level, produces kinematically plausible
motion sequences without in-the-wild ground-truth 3D labels. We perform
extensive experimentation to analyze the importance of motion and demonstrate
the effectiveness of VIBE on challenging 3D pose estimation datasets, achieving
state-of-the-art performance. Code and pretrained models are available at
https://github.com/mkocabas/VIBE.
Comment: CVPR-2020 camera ready.
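The sequence-level adversarial idea described in this abstract can be illustrated with a minimal sketch. It assumes a least-squares form of the adversarial objective, which the abstract does not specify, and it uses a hand-written `toy_score` function as a hypothetical stand-in for a learned motion discriminator. None of the names below come from the VIBE code base.

```python
from typing import Callable, Sequence

# A motion is a sequence of frames; each frame is a list of pose parameters.
Motion = Sequence[Sequence[float]]
Discriminator = Callable[[Motion], float]


def toy_score(motion: Motion) -> float:
    """Hypothetical stand-in for a learned discriminator.

    Returns a value in (0, 1]; smoother sequences score closer to 1.
    A real discriminator would be a trained network, not this heuristic.
    """
    diffs = [
        sum((a - b) ** 2 for a, b in zip(prev, cur))
        for prev, cur in zip(motion, motion[1:])
    ]
    jitter = sum(diffs) / len(diffs)
    return 1.0 / (1.0 + jitter)


def discriminator_loss(d: Discriminator,
                       real: Sequence[Motion],
                       fake: Sequence[Motion]) -> float:
    """Least-squares loss (assumed form): push real scores to 1, fake to 0."""
    real_term = sum((d(m) - 1.0) ** 2 for m in real) / len(real)
    fake_term = sum(d(m) ** 2 for m in fake) / len(fake)
    return real_term + fake_term


def generator_adversarial_loss(d: Discriminator,
                               fake: Sequence[Motion]) -> float:
    """The regressor is rewarded when its sequences are scored as real."""
    return sum((d(m) - 1.0) ** 2 for m in fake) / len(fake)


# One-parameter toy motions: a static sequence and a jittery one.
smooth = [[0.0], [0.0], [0.0]]
jittery = [[0.0], [1.0], [0.0]]
loss_d = discriminator_loss(toy_score, [smooth], [jittery])
loss_g = generator_adversarial_loss(toy_score, [jittery])
```

The point of the sketch is that the score is computed on a whole sequence rather than on individual frames, so temporal artifacts such as jitter are penalized even when each frame looks plausible on its own.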
Learning 3D Human Pose from Structure and Motion
3D human pose estimation from a single image is a challenging problem,
especially for in-the-wild settings due to the lack of 3D annotated data. We
propose two anatomically inspired loss functions and use them with a
weakly-supervised learning framework to jointly learn from large-scale
in-the-wild 2D and indoor/synthetic 3D data. We also present a simple temporal
network that exploits temporal and structural cues present in predicted pose
sequences to temporally harmonize the pose estimations. We carefully analyze
the proposed contributions through loss surface visualizations and sensitivity
analysis to facilitate deeper understanding of their working mechanism. Our
complete pipeline improves the state-of-the-art by 11.8% and 12% on Human3.6M
and MPI-INF-3DHP, respectively, and runs at 30 FPS on a commodity graphics
card.
Comment: ECCV 2018. Project page: https://www.cse.iitb.ac.in/~rdabral/3DPose
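The abstract does not say what the two anatomically inspired losses are. As an illustration of what such a constraint can look like, the sketch below penalizes differences between left and right bone lengths in a predicted 3D pose. The joint names, the pose layout, and the function names are assumptions made for this example and are not taken from the paper.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]
Bone = Tuple[str, str]  # (parent joint, child joint)


def bone_length(pose: Dict[str, Point], bone: Bone) -> float:
    """Euclidean distance between the two joints of a bone."""
    parent, child = bone
    return math.dist(pose[parent], pose[child])


def symmetry_loss(pose: Dict[str, Point],
                  bone_pairs: Sequence[Tuple[Bone, Bone]]) -> float:
    """Mean squared difference between paired left and right bone lengths."""
    total = 0.0
    for left, right in bone_pairs:
        total += (bone_length(pose, left) - bone_length(pose, right)) ** 2
    return total / len(bone_pairs)


# Hypothetical two-bone skeleton: left and right upper arms.
pairs = [(("l_shoulder", "l_elbow"), ("r_shoulder", "r_elbow"))]

asymmetric_pose = {
    "l_shoulder": (0.0, 0.0, 0.0), "l_elbow": (0.0, -1.0, 0.0),
    "r_shoulder": (1.0, 0.0, 0.0), "r_elbow": (1.0, -2.0, 0.0),
}
symmetric_pose = dict(asymmetric_pose, r_elbow=(1.0, -1.0, 0.0))

asym = symmetry_loss(asymmetric_pose, pairs)
sym = symmetry_loss(symmetric_pose, pairs)
```

A term like this needs no 3D ground truth, which is why constraints of this kind suit weakly supervised training on 2D-annotated in-the-wild images.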