Unsupervised 3D Pose Estimation with Geometric Self-Supervision
We present an unsupervised learning approach to recover 3D human pose from 2D
skeletal joints extracted from a single image. Our method does not require any
multi-view image data, 3D skeletons, or correspondences between 2D and 3D
points, nor does it use previously learned 3D priors during training. A lifting network accepts 2D
landmarks as inputs and generates a corresponding 3D skeleton estimate. During
training, the recovered 3D skeleton is reprojected on random camera viewpoints
to generate new "synthetic" 2D poses. By lifting the synthetic 2D poses back to
3D and re-projecting them in the original camera view, we can define
self-consistency losses both in 3D and in 2D. The training can thus be
self-supervised by exploiting the geometric self-consistency of the
lift-reproject-lift process. We show that self-consistency alone is not
sufficient to generate realistic skeletons; however, adding a 2D pose
discriminator enables the lifter to output valid 3D poses. Additionally, to
learn from 2D poses "in the wild", we train an unsupervised 2D domain adapter
network to allow for an expansion of 2D data. This improves results and
demonstrates the usefulness of 2D pose data for unsupervised 3D lifting.
Results on the Human3.6M dataset for 3D human pose estimation demonstrate that our
approach improves upon previous unsupervised methods by 30% and outperforms
many weakly supervised approaches that explicitly use 3D data.
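As a rough illustration of the lift-reproject-lift idea described above, the sketch below shows how the self-consistency losses could be computed in PyTorch. It assumes a hypothetical `lifter` network mapping (N, J, 2) joints to (N, J, 3), a weak-perspective projection, and random rotations about the vertical axis; it is a minimal sketch under those assumptions, not the authors' implementation, and it omits the 2D pose discriminator and domain adapter.

```python
import torch
import torch.nn.functional as F


def project(joints_3d):
    # Weak-perspective projection: keep x, y and drop the depth coordinate.
    return joints_3d[..., :2]


def random_y_rotation(batch_size, device):
    # One random azimuth rotation about the vertical (y) axis per sample.
    theta = torch.rand(batch_size, device=device) * 2.0 * torch.pi
    c, s = torch.cos(theta), torch.sin(theta)
    zero, one = torch.zeros_like(c), torch.ones_like(c)
    return torch.stack([
        torch.stack([c, zero, s], dim=-1),
        torch.stack([zero, one, zero], dim=-1),
        torch.stack([-s, zero, c], dim=-1),
    ], dim=1)                                            # (N, 3, 3)


def self_consistency_losses(lifter, pose_2d):
    """pose_2d: (N, J, 2) detected joints; lifter: hypothetical 2D -> 3D network."""
    pose_3d = lifter(pose_2d)                            # lift original 2D to 3D
    rot = random_y_rotation(pose_2d.shape[0], pose_2d.device)
    synth_2d = project(pose_3d @ rot.transpose(1, 2))    # reproject in a random view
    lifted_again = lifter(synth_2d)                      # lift the synthetic 2D pose
    back_3d = lifted_again @ rot                         # rotate back to the original view
    loss_3d = F.mse_loss(back_3d, pose_3d)               # 3D self-consistency
    loss_2d = F.mse_loss(project(back_3d), pose_2d)      # 2D self-consistency
    return loss_2d, loss_3d
```

In the full method described in the abstract, such self-consistency terms would be combined with an adversarial loss from the 2D pose discriminator applied to the reprojected synthetic poses.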
AcinoSet: A 3D Pose Estimation Dataset and Baseline Models for Cheetahs in the Wild
Animals are capable of extreme agility, yet understanding their complex
dynamics, which have ecological, biomechanical and evolutionary implications,
remains challenging. Being able to study this incredible agility will be
critical for the development of next-generation autonomous legged robots. In
particular, the cheetah (Acinonyx jubatus) is supremely fast and maneuverable,
yet quantifying its whole-body 3D kinematic data during locomotion in the wild
remains a challenge, even with new deep learning-based methods. In this work we
present an extensive dataset of free-running cheetahs in the wild, called
AcinoSet, that contains 119,490 frames of multi-view synchronized high-speed
video footage, camera calibration files and 7,588 human-annotated frames. We
utilize markerless animal pose estimation to provide 2D keypoints. Then, we use
three methods that serve as strong baselines for 3D pose estimation tool
development: traditional sparse bundle adjustment, an Extended Kalman Filter,
and a trajectory optimization-based method we call Full Trajectory Estimation.
The resulting 3D trajectories, human-checked 3D ground truth, and an
interactive tool to inspect the data are also provided. We believe this dataset
will be useful for a diverse range of fields such as ecology, neuroscience,
robotics, biomechanics, and computer vision.
Comment: Code and data can be found at:
https://github.com/African-Robotics-Unit/AcinoSe
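For context on the kind of baseline the abstract refers to, the following is a minimal direct-linear-transform triangulation sketch that recovers one 3D keypoint from synchronized, calibrated multi-view 2D detections. It is an illustrative sketch, not the AcinoSet baseline code: `proj_mats` (a list of 3x4 camera projection matrices) and `points_2d` (the matching (u, v) detections per view) are assumed inputs, and the dataset's actual baselines refine such estimates with sparse bundle adjustment, an Extended Kalman Filter, or full trajectory optimization.

```python
import numpy as np


def triangulate_point(proj_mats, points_2d):
    """Return the 3D point that best explains one keypoint seen in several views."""
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous point X:
        # u * (P[2] . X) - (P[0] . X) = 0  and  v * (P[2] . X) - (P[1] . X) = 0
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                     # least-squares solution: last right singular vector
    return X[:3] / X[3]            # de-homogenize to (x, y, z)
```

With at least two calibrated views, this yields per-frame 3D keypoints; temporal filtering or trajectory optimization, as in the baselines above, would then smooth and constrain the resulting trajectories.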