MonoPerfCap: Human Performance Capture from Monocular Video
We present the first marker-less approach for temporally coherent 3D
performance capture of a human with general clothing from monocular video. Our
approach reconstructs articulated human skeleton motion as well as medium-scale
non-rigid surface deformations in general scenes. Human performance capture is
a challenging problem due to the large range of articulation, potentially fast
motion, and considerable non-rigid deformations, even from multi-view data.
Reconstruction from monocular video alone is drastically more challenging,
since strong occlusions and the inherent depth ambiguity lead to a highly
ill-posed reconstruction problem. We tackle these challenges by a novel
approach that employs sparse 2D and 3D human pose detections from a
convolutional neural network using a batch-based pose estimation strategy.
Joint recovery of per-batch motion allows us to resolve the ambiguities of the
monocular reconstruction problem via a low-dimensional trajectory
subspace. In addition, we propose refinement of the surface geometry based on
fully automatically extracted silhouettes to enable medium-scale non-rigid
alignment. We demonstrate state-of-the-art performance capture results that
enable exciting applications such as video editing and free viewpoint video,
previously infeasible from monocular video. Our qualitative and quantitative
evaluation demonstrates that our approach significantly outperforms previous
monocular methods in terms of accuracy, robustness, and the scene complexity
that can be handled.
Comment: Accepted to ACM TOG 2018, to be presented at SIGGRAPH 2018
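The low-dimensional trajectory subspace mentioned above can be illustrated with a common construction: representing each joint's motion over a batch of frames as a combination of a few smooth basis trajectories (e.g. DCT basis vectors), which regularizes per-frame depth ambiguity. This is a minimal sketch of that general idea, not the paper's actual formulation; all function names are hypothetical.

```python
import numpy as np

def dct_basis(n_frames: int, n_coeffs: int) -> np.ndarray:
    """First n_coeffs orthonormal DCT-II basis vectors over n_frames steps."""
    t = np.arange(n_frames)
    k = np.arange(n_coeffs)
    basis = np.cos(np.pi * (t[:, None] + 0.5) * k[None, :] / n_frames)
    basis[:, 0] /= np.sqrt(2.0)          # DCT-II normalization of the DC term
    return basis * np.sqrt(2.0 / n_frames)

def fit_trajectory(traj: np.ndarray, n_coeffs: int) -> np.ndarray:
    """Least-squares projection of a per-frame trajectory (F x D) onto the
    low-dimensional DCT subspace; returns the smoothed F x D trajectory."""
    B = dct_basis(traj.shape[0], n_coeffs)
    coeffs, *_ = np.linalg.lstsq(B, traj, rcond=None)
    return B @ coeffs

# Toy example: a noisy 1D joint coordinate over 60 frames.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 60)
clean = np.sin(2 * np.pi * t)[:, None]
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
smoothed = fit_trajectory(noisy, n_coeffs=8)
```

Because only 8 of 60 basis coefficients are kept, high-frequency noise is projected out while the smooth underlying motion survives, which is the intuition behind constraining per-batch motion to such a subspace.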
Sim2real transfer learning for 3D human pose estimation: motion to the rescue
Synthetic visual data can provide practically infinite diversity and rich
labels, while avoiding ethical issues with privacy and bias. However, for many
tasks, current models trained on synthetic data generalize poorly to real data.
The task of 3D human pose estimation is a particularly interesting example of
this sim2real problem, because learning-based approaches perform reasonably
well given real training data, yet labeled 3D poses are extremely difficult to
obtain in the wild, limiting scalability. In this paper, we show that standard
neural-network approaches, which perform poorly when trained on synthetic RGB
images, can perform well when the data is pre-processed to extract cues about
the person's motion, notably as optical flow and the motion of 2D keypoints.
Therefore, our results suggest that motion can be a simple way to bridge a
sim2real gap when video is available. We evaluate on the 3D Poses in the Wild
dataset, the most challenging modern benchmark for 3D pose estimation, where we
show full 3D mesh recovery that is on par with state-of-the-art methods trained
on real 3D sequences, despite training only on synthetic humans from the
SURREAL dataset.
Comment: Accepted at NeurIPS 2019
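One of the motion cues described above, the motion of 2D keypoints, amounts to feeding the network frame-to-frame keypoint displacements rather than raw RGB. The sketch below shows that pre-processing step in its simplest form, assuming detected keypoints are already available; it is an illustration of the cue, not the paper's pipeline.

```python
import numpy as np

def keypoint_motion_features(keypoints: np.ndarray) -> np.ndarray:
    """Frame-to-frame displacement of 2D keypoints.

    keypoints: (F, K, 2) array of K 2D keypoints over F frames.
    Returns an (F - 1, K, 2) array of per-keypoint motion vectors, an
    appearance-free signal that transfers from synthetic to real imagery
    better than raw pixels.
    """
    return np.diff(keypoints, axis=0)

# Toy sequence: 3 frames, 2 keypoints drifting right by 5 px per frame.
kp = np.array([
    [[10.0, 20.0], [30.0, 40.0]],
    [[15.0, 20.0], [35.0, 40.0]],
    [[20.0, 20.0], [40.0, 40.0]],
])
motion = keypoint_motion_features(kp)
```

The point of such a representation is that displacement fields look statistically similar whether the underlying frames are rendered or real, which is why motion helps bridge the sim2real gap.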
Adjustable Method Based on Body Parts for Improving the Accuracy of 3D Reconstruction in Visually Important Body Parts from Silhouettes
This research proposes a novel adjustable algorithm for reconstructing 3D
body shapes from front and side silhouettes. Most recent silhouette-based
approaches use a deep neural network trained on silhouettes and key points to
estimate the shape parameters, but they cannot accurately fit the model to the
body contours and consequently struggle to capture detailed body geometry,
especially in the torso. In addition, most of these methods give all body parts
the same accuracy priority, which makes the optimization harder and prevents it
from reaching the best possible result in essential body parts, such as the torso,
which is visually important in most applications, such as virtual garment
fitting. In the proposed method, we can adjust the expected accuracy for each
body part based on our purpose by assigning coefficients for the distance of
each body part between the projected 3D body and 2D silhouettes. To measure
this distance, we first recognize the corresponding body parts using body
segmentation in both views. Then, we align individual body parts by 2D rigid
registration and match them using pairwise matching. The objective function
tries to minimize the distance cost for the individual body parts in both views
based on distances and coefficients by optimizing the statistical model
parameters. We also handle slight variations in arm and leg poses through pose
matching. We evaluate the proposed method with synthetic body
meshes from the normalized S-SCAPE. The results show that the algorithm
reconstructs the visually important body parts, those assigned high
coefficients, more accurately.
Comment: 16 pages, 17 images
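The core of the objective described above is a distance between projected-model and silhouette contour points that is weighted per body part, so parts like the torso can be prioritized. The following is a minimal sketch of such a weighted cost under the assumption that contour points are already matched and labeled; the function and label names are hypothetical, not the paper's code.

```python
import numpy as np

def weighted_part_distance(model_pts, silhouette_pts, part_labels, coeffs):
    """Coefficient-weighted sum of per-part contour distances.

    model_pts, silhouette_pts: (N, 2) matched 2D contour points of the
        projected body model and the target silhouette.
    part_labels: (N,) integer body-part label for each point pair.
    coeffs: dict mapping part label -> accuracy coefficient; a higher
        value makes the optimizer prioritize fitting that part.
    """
    dists = np.linalg.norm(model_pts - silhouette_pts, axis=1)
    weights = np.array([coeffs[p] for p in part_labels])
    return float(np.sum(weights * dists))

# Toy example: point 0 belongs to the torso (label 0, weight 2.0),
# point 1 to an arm (label 1, weight 0.5).
model = np.array([[0.0, 0.0], [1.0, 0.0]])
target = np.array([[0.0, 1.0], [1.0, 2.0]])
labels = np.array([0, 1])
cost = weighted_part_distance(model, target, labels, {0: 2.0, 1: 0.5})
```

Minimizing such a cost over the statistical-model parameters concentrates residual error in the low-weight parts, which matches the paper's goal of trading arm or leg accuracy for torso accuracy.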
Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction
We propose the Canonical 3D Deformer Map, a new representation of the 3D
shape of common object categories that can be learned from a collection of 2D
images of independent objects. Our method builds in a novel way on concepts
from parametric deformation models, non-parametric 3D reconstruction, and
canonical embeddings, combining their individual advantages. In particular, it
learns to associate each image pixel with a deformation model of the
corresponding 3D object point, which is canonical, i.e. intrinsic to the
identity of the point and shared across objects of the category. The result is
a method that, given only sparse 2D supervision at training time, can, at test
time, reconstruct the 3D shape and texture of objects from single views, while
establishing meaningful dense correspondences between object instances. It also
achieves state-of-the-art results in dense 3D reconstruction on public
in-the-wild datasets of faces, cars, and birds.
Comment: Published at NeurIPS 2020
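The dense correspondences mentioned above follow from the canonical embedding: if two pixels in different images map to nearby points in the shared canonical space, they depict the same object point. A minimal sketch of that matching step, assuming per-pixel embeddings have already been predicted (the embeddings here are toy values, not the paper's learned ones):

```python
import numpy as np

def dense_correspondences(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Match pixels of image A to image B via their canonical embeddings.

    emb_a: (Na, D) per-pixel canonical embeddings of image A.
    emb_b: (Nb, D) per-pixel canonical embeddings of image B.
    Returns an (Na,) array of indices into emb_b: for each A-pixel, the
    B-pixel whose embedding is nearest in canonical space.
    """
    # Squared Euclidean distance between every pair of embeddings.
    d2 = ((emb_a[:, None, :] - emb_b[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

# Toy example: two pixels per image; the object appears "flipped" in B,
# but matching in canonical space recovers the correct pairing.
emb_a = np.array([[0.0, 0.0], [1.0, 1.0]])
emb_b = np.array([[1.0, 1.0], [0.0, 0.0]])
matches = dense_correspondences(emb_a, emb_b)
```

Because the embedding is intrinsic to the point's identity rather than its image position, the same nearest-neighbor rule works across different instances of the category, which is what makes the correspondences meaningful.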