29 research outputs found

    MonoPerfCap: Human Performance Capture from Monocular Video

    We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem, even from multi-view data, due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges with a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network in a batch-based pose estimation strategy. Joint recovery of per-batch motion makes it possible to resolve the ambiguities of the monocular reconstruction problem based on a low-dimensional trajectory subspace. In addition, we propose refining the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free-viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness, and the scene complexity that can be handled.
    Comment: Accepted to ACM TOG 2018, to be presented at SIGGRAPH 201
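    The abstract does not say which trajectory basis is used, so the following is only a minimal sketch of the general idea of a low-dimensional trajectory subspace. It assumes an orthonormal DCT basis, and the names `dct_basis`, `project`, and `reconstruct` are hypothetical helpers, not the authors' code. A per-batch joint trajectory is described by a few coefficients instead of one free value per frame.

```python
import math

def dct_basis(num_frames, num_coeffs):
    """Return the first `num_coeffs` orthonormal DCT-II basis vectors (assumed basis)."""
    basis = []
    for k in range(num_coeffs):
        scale = math.sqrt(1.0 / num_frames) if k == 0 else math.sqrt(2.0 / num_frames)
        basis.append([
            scale * math.cos(math.pi * (2 * t + 1) * k / (2 * num_frames))
            for t in range(num_frames)
        ])
    return basis

def project(trajectory, basis):
    """Project a 1D trajectory (one value per frame) onto the basis vectors."""
    return [sum(b_t * x_t for b_t, x_t in zip(b, trajectory)) for b in basis]

def reconstruct(coeffs, basis):
    """Rebuild the per-frame trajectory from its subspace coefficients."""
    num_frames = len(basis[0])
    return [sum(c * b[t] for c, b in zip(coeffs, basis)) for t in range(num_frames)]

# Usage: describe a 4-frame batch with a single low-frequency coefficient.
trajectory = [1.0, 2.0, 4.0, 3.0]
low_dim = dct_basis(len(trajectory), 1)
smoothed = reconstruct(project(trajectory, low_dim), low_dim)
```

    With a single coefficient the reconstruction is the constant mean of the batch; adding coefficients admits progressively faster motion, which is the trade-off such a subspace controls.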

    Sim2real transfer learning for 3D human pose estimation: motion to the rescue

    Synthetic visual data can provide practically infinite diversity and rich labels, while avoiding ethical issues with privacy and bias. However, for many tasks, current models trained on synthetic data generalize poorly to real data. The task of 3D human pose estimation is a particularly interesting example of this sim2real problem, because learning-based approaches perform reasonably well given real training data, yet labeled 3D poses are extremely difficult to obtain in the wild, limiting scalability. In this paper, we show that standard neural-network approaches, which perform poorly when trained on synthetic RGB images, can perform well when the data is pre-processed to extract cues about the person's motion, notably as optical flow and the motion of 2D keypoints. Therefore, our results suggest that motion can be a simple way to bridge a sim2real gap when video is available. We evaluate on the 3D Poses in the Wild dataset, the most challenging modern benchmark for 3D pose estimation, where we show full 3D mesh recovery that is on par with state-of-the-art methods trained on real 3D sequences, despite training only on synthetic humans from the SURREAL dataset.
    Comment: Accepted at NeurIPS 201
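    As a rough illustration of the "motion of 2D keypoints" cue mentioned above, the sketch below computes per-frame keypoint displacements from a sequence of 2D detections. The function name `keypoint_motion` and the input layout are assumptions made for this example; the abstract does not describe the authors' actual pre-processing.

```python
def keypoint_motion(frames):
    """Return frame-to-frame displacement of each 2D keypoint.

    `frames` is a list of frames; each frame is a list of (x, y) keypoints
    in a fixed order (hypothetical layout, assumed for this sketch).
    """
    motion = []
    for prev, curr in zip(frames, frames[1:]):
        motion.append([
            (cx - px, cy - py)
            for (px, py), (cx, cy) in zip(prev, curr)
        ])
    return motion

# Usage: three frames with two keypoints each yield two displacement frames.
frames = [
    [(0, 0), (1, 1)],
    [(1, 0), (1, 3)],
    [(3, 1), (0, 3)],
]
motion = keypoint_motion(frames)
```

    Displacements discard appearance entirely, which is one plausible reason such a cue could transfer better from synthetic to real video than raw RGB.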

    Adjustable Method Based on Body Parts for Improving the Accuracy of 3D Reconstruction in Visually Important Body Parts from Silhouettes

    This research proposes a novel adjustable algorithm for reconstructing 3D body shapes from front and side silhouettes. Most recent silhouette-based approaches use a deep neural network trained on silhouettes and key points to estimate the shape parameters, but they cannot accurately fit the model to the body contours and consequently struggle to capture detailed body geometry, especially in the torso. In addition, in most of these approaches all body parts have the same accuracy priority, which makes the optimization harder and prevents it from reaching the best possible result in essential body parts such as the torso, which is visually important in most applications, such as virtual garment fitting. In the proposed method, the expected accuracy of each body part can be adjusted to suit the application by assigning coefficients to the distance of each body part between the projected 3D body and the 2D silhouettes. To measure this distance, we first identify the corresponding body parts using body segmentation in both views. Then, we align individual body parts by 2D rigid registration and match them using pairwise matching. The objective function minimizes the distance cost for the individual body parts in both views, based on the distances and coefficients, by optimizing the statistical model parameters. We also handle slight variations in the angle of the arms and limbs by matching the pose. We evaluate the proposed method on synthetic body meshes from the normalized S-SCAPE. The results show that the algorithm reconstructs visually important body parts more accurately when they are given high coefficients.
    Comment: 16 pages, 17 images
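    The coefficient-weighted objective described above can be sketched as a weighted sum of per-part distances over both views. This is only an illustration under assumptions: the function name, the dictionary layout, the part names, and the default coefficient of 1.0 are hypothetical, and the distances are taken as already computed by the segmentation, registration, and matching steps.

```python
from typing import Dict

def weighted_silhouette_cost(
    part_distances: Dict[str, Dict[str, float]],
    coefficients: Dict[str, float],
) -> float:
    """Sum coefficient * distance over every body part in every view.

    `part_distances` maps a view name to {part name: distance}; parts
    without an explicit coefficient get weight 1.0 (an assumed default).
    """
    total = 0.0
    for distances in part_distances.values():
        for part, distance in distances.items():
            total += coefficients.get(part, 1.0) * distance
    return total

# Usage: weight the torso more heavily than the arms.
distances = {
    "front": {"torso": 2.0, "arm": 1.0},
    "side": {"torso": 1.0, "arm": 3.0},
}
cost = weighted_silhouette_cost(distances, {"torso": 3.0, "arm": 0.5})
```

    An optimizer minimizing such a cost over the shape parameters would trade arm error for torso error whenever the torso coefficient is larger, which matches the prioritization the abstract describes.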

    Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction

    We propose the Canonical 3D Deformer Map, a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects. Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings, combining their individual advantages. In particular, it learns to associate each image pixel with a deformation model of the corresponding 3D object point which is canonical, i.e. intrinsic to the identity of the point and shared across objects of the category. The result is a method that, given only sparse 2D supervision at training time, can, at test time, reconstruct the 3D shape and texture of objects from single views, while establishing meaningful dense correspondences between object instances. It also achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
    Comment: Published at NeurIPS 202