591 research outputs found
Expressive Body Capture: 3D Hands, Face, and Body from a Single Image
To facilitate the analysis of human actions, interactions and emotions, we
compute a 3D model of human body pose, hand pose, and facial expression from a
single monocular image. To achieve this, we use thousands of 3D scans to train
a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with
fully articulated hands and an expressive face. Learning to regress the
parameters of SMPL-X directly from images is challenging without paired images
and 3D ground truth. Consequently, we follow the approach of SMPLify, which
estimates 2D features and then optimizes model parameters to fit the features.
We improve on SMPLify in several significant ways: (1) we detect 2D features
corresponding to the face, hands, and feet and fit the full SMPL-X model to
these; (2) we train a new neural network pose prior using a large MoCap
dataset; (3) we define a new interpenetration penalty that is both fast and
accurate; (4) we automatically detect gender and the appropriate body models
(male, female, or neutral); (5) our PyTorch implementation achieves a speedup
of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to
both controlled images and images in the wild. We evaluate 3D accuracy on a new
curated dataset comprising 100 images with pseudo ground-truth. This is a step
towards automatic expressive human capture from monocular RGB data. The models,
code, and data are available for research purposes at
https://smpl-x.is.tue.mpg.de. Comment: To appear in CVPR 201
MonoPerfCap: Human Performance Capture from Monocular Video
We present the first marker-less approach for temporally coherent 3D
performance capture of a human with general clothing from monocular video. Our
approach reconstructs articulated human skeleton motion as well as medium-scale
non-rigid surface deformations in general scenes. Human performance capture is
a challenging problem due to the large range of articulation, potentially fast
motion, and considerable non-rigid deformations, even from multi-view data.
Reconstruction from monocular video alone is drastically more challenging,
since strong occlusions and the inherent depth ambiguity lead to a highly
ill-posed reconstruction problem. We tackle these challenges by a novel
approach that employs sparse 2D and 3D human pose detections from a
convolutional neural network using a batch-based pose estimation strategy.
Joint recovery of per-batch motion allows us to resolve the ambiguities of the
monocular reconstruction problem based on a low dimensional trajectory
subspace. In addition, we propose refinement of the surface geometry based on
fully automatically extracted silhouettes to enable medium-scale non-rigid
alignment. We demonstrate state-of-the-art performance capture results that
enable exciting applications such as video editing and free viewpoint video,
previously infeasible from monocular video. Our qualitative and quantitative
evaluation demonstrates that our approach significantly outperforms previous
monocular methods in terms of accuracy, robustness and scene complexity that
can be handled. Comment: Accepted to ACM TOG 2018, to be presented at SIGGRAPH 201
Image-based rendering and synthesis
Multiview imaging (MVI) is an active research area because it has a wide range of applications and opens up research in other topics, including virtual view synthesis for three-dimensional (3D) television (3DTV) and entertainment. However, multiview systems need a large amount of storage and are difficult to construct. Image-based rendering (IBR) is the concept that allows 3D scenes and objects to be visualized in a realistic way without full 3D model reconstruction. Using images as the primary substrate, IBR has many potential applications, including video games and virtual travel. The technique creates new views of scenes, which are reconstructed from a collection of densely sampled images or videos. IBR methods fall into different classes. In one, the 3D models and the lighting conditions are known, and views can be rendered using conventional graphics techniques. Another is light field or lumigraph rendering, which depends on dense sampling and uses little or no geometry, rendering without recovering exact 3D models.
ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions
To enable machines to learn how humans interact with the physical world in
our daily activities, it is crucial to provide rich data that encompasses the
3D motion of humans as well as the motion of objects in a learnable 3D
representation. Ideally, this data should be collected in a natural setup,
capturing the authentic dynamic 3D signals during human-object interactions. To
address this challenge, we introduce the ParaHome system, designed to capture
and parameterize dynamic 3D movements of humans and objects within a common
home environment. Our system consists of a multi-view setup with 70
synchronized RGB cameras, as well as wearable motion capture devices equipped
with an IMU-based body suit and hand motion capture gloves. By leveraging the
ParaHome system, we collect a novel large-scale dataset of human-object
interaction. Notably, our dataset offers key advancements over existing datasets
in three main aspects: (1) capturing 3D body and dexterous hand manipulation
motion alongside 3D object movement within a contextual home environment during
natural activities; (2) encompassing human interaction with multiple objects in
various episodic scenarios with corresponding descriptions in texts; (3)
including articulated objects with multiple parts expressed with parameterized
articulations. Building upon our dataset, we introduce new research tasks aimed
at building a generative model for learning and synthesizing human-object
interactions in a real-world room setting.
06241 Abstracts Collection -- Human Motion - Understanding, Modeling, Capture and Animation. 13th Workshop
From 11.06.06 to 16.06.06, the Dagstuhl Seminar 06241 "Human Motion - Understanding, Modeling, Capture and Animation. 13th Workshop 'Theoretical Foundations of Computer Vision'" was held
in the International Conference and Research Center (IBFI),
Schloss Dagstuhl.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Self-correction of 3D reconstruction from multi-view stereo images
We present a self-correction approach to improving the
3D reconstruction of a multi-view 3D photogrammetry system.
The self-correction approach has been able to repair
the reconstructed 3D surface damaged by depth discontinuities.
Due to self-occlusion, multi-view range images
have to be acquired and integrated into a watertight nonredundant
mesh model in order to cover the extended surface
of an imaged object. The integrated surface often suffers
from "dent" artifacts produced by depth discontinuities
in the multi-view range images. In this paper we propose
a novel approach to correcting the 3D integrated surface
such that the dent artifacts can be repaired automatically.
We show examples of 3D reconstruction to demonstrate the
improvement that can be achieved by the self-correction
approach. This self-correction approach can be extended
to integrate range images obtained from alternative range
capture devices.