3D Body Shapes Estimation from Dressed-Human Silhouettes
Estimating 3D body shapes from dressed-human photos is an important but challenging problem in virtual fitting. We propose a novel automatic framework to efficiently estimate 3D body shapes under clothing. We construct a database of paired 3D naked and dressed bodies, from which we learn to predict the 3D positions of body landmarks (which in turn constrain a parametric human body model) from dressed-human silhouettes. Critical vertices on 3D registered human bodies are selected as landmarks to represent body shape, which avoids the time-consuming vertex-correspondence search otherwise required for parametric body reconstruction. Our method estimates a 3D body shape from dressed-human silhouettes within 4 seconds, whereas the fastest previously reported method needs 1 minute, and our estimation error is within the size tolerance of the clothing industry. To build the database, we dress 6042 naked bodies with 3 sets of common clothes using physically based cloth simulation. To the best of our knowledge, we are the first to construct such a database of 3D naked and dressed body pairs, and it may contribute to the areas of human body shape estimation and cloth simulation.
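The landmark-based pipeline above can be illustrated with a toy regressor: a least-squares linear map from a silhouette descriptor to 3D landmark coordinates. This is only a minimal pure-Python sketch of the general idea; the paper's actual predictor, features, and landmark selection are learned from the naked/dressed database, and every name below is illustrative.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def solve(A, B):
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + Brow[:] for row, Brow in zip(A, B)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        piv = M[i][i]
        M[i] = [v / piv for v in M[i]]
        for r in range(n):
            if r != i and M[r][i]:
                f = M[r][i]
                M[r] = [v - f * w for v, w in zip(M[r], M[i])]
    return [row[n:] for row in M]

def fit_linear_map(X, Y):
    """Least-squares W with Y ~ X W via the normal equations."""
    Xt = transpose(X)
    return solve(matmul(Xt, X), matmul(Xt, Y))

def predict(W, x):
    """Map one silhouette descriptor x to landmark coordinates x W."""
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*W)]
```

On exactly linear synthetic data the fitted map recovers the ground-truth coefficients, which is all this sketch is meant to show.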
MonoPerfCap: Human Performance Capture from Monocular Video
We present the first marker-less approach for temporally coherent 3D
performance capture of a human with general clothing from monocular video. Our
approach reconstructs articulated human skeleton motion as well as medium-scale
non-rigid surface deformations in general scenes. Human performance capture is
a challenging problem due to the large range of articulation, potentially fast
motion, and considerable non-rigid deformations, even from multi-view data.
Reconstruction from monocular video alone is drastically more challenging,
since strong occlusions and the inherent depth ambiguity lead to a highly
ill-posed reconstruction problem. We tackle these challenges by a novel
approach that employs sparse 2D and 3D human pose detections from a
convolutional neural network using a batch-based pose estimation strategy.
Joint recovery of per-batch motion allows us to resolve the ambiguities of the
monocular reconstruction problem based on a low dimensional trajectory
subspace. In addition, we propose refinement of the surface geometry based on
fully automatically extracted silhouettes to enable medium-scale non-rigid
alignment. We demonstrate state-of-the-art performance capture results that
enable exciting applications such as video editing and free viewpoint video,
previously infeasible from monocular video. Our qualitative and quantitative
evaluation demonstrates that our approach significantly outperforms previous
monocular methods in terms of accuracy, robustness, and the scene complexity
that can be handled.
Comment: Accepted to ACM TOG 2018, to be presented at SIGGRAPH 201
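The low-dimensional trajectory subspace used above to disambiguate per-batch monocular reconstruction can be sketched with a truncated DCT basis, a common choice for motion-trajectory subspaces. The paper's actual basis and batch optimization are more involved; this pure-Python fragment only illustrates projecting a per-frame trajectory onto the first K basis vectors and reconstructing it.

```python
import math

def dct_basis(T, K):
    """First K orthonormal DCT-II basis vectors of length T."""
    basis = []
    for k in range(K):
        scale = math.sqrt(1.0 / T) if k == 0 else math.sqrt(2.0 / T)
        basis.append([scale * math.cos(math.pi * k * (2 * t + 1) / (2 * T))
                      for t in range(T)])
    return basis

def project_trajectory(traj, K):
    """Project a scalar per-frame trajectory onto the first K DCT
    coefficients and reconstruct the (smoothed) trajectory."""
    T = len(traj)
    B = dct_basis(T, K)
    coeffs = [sum(b[t] * traj[t] for t in range(T)) for b in B]
    return [sum(coeffs[k] * B[k][t] for k in range(K)) for t in range(T)]
```

With K much smaller than the batch length, high-frequency jitter is suppressed, which is the sense in which the subspace regularizes the ill-posed monocular problem.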
Concise and Effective Network for 3D Human Modeling from Orthogonal Silhouettes
In this paper, we revisit the problem of 3D human modeling from two
orthogonal silhouettes of individuals (i.e., front and side views). Different
from our prior work (wang2003virtual), a supervised learning approach based on
a convolutional neural network (CNN) is investigated to solve the problem by
establishing a mapping function that can effectively extract features from the
two silhouettes and fuse them into coefficients in the shape space of human
bodies. A new CNN structure is proposed in our work to extract not only the
discriminative features of the front and side views but also their mixed
features for the mapping function. 3D human models with high accuracy are
synthesized from coefficients generated by the mapping function. Existing CNN
approaches for 3D human modeling usually learn a large number of parameters
(from 8.5M to 355.4M) from two binary images. In contrast, we investigate a
new network architecture and use samples on the silhouettes as input. As a
consequence, more accurate models can be generated by our network with only
2.4M parameters. The training of our network is conducted on samples
obtained by augmenting a publicly accessible dataset. Transfer learning with
datasets containing a smaller number of scanned models is applied to our
network to enable it to generate results with gender-oriented (or
geographical) patterns.
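As a toy illustration of taking samples on silhouettes rather than raw binary images, and of fusing two orthogonal views, one can reduce each silhouette to a per-row width profile and concatenate the front and side profiles. The paper's CNN learns far richer discriminative and mixed features; everything below is a hypothetical simplification.

```python
def row_widths(mask):
    """Per-row width of a binary silhouette given as rows of 0/1 pixels;
    a compact stand-in for point samples taken on the silhouette."""
    widths = []
    for row in mask:
        cols = [i for i, v in enumerate(row) if v]
        widths.append(cols[-1] - cols[0] + 1 if cols else 0)
    return widths

def fused_features(front, side):
    """Concatenate the two views' profiles: the simplest possible
    stand-in for the network's two-branch feature fusion."""
    return row_widths(front) + row_widths(side)
```

A regressor from such fused features to shape-space coefficients would then play the role of the mapping function described above.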
Thermal-Kinect Fusion Scanning System for Bodyshape Inpainting and Estimation under Clothing
In today's interactive world, 3D body scanning is needed for building virtual avatars, the apparel industry, physical health assessment, and so on. The 3D scanners used in this process are very costly and also require the subject to be nearly naked or to wear special tight-fitting clothes. A cost-effective 3D body scanning system that can estimate body parameters under clothing would be the best solution in this regard. In our experiment, we build such a body scanning system by fusing a Kinect depth sensor and a thermal camera. The Kinect senses the depth of the subject and creates a 3D point cloud from it, while the thermal camera senses a person's body heat under clothing. Fusing the two sensors' images produces a thermally mapped 3D point cloud of the subject, from which body parameters can be estimated even under various clothes. Moreover, this fusion system is also cost-effective. We introduce a new pipeline for working with our fusion scanning system, and estimate and recover body shape under clothing. We capture Thermal-Kinect fusion images of subjects in different clothing and produce both full and partial 3D point clouds. To recover the missing parts of our low-resolution scans, we fit a parametric human model to our images and perform Boolean operations with our scan data. Finally, we measure our final 3D point cloud to estimate the body parameters and compare them with the ground truth, achieving a minimum average error of 0.75 cm compared with other approaches.
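The Boolean completion step, filling regions the scan missed with points from the fitted parametric model, can be sketched as a union that keeps a model point only where no scan point lies nearby. This pure-Python fragment is a hypothetical simplification; real pipelines perform such Boolean operations on meshes or voxel grids rather than bare point lists.

```python
def complete_scan(scan_pts, model_pts, radius):
    """Union of a partial scan with fitted-model points, keeping a model
    point only if no scan point lies within `radius` of it (i.e. the
    scan is missing that region)."""
    def covered(p):
        return any(sum((a - b) ** 2 for a, b in zip(p, q)) <= radius ** 2
                   for q in scan_pts)
    return list(scan_pts) + [p for p in model_pts if not covered(p)]
```

Body measurements would then be taken on the completed cloud, as in the evaluation described above.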