8,799 research outputs found
Estimation of Human Body Shape and Posture Under Clothing
Estimating the body shape and posture of a dressed human subject in motion
represented as a sequence of (possibly incomplete) 3D meshes is important for
virtual change rooms and security. To solve this problem, statistical shape
spaces encoding human body shape and posture variations are commonly used to
constrain the search space for the shape estimate. In this work, we propose a
novel method that uses a posture-invariant shape space to model body shape
variation combined with a skeleton-based deformation to model posture
variation. Our method can estimate the body shape and posture of both static
scans and motion sequences of dressed human body scans. In case of motion
sequences, our method takes advantage of motion cues to solve for a single body
shape estimate along with a sequence of posture estimates. We apply our
approach to both static scans and motion sequences and demonstrate that using
our method, higher fitting accuracy is achieved than when using a variant of
the popular SCAPE model as statistical model.Comment: 23 pages, 11 figure
Learning to Dress {3D} People in Generative Clothing
Three-dimensional human body models are widely used in the analysis of human
pose and motion. Existing models, however, are learned from minimally-clothed
3D scans and thus do not generalize to the complexity of dressed people in
common images and videos. Additionally, current models lack the expressive
power needed to represent the complex non-linear geometry of pose-dependent
clothing shapes. To address this, we learn a generative 3D mesh model of
clothed people from 3D scans with varying pose and clothing. Specifically, we
train a conditional Mesh-VAE-GAN to learn the clothing deformation from the
SMPL body model, making clothing an additional term in SMPL. Our model is
conditioned on both pose and clothing type, giving the ability to draw samples
of clothing to dress different body shapes in a variety of styles and poses. To
preserve wrinkle detail, our Mesh-VAE-GAN extends patchwise discriminators to
3D meshes. Our model, named CAPE, represents global shape and fine local
structure, effectively extending the SMPL body model to clothing. To our
knowledge, this is the first generative model that directly dresses 3D human
body meshes and generalizes to different poses. The model, code and data are
available for research purposes at https://cape.is.tue.mpg.de.Comment: CVPR-2020 camera ready. Code and data are available at
https://cape.is.tue.mpg.d
End-to-end Recovery of Human Shape and Pose
We describe Human Mesh Recovery (HMR), an end-to-end framework for
reconstructing a full 3D mesh of a human body from a single RGB image. In
contrast to most current methods that compute 2D or 3D joint locations, we
produce a richer and more useful mesh representation that is parameterized by
shape and 3D joint angles. The main objective is to minimize the reprojection
loss of keypoints, which allow our model to be trained using images in-the-wild
that only have ground truth 2D annotations. However, the reprojection loss
alone leaves the model highly under constrained. In this work we address this
problem by introducing an adversary trained to tell whether a human body
parameter is real or not using a large database of 3D human meshes. We show
that HMR can be trained with and without using any paired 2D-to-3D supervision.
We do not rely on intermediate 2D keypoint detections and infer 3D pose and
shape parameters directly from image pixels. Our model runs in real-time given
a bounding box containing the person. We demonstrate our approach on various
images in-the-wild and out-perform previous optimization based methods that
output 3D meshes and show competitive results on tasks such as 3D joint
location estimation and part segmentation.Comment: CVPR 2018, Project page with code: https://akanazawa.github.io/hmr
MonoPerfCap: Human Performance Capture from Monocular Video
We present the first marker-less approach for temporally coherent 3D
performance capture of a human with general clothing from monocular video. Our
approach reconstructs articulated human skeleton motion as well as medium-scale
non-rigid surface deformations in general scenes. Human performance capture is
a challenging problem due to the large range of articulation, potentially fast
motion, and considerable non-rigid deformations, even from multi-view data.
Reconstruction from monocular video alone is drastically more challenging,
since strong occlusions and the inherent depth ambiguity lead to a highly
ill-posed reconstruction problem. We tackle these challenges by a novel
approach that employs sparse 2D and 3D human pose detections from a
convolutional neural network using a batch-based pose estimation strategy.
Joint recovery of per-batch motion allows to resolve the ambiguities of the
monocular reconstruction problem based on a low dimensional trajectory
subspace. In addition, we propose refinement of the surface geometry based on
fully automatically extracted silhouettes to enable medium-scale non-rigid
alignment. We demonstrate state-of-the-art performance capture results that
enable exciting applications such as video editing and free viewpoint video,
previously infeasible from monocular video. Our qualitative and quantitative
evaluation demonstrates that our approach significantly outperforms previous
monocular methods in terms of accuracy, robustness and scene complexity that
can be handled.Comment: Accepted to ACM TOG 2018, to be presented on SIGGRAPH 201
Integrating body scanning solutions into virtual dressing rooms
The world is entering its 4th Industrial Revolution, a new era of manufacturing characterized
by ubiquitous digitization and computing. One industry to benefit and grow from this
revolution is the fashion industry, in which Europe (and Italy in particular) has long
maintained a global lead. To evolve with the changes in technology, we developed the IT-
SHIRT project. In the context of this project, a key challenge relies on developing a virtual
dressing room in which the final users (customers) can virtually try different clothes on their
bodies. In this paper, we tackle the aforementioned issue by providing a critical analysis of
the existing body scanning solutions, identifying their strengths and weaknesses towards
their integration within the pipeline of virtual dressing rooms
Learning to Reconstruct People in Clothing from a Single RGB Camera
We present a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving, in less than 10 seconds with a reconstruction accuracy of 5mm. Our model learns to predict the parameters of a statistical body model and instance displacements that add clothing and hair to the shape. The model achieves fast and accurate predictions based on two key design choices. First, by predicting shape in a canonical T-pose space, the network learns to encode the images of the person into pose-invariant latent codes, where the information is fused. Second, based on the observation that feed-forward predictions are fast but do not always align with the input images, we predict using both, bottom-up and top-down streams (one per view) allowing information to flow in both directions. Learning relies only on synthetic 3D data. Once learned, the model can take a variable number of frames as input, and is able to reconstruct shapes even from a single image with an accuracy of 6mm. Results on 3 different datasets demonstrate the efficacy and accuracy of our approach
Unsupervised Person Image Synthesis in Arbitrary Poses
We present a novel approach for synthesizing photo-realistic images of people
in arbitrary poses using generative adversarial learning. Given an input image
of a person and a desired pose represented by a 2D skeleton, our model renders
the image of the same person under the new pose, synthesizing novel views of
the parts visible in the input image and hallucinating those that are not seen.
This problem has recently been addressed in a supervised manner, i.e., during
training the ground truth images under the new poses are given to the network.
We go beyond these approaches by proposing a fully unsupervised strategy. We
tackle this challenging scenario by splitting the problem into two principal
subtasks. First, we consider a pose conditioned bidirectional generator that
maps back the initially rendered image to the original pose, hence being
directly comparable to the input image without the need to resort to any
training image. Second, we devise a novel loss function that incorporates
content and style terms, and aims at producing images of high perceptual
quality. Extensive experiments conducted on the DeepFashion dataset demonstrate
that the images rendered by our model are very close in appearance to those
obtained by fully supervised approaches.Comment: Accepted as Spotlight at CVPR 201
Unsupervised person image synthesis in arbitrary poses
© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting /republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other worksWe present a novel approach for synthesizing photo-realistic images of people in arbitrary poses using generative adversarial learning. Given an input image of a person and a desired pose represented by a 2D skeleton, our model renders the image of the same person under the new pose, synthesizing novel views of the parts visible in the input image and hallucinating those that are not seen. This problem has recently been addressed in a supervised manner, i.e., during training the ground truth images under the new poses are given to the network. We go beyond these approaches by proposing a fully unsupervised strategy. We tackle this challenging scenario by splitting the problem into two principal subtasks. First, we consider a pose conditioned bidirectional generator that maps back the initially rendered image to the original pose, hence being directly comparable to the input image without the need to resort to any training image. Second, we devise a novel loss function that incorporates content and style terms, and aims at producing images of high perceptual quality. Extensive experiments conducted on the DeepFashion dataset demonstrate that the images rendered by our model are very close in appearance to those obtained by fully supervised approaches.Peer ReviewedPostprint (author's final draft
- …