2,505 research outputs found
Learning to Dress {3D} People in Generative Clothing
Three-dimensional human body models are widely used in the analysis of human
pose and motion. Existing models, however, are learned from minimally-clothed
3D scans and thus do not generalize to the complexity of dressed people in
common images and videos. Additionally, current models lack the expressive
power needed to represent the complex non-linear geometry of pose-dependent
clothing shapes. To address this, we learn a generative 3D mesh model of
clothed people from 3D scans with varying pose and clothing. Specifically, we
train a conditional Mesh-VAE-GAN to learn the clothing deformation from the
SMPL body model, making clothing an additional term in SMPL. Our model is
conditioned on both pose and clothing type, giving the ability to draw samples
of clothing to dress different body shapes in a variety of styles and poses. To
preserve wrinkle detail, our Mesh-VAE-GAN extends patchwise discriminators to
3D meshes. Our model, named CAPE, represents global shape and fine local
structure, effectively extending the SMPL body model to clothing. To our
knowledge, this is the first generative model that directly dresses 3D human
body meshes and generalizes to different poses. The model, code and data are
available for research purposes at https://cape.is.tue.mpg.de.Comment: CVPR-2020 camera ready. Code and data are available at
https://cape.is.tue.mpg.d
Gait recognition and understanding based on hierarchical temporal memory using 3D gait semantic folding
Gait recognition and understanding systems have shown a wide-ranging application prospect. However, their use of unstructured data from image and video has affected their performance, e.g., they are easily influenced by multi-views, occlusion, clothes, and object carrying conditions. This paper addresses these problems using a realistic 3-dimensional (3D) human structural data and sequential pattern learning framework with top-down attention modulating mechanism based on Hierarchical Temporal Memory (HTM). First, an accurate 2-dimensional (2D) to 3D human body pose and shape semantic parameters estimation method is proposed, which exploits the advantages of an instance-level body parsing model and a virtual dressing method. Second, by using gait semantic folding, the estimated body parameters are encoded using a sparse 2D matrix to construct the structural gait semantic image. In order to achieve time-based gait recognition, an HTM Network is constructed to obtain the sequence-level gait sparse distribution representations (SL-GSDRs). A top-down attention mechanism is introduced to deal with various conditions including multi-views by refining the SL-GSDRs, according to prior knowledge. The proposed gait learning model not only aids gait recognition tasks to overcome the difficulties in real application scenarios but also provides the structured gait semantic images for visual cognition. Experimental analyses on CMU MoBo, CASIA B, TUM-IITKGP, and KY4D datasets show a significant performance gain in terms of accuracy and robustness
Learning to Reconstruct People in Clothing from a Single RGB Camera
We present a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving, in less than 10 seconds with a reconstruction accuracy of 5mm. Our model learns to predict the parameters of a statistical body model and instance displacements that add clothing and hair to the shape. The model achieves fast and accurate predictions based on two key design choices. First, by predicting shape in a canonical T-pose space, the network learns to encode the images of the person into pose-invariant latent codes, where the information is fused. Second, based on the observation that feed-forward predictions are fast but do not always align with the input images, we predict using both, bottom-up and top-down streams (one per view) allowing information to flow in both directions. Learning relies only on synthetic 3D data. Once learned, the model can take a variable number of frames as input, and is able to reconstruct shapes even from a single image with an accuracy of 6mm. Results on 3 different datasets demonstrate the efficacy and accuracy of our approach
Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis
Deep person generation has attracted extensive research attention due to its
wide applications in virtual agents, video conferencing, online shopping and
art/movie production. With the advancement of deep learning, visual appearances
(face, pose, cloth) of a person image can be easily generated or manipulated on
demand. In this survey, we first summarize the scope of person generation, and
then systematically review recent progress and technical trends in deep person
generation, covering three major tasks: talking-head generation (face),
pose-guided person generation (pose) and garment-oriented person generation
(cloth). More than two hundred papers are covered for a thorough overview, and
the milestone works are highlighted to witness the major technical
breakthrough. Based on these fundamental tasks, a number of applications are
investigated, e.g., virtual fitting, digital human, generative data
augmentation. We hope this survey could shed some light on the future prospects
of deep person generation, and provide a helpful foundation for full
applications towards digital human
- …