4DHumanOutfit: a multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements
This work presents 4DHumanOutfit, a new dataset of densely sampled
spatio-temporal 4D human motion data covering different actors, outfits, and
motions. The dataset is designed so that each actor wears several outfits and
performs the same set of motions in each outfit. In this way, the dataset can
be seen as a cube of data containing 4D motion sequences along three axes:
identity, outfit, and motion. This rich dataset has numerous potential
applications in the processing and creation of digital humans, e.g. augmented
reality, avatar creation, and virtual try-on. 4DHumanOutfit is released for
research purposes at https://kinovis.inria.fr/4dhumanoutfit/. In addition to
image data and 4D reconstructions, the dataset includes reference solutions
for each axis. We present independent baselines along each axis that
demonstrate the value of these reference solutions for evaluation tasks.
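The three-axis organization lends itself to a simple cube-style index. Below is a minimal sketch of addressing such a corpus by (identity, outfit, motion); all identifiers and the directory layout are illustrative assumptions, not the dataset's actual API or file structure.

```python
# Hypothetical indexing of a 4D motion corpus as an (identity, outfit, motion)
# cube, in the spirit of 4DHumanOutfit. Names and paths are assumptions.
from itertools import product
from pathlib import Path

ACTORS = ["actor_00", "actor_01"]       # identity axis (hypothetical IDs)
OUTFITS = ["tight", "casual", "loose"]  # outfit axis
MOTIONS = ["walk", "run", "dance"]      # motion axis

def sequence_path(root: Path, actor: str, outfit: str, motion: str) -> Path:
    """Resolve one cell of the data cube to a directory of 4D frames."""
    return root / actor / outfit / motion

root = Path("/data/4dhumanoutfit")      # assumed local download location
cube = {
    (a, o, m): sequence_path(root, a, o, m)
    for a, o, m in product(ACTORS, OUTFITS, MOTIONS)
}

# Fixing one axis yields a comparison set, e.g. the same actor and motion
# across all outfits -- the kind of slice an outfit-transfer baseline needs.
same_actor_motion = {o: cube[("actor_00", o, "walk")] for o in OUTFITS}
```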
VGFlow: Visibility guided Flow Network for Human Reposing
The task of human reposing involves generating a realistic image of a person
in an arbitrary target pose. Generating perceptually accurate images is
difficult, and existing methods struggle to preserve texture, maintain
pattern coherence, respect cloth boundaries, handle occlusions, and generate
plausible skin. These difficulties are further exacerbated by the large and
variable space of possible human poses, the highly non-rigid nature of
clothing items, and the wide variation in body shape across the population.
To alleviate these difficulties and synthesize perceptually accurate images,
we propose VGFlow. Our model uses a visibility-guided flow module to
disentangle the flow into visible and invisible parts of the target, enabling
simultaneous texture preservation and style manipulation. Furthermore, to
handle distinct body shapes and avoid network artifacts, we also incorporate
a self-supervised patch-wise "realness" loss that improves the output. VGFlow
achieves state-of-the-art results, both qualitatively and quantitatively, on
several image quality metrics (SSIM, LPIPS, FID).
Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis
Deep person generation has attracted extensive research attention due to its
wide applications in virtual agents, video conferencing, online shopping and
art/movie production. With the advancement of deep learning, visual appearances
(face, pose, cloth) of a person image can be easily generated or manipulated on
demand. In this survey, we first summarize the scope of person generation, and
then systematically review recent progress and technical trends in deep person
generation, covering three major tasks: talking-head generation (face),
pose-guided person generation (pose) and garment-oriented person generation
(cloth). More than two hundred papers are covered for a thorough overview,
and milestone works are highlighted to trace the major technical
breakthroughs. Building on these fundamental tasks, a number of applications
are investigated, e.g., virtual fitting, digital humans, and generative data
augmentation. We hope this survey can shed some light on the future prospects
of deep person generation and provide a helpful foundation for building full
applications towards digital humans.
HumanGAN: A Generative Model of Human Images
Generative adversarial networks achieve great performance in photorealistic
image synthesis in various domains, including human images. However, they
usually employ latent vectors that encode the sampled outputs globally. This
does not allow convenient control of semantically relevant individual parts
of the image, nor the drawing of samples that differ only in partial aspects,
such as clothing style. We address these limitations and present a generative
model for images of dressed humans offering control over pose, local body
part appearance, and garment style. This is the first method to solve various
aspects of human image generation, such as global appearance sampling, pose
transfer, parts and garment transfer, and parts sampling, jointly in a
unified framework. As our model encodes part-based latent appearance vectors
in a normalized pose-independent space and warps them to different poses, it
preserves body and clothing appearance under varying posture. Experiments
show that our flexible and general generative method outperforms
task-specific baselines for pose-conditioned image generation, pose transfer,
and part sampling in terms of realism and output resolution.
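The part-based latent idea can be made concrete with a small sketch: each body part gets its own appearance code in a pose-independent space, which is then scattered into the target pose's part layout before decoding. The dimensions and the scatter rule below are assumptions, loosely following the abstract rather than the HumanGAN code.

```python
# Minimal sketch of part-based latent appearance codes (PyTorch).
import torch

def scatter_part_latents(part_seg, part_latents):
    """
    part_seg:     (B, P, H, W) one-hot part segmentation in the target pose
    part_latents: (B, P, D) one D-dim appearance code per body part
    returns:      (B, D, H, W) spatial appearance map for a decoder
    """
    # einsum broadcasts each part's code over its segmentation region
    return torch.einsum("bphw,bpd->bdhw", part_seg, part_latents)

B, P, D, H, W = 1, 6, 16, 32, 32
seg = torch.zeros(B, P, H, W)
seg[:, 0, :16] = 1.0    # toy two-part layout: part 0 covers the top half,
seg[:, 1, 16:] = 1.0    # part 1 the bottom half
appearance = scatter_part_latents(seg, torch.randn(B, P, D))
print(appearance.shape)  # torch.Size([1, 16, 32, 32])
```

Because the codes live in a normalized, pose-independent space, swapping one part's latent (e.g. the torso) changes only that garment's appearance while the layout follows the target pose.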
High-Resolution Virtual Try-On with Misalignment and Occlusion-Handled Conditions
Image-based virtual try-on aims to synthesize an image of a person wearing a
given clothing item. To solve the task, the existing methods warp the clothing
item to fit the person's body and generate the segmentation map of the person
wearing the item before fusing the item with the person. However, when the
warping and the segmentation generation stages operate independently without
information exchange, misalignment between the warped clothes and the
segmentation map occurs, which leads to artifacts in the final image. The
information disconnection also causes excessive warping near the clothing
regions occluded by the body parts, producing so-called pixel-squeezing
artifacts. To resolve these issues, we propose a novel try-on condition
generator as a unified module for the two stages (i.e., the warping and
segmentation generation stages). A newly proposed feature fusion block in the
condition generator implements the information exchange, and the condition
generator does not create any misalignment or pixel-squeezing artifacts. We
also introduce discriminator rejection, which filters out incorrect
segmentation map predictions and ensures the performance of virtual try-on
frameworks. Experiments on a high-resolution dataset demonstrate that our
model successfully handles misalignment and occlusion, and significantly
outperforms the baselines. Code is available at
https://github.com/sangyun884/HR-VITON.
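Discriminator rejection, as described here, amounts to scoring each predicted segmentation map with a discriminator and discarding low-scoring (likely incorrect) predictions. The toy discriminator, channel count, and threshold below are placeholders for illustration, not the HR-VITON implementation.

```python
# Hedged sketch of discriminator rejection for segmentation predictions (PyTorch).
import torch
import torch.nn as nn

disc = nn.Sequential(                      # toy stand-in discriminator
    nn.Conv2d(13, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
)

def reject_segmentations(seg_maps, threshold=0.0):
    """Keep only the segmentation predictions the discriminator scores as real."""
    scores = disc(seg_maps).squeeze(1)     # (B,) realism logits
    keep = scores > threshold              # assumed rejection rule
    return seg_maps[keep], keep

maps = torch.randn(4, 13, 64, 48)          # 13 = assumed number of seg classes
kept, mask = reject_segmentations(maps)
print(mask)                                # which predictions survive
```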
Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation
Synthesizing realistic videos of humans using neural networks has been a
popular alternative to the conventional graphics-based rendering pipeline due
to its high efficiency. Existing works typically formulate this as an
image-to-image translation problem in 2D screen space, which leads to artifacts
such as over-smoothing, missing body parts, and temporal instability of
fine-scale detail, such as pose-dependent wrinkles in the clothing. In this
paper, we propose a novel human video synthesis method that addresses these
limiting factors by explicitly disentangling the learning of time-coherent
fine-scale details from the embedding of the human in 2D screen space. More
specifically, our method relies on the combination of two convolutional neural
networks (CNNs). Given the pose information, the first CNN predicts a dynamic
texture map that contains time-coherent high-frequency details, and the second
CNN conditions the generation of the final video on the temporally coherent
output of the first CNN. We demonstrate several applications of our approach,
such as human reenactment and novel view synthesis from monocular video, where
we show significant improvement over the state of the art both qualitatively
and quantitatively.
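The two-stage pipeline chains the networks directly: the first CNN maps pose conditioning to a dynamic texture map carrying high-frequency detail, and the second translates that texture into the final frame. The toy networks, channel counts, and direct chaining below are assumptions that only illustrate the data flow, not the paper's architecture.

```python
# Schematic sketch of the pose -> dynamic texture -> frame pipeline (PyTorch).
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

texture_net = nn.Sequential(conv_block(3, 32), conv_block(32, 3))  # stage 1: pose -> texture
render_net = nn.Sequential(conv_block(3, 32), conv_block(32, 3))   # stage 2: texture -> frame

def synthesize_frame(pose_map):
    """pose_map: (B, 3, H, W) rasterized pose conditioning."""
    dynamic_texture = texture_net(pose_map)   # time-coherent detail (stage 1)
    return render_net(dynamic_texture)        # final video frame (stage 2)

frame = synthesize_frame(torch.rand(1, 3, 128, 128))
print(frame.shape)  # torch.Size([1, 3, 128, 128])
```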