4DHumanOutfit: a multi-subject 4D dataset of human motion sequences in varying outfits exhibiting large displacements
This work presents 4DHumanOutfit, a new dataset of densely sampled
spatio-temporal 4D human motion data covering different actors, outfits, and
motions. The dataset is designed so that each actor wears several outfits and
performs the same set of motions in each outfit. In this way, the dataset can
be seen as a cube of data containing 4D motion sequences along three axes:
identity, outfit, and motion. This rich dataset has numerous potential
applications in the processing and creation of digital humans, e.g. augmented
reality, avatar creation, and virtual try-on. 4DHumanOutfit is released for
research purposes at https://kinovis.inria.fr/4dhumanoutfit/. In addition to
image data and 4D reconstructions, the dataset includes reference solutions
for each axis. We present independent baselines along each axis that
demonstrate the value of these reference solutions for evaluation tasks.
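The three-axis organization lends itself to a simple cube-style index. Below is a minimal sketch of addressing such a corpus by (identity, outfit, motion); all identifiers and the directory layout are illustrative assumptions, not the dataset's actual API or file structure.

```python
# Hypothetical indexing of a 4D motion corpus as an (identity, outfit, motion)
# cube, in the spirit of 4DHumanOutfit. Names and paths are assumptions.
from itertools import product
from pathlib import Path

ACTORS = ["actor_00", "actor_01"]       # identity axis (hypothetical IDs)
OUTFITS = ["tight", "casual", "loose"]  # outfit axis
MOTIONS = ["walk", "run", "dance"]      # motion axis

def sequence_path(root: Path, actor: str, outfit: str, motion: str) -> Path:
    """Resolve one cell of the data cube to a directory of 4D frames."""
    return root / actor / outfit / motion

root = Path("/data/4dhumanoutfit")      # assumed local download location
cube = {
    (a, o, m): sequence_path(root, a, o, m)
    for a, o, m in product(ACTORS, OUTFITS, MOTIONS)
}

# Fixing one axis yields a comparison set, e.g. the same actor and motion
# across all outfits -- the kind of slice an outfit-transfer baseline needs.
same_actor_motion = {o: cube[("actor_00", o, "walk")] for o in OUTFITS}
```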
VGFlow: Visibility guided Flow Network for Human Reposing
The task of human reposing involves generating a realistic image of a person
in an arbitrary target pose. Generating perceptually accurate images is
difficult, and existing methods struggle to preserve texture, maintain
pattern coherence, respect cloth boundaries, handle occlusions, and generate
plausible skin. These difficulties are further exacerbated by the large and
variable space of possible human poses, the highly non-rigid nature of
clothing items, and the wide variation in body shape across the population.
To alleviate these difficulties and synthesize perceptually accurate images,
we propose VGFlow. Our model uses a visibility-guided flow module to
disentangle the flow into visible and invisible parts of the target, enabling
simultaneous texture preservation and style manipulation. Furthermore, to
handle distinct body shapes and avoid network artifacts, we also incorporate
a self-supervised patch-wise "realness" loss that improves the output. VGFlow
achieves state-of-the-art results, both qualitatively and quantitatively, on
several image quality metrics (SSIM, LPIPS, FID).
Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis
Deep person generation has attracted extensive research attention due to its
wide applications in virtual agents, video conferencing, online shopping and
art/movie production. With the advancement of deep learning, visual appearances
(face, pose, cloth) of a person image can be easily generated or manipulated on
demand. In this survey, we first summarize the scope of person generation, and
then systematically review recent progress and technical trends in deep person
generation, covering three major tasks: talking-head generation (face),
pose-guided person generation (pose) and garment-oriented person generation
(cloth). More than two hundred papers are covered for a thorough overview,
and milestone works are highlighted to trace the major technical
breakthroughs. Building on these fundamental tasks, a number of applications
are investigated, e.g., virtual fitting, digital humans, and generative data
augmentation. We hope this survey can shed some light on the future prospects
of deep person generation and provide a helpful foundation for building full
applications towards digital humans.
HumanGAN: A Generative Model of Human Images
Generative adversarial networks achieve great performance in photorealistic
image synthesis in various domains, including human images. However, they
usually employ latent vectors that encode the sampled outputs globally. This
does not allow convenient control of semantically relevant individual parts
of the image, nor the drawing of samples that differ only in partial aspects,
such as clothing style. We address these limitations and present a generative
model for images of dressed humans offering control over pose, local body
part appearance, and garment style. This is the first method to solve various
aspects of human image generation, such as global appearance sampling, pose
transfer, parts and garment transfer, and parts sampling, jointly in a
unified framework. As our model encodes part-based latent appearance vectors
in a normalized pose-independent space and warps them to different poses, it
preserves body and clothing appearance under varying posture. Experiments
show that our flexible and general generative method outperforms
task-specific baselines for pose-conditioned image generation, pose transfer,
and part sampling in terms of realism and output resolution.
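The part-based latent idea can be made concrete with a small sketch: each body part gets its own appearance code in a pose-independent space, which is then scattered into the target pose's part layout before decoding. The dimensions and the scatter rule below are assumptions, loosely following the abstract rather than the HumanGAN code.

```python
# Minimal sketch of part-based latent appearance codes (PyTorch).
import torch

def scatter_part_latents(part_seg, part_latents):
    """
    part_seg:     (B, P, H, W) one-hot part segmentation in the target pose
    part_latents: (B, P, D) one D-dim appearance code per body part
    returns:      (B, D, H, W) spatial appearance map for a decoder
    """
    # einsum broadcasts each part's code over its segmentation region
    return torch.einsum("bphw,bpd->bdhw", part_seg, part_latents)

B, P, D, H, W = 1, 6, 16, 32, 32
seg = torch.zeros(B, P, H, W)
seg[:, 0, :16] = 1.0    # toy two-part layout: part 0 covers the top half,
seg[:, 1, 16:] = 1.0    # part 1 the bottom half
appearance = scatter_part_latents(seg, torch.randn(B, P, D))
print(appearance.shape)  # torch.Size([1, 16, 32, 32])
```

Because the codes live in a normalized, pose-independent space, swapping one part's latent (e.g. the torso) changes only that garment's appearance while the layout follows the target pose.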
High-Resolution Virtual Try-On with Misalignment and Occlusion-Handled Conditions
Image-based virtual try-on aims to synthesize an image of a person wearing a
given clothing item. To solve the task, the existing methods warp the clothing
item to fit the person's body and generate the segmentation map of the person
wearing the item before fusing the item with the person. However, when the
warping and the segmentation generation stages operate independently without
information exchange, misalignment between the warped clothes and the
segmentation map occurs, which leads to artifacts in the final image. The
information disconnection also causes excessive warping near the clothing
regions occluded by the body parts, producing so-called pixel-squeezing
artifacts. To resolve these issues, we propose a novel try-on condition
generator as a unified module for the two stages (i.e., the warping and
segmentation generation stages). A newly proposed feature fusion block in the
condition generator implements the information exchange, and the condition
generator does not create any misalignment or pixel-squeezing artifacts. We
also introduce discriminator rejection, which filters out incorrect
segmentation map predictions and ensures the performance of virtual try-on
frameworks. Experiments on a high-resolution dataset demonstrate that our
model successfully handles misalignment and occlusion, and significantly
outperforms the baselines. Code is available at
https://github.com/sangyun884/HR-VITON.
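Discriminator rejection, as described here, amounts to scoring each predicted segmentation map with a discriminator and discarding low-scoring (likely incorrect) predictions. The toy discriminator, channel count, and threshold below are placeholders for illustration, not the HR-VITON implementation.

```python
# Hedged sketch of discriminator rejection for segmentation predictions (PyTorch).
import torch
import torch.nn as nn

disc = nn.Sequential(                      # toy stand-in discriminator
    nn.Conv2d(13, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
)

def reject_segmentations(seg_maps, threshold=0.0):
    """Keep only the segmentation predictions the discriminator scores as real."""
    scores = disc(seg_maps).squeeze(1)     # (B,) realism logits
    keep = scores > threshold              # assumed rejection rule
    return seg_maps[keep], keep

maps = torch.randn(4, 13, 64, 48)          # 13 = assumed number of seg classes
kept, mask = reject_segmentations(maps)
print(mask)                                # which predictions survive
```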
Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation
Synthesizing realistic videos of humans using neural networks has been a
popular alternative to the conventional graphics-based rendering pipeline due
to its high efficiency. Existing works typically formulate this as an
image-to-image translation problem in 2D screen space, which leads to artifacts
such as over-smoothing, missing body parts, and temporal instability of
fine-scale detail, such as pose-dependent wrinkles in the clothing. In this
paper, we propose a novel human video synthesis method that addresses these
limiting factors by explicitly disentangling the learning of time-coherent
fine-scale details from the embedding of the human in 2D screen space. More
specifically, our method relies on the combination of two convolutional neural
networks (CNNs). Given the pose information, the first CNN predicts a dynamic
texture map that contains time-coherent high-frequency details, and the second
CNN conditions the generation of the final video on the temporally coherent
output of the first CNN. We demonstrate several applications of our approach,
such as human reenactment and novel view synthesis from monocular video, where
we show significant improvement over the state of the art both qualitatively
and quantitatively.
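The two-stage pipeline chains the networks directly: the first CNN maps pose conditioning to a dynamic texture map carrying high-frequency detail, and the second translates that texture into the final frame. The toy networks, channel counts, and direct chaining below are assumptions that only illustrate the data flow, not the paper's architecture.

```python
# Schematic sketch of the pose -> dynamic texture -> frame pipeline (PyTorch).
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

texture_net = nn.Sequential(conv_block(3, 32), conv_block(32, 3))  # stage 1: pose -> texture
render_net = nn.Sequential(conv_block(3, 32), conv_block(32, 3))   # stage 2: texture -> frame

def synthesize_frame(pose_map):
    """pose_map: (B, 3, H, W) rasterized pose conditioning."""
    dynamic_texture = texture_net(pose_map)   # time-coherent detail (stage 1)
    return render_net(dynamic_texture)        # final video frame (stage 2)

frame = synthesize_frame(torch.rand(1, 3, 128, 128))
print(frame.shape)  # torch.Size([1, 3, 128, 128])
```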