82,128 research outputs found
VITON: An Image-based Virtual Try-on Network
We present an image-based VIirtual Try-On Network (VITON) without using 3D
information in any form, which seamlessly transfers a desired clothing item
onto the corresponding region of a person using a coarse-to-fine strategy.
Conditioned upon a new clothing-agnostic yet descriptive person representation,
our framework first generates a coarse synthesized image with the target
clothing item overlaid on that same person in the same pose. We further enhance
the initial blurry clothing area with a refinement network. The network is
trained to learn how much detail to utilize from the target clothing item, and
where to apply to the person in order to synthesize a photo-realistic image in
which the target item deforms naturally with clear visual patterns. Experiments
on our newly collected Zalando dataset demonstrate its promise in the
image-based virtual try-on task over state-of-the-art generative models
Unsupervised Person Image Synthesis in Arbitrary Poses
We present a novel approach for synthesizing photo-realistic images of people
in arbitrary poses using generative adversarial learning. Given an input image
of a person and a desired pose represented by a 2D skeleton, our model renders
the image of the same person under the new pose, synthesizing novel views of
the parts visible in the input image and hallucinating those that are not seen.
This problem has recently been addressed in a supervised manner, i.e., during
training the ground truth images under the new poses are given to the network.
We go beyond these approaches by proposing a fully unsupervised strategy. We
tackle this challenging scenario by splitting the problem into two principal
subtasks. First, we consider a pose conditioned bidirectional generator that
maps back the initially rendered image to the original pose, hence being
directly comparable to the input image without the need to resort to any
training image. Second, we devise a novel loss function that incorporates
content and style terms, and aims at producing images of high perceptual
quality. Extensive experiments conducted on the DeepFashion dataset demonstrate
that the images rendered by our model are very close in appearance to those
obtained by fully supervised approaches.Comment: Accepted as Spotlight at CVPR 201
Unsupervised person image synthesis in arbitrary poses
© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting /republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other worksWe present a novel approach for synthesizing photo-realistic images of people in arbitrary poses using generative adversarial learning. Given an input image of a person and a desired pose represented by a 2D skeleton, our model renders the image of the same person under the new pose, synthesizing novel views of the parts visible in the input image and hallucinating those that are not seen. This problem has recently been addressed in a supervised manner, i.e., during training the ground truth images under the new poses are given to the network. We go beyond these approaches by proposing a fully unsupervised strategy. We tackle this challenging scenario by splitting the problem into two principal subtasks. First, we consider a pose conditioned bidirectional generator that maps back the initially rendered image to the original pose, hence being directly comparable to the input image without the need to resort to any training image. Second, we devise a novel loss function that incorporates content and style terms, and aims at producing images of high perceptual quality. Extensive experiments conducted on the DeepFashion dataset demonstrate that the images rendered by our model are very close in appearance to those obtained by fully supervised approaches.Peer ReviewedPostprint (author's final draft
Learn to synthesize and synthesize to learn
Attribute guided face image synthesis aims to manipulate attributes on a face
image. Most existing methods for image-to-image translation can either perform
a fixed translation between any two image domains using a single attribute or
require training data with the attributes of interest for each subject.
Therefore, these methods could only train one specific model for each pair of
image domains, which limits their ability in dealing with more than two
domains. Another disadvantage of these methods is that they often suffer from
the common problem of mode collapse that degrades the quality of the generated
images. To overcome these shortcomings, we propose attribute guided face image
generation method using a single model, which is capable to synthesize multiple
photo-realistic face images conditioned on the attributes of interest. In
addition, we adopt the proposed model to increase the realism of the simulated
face images while preserving the face characteristics. Compared to existing
models, synthetic face images generated by our method present a good
photorealistic quality on several face datasets. Finally, we demonstrate that
generated facial images can be used for synthetic data augmentation, and
improve the performance of the classifier used for facial expression
recognition.Comment: Accepted to Computer Vision and Image Understanding (CVIU
Using Photorealistic Face Synthesis and Domain Adaptation to Improve Facial Expression Analysis
Cross-domain synthesizing realistic faces to learn deep models has attracted
increasing attention for facial expression analysis as it helps to improve the
performance of expression recognition accuracy despite having small number of
real training images. However, learning from synthetic face images can be
problematic due to the distribution discrepancy between low-quality synthetic
images and real face images and may not achieve the desired performance when
the learned model applies to real world scenarios. To this end, we propose a
new attribute guided face image synthesis to perform a translation between
multiple image domains using a single model. In addition, we adopt the proposed
model to learn from synthetic faces by matching the feature distributions
between different domains while preserving each domain's characteristics. We
evaluate the effectiveness of the proposed approach on several face datasets on
generating realistic face images. We demonstrate that the expression
recognition performance can be enhanced by benefiting from our face synthesis
model. Moreover, we also conduct experiments on a near-infrared dataset
containing facial expression videos of drivers to assess the performance using
in-the-wild data for driver emotion recognition.Comment: 8 pages, 8 figures, 5 tables, accepted by FG 2019. arXiv admin note:
substantial text overlap with arXiv:1905.0028
Text-based Editing of Talking-head Video
Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user has to only edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis
- …