VITON: An Image-based Virtual Try-on Network
We present an image-based VIrtual Try-On Network (VITON) without using 3D
information in any form, which seamlessly transfers a desired clothing item
onto the corresponding region of a person using a coarse-to-fine strategy.
Conditioned upon a new clothing-agnostic yet descriptive person representation,
our framework first generates a coarse synthesized image with the target
clothing item overlaid on that same person in the same pose. We further enhance
the initial blurry clothing area with a refinement network. The network is
trained to learn how much detail to utilize from the target clothing item, and
where to apply to the person in order to synthesize a photo-realistic image in
which the target item deforms naturally with clear visual patterns. Experiments
on our newly collected Zalando dataset demonstrate its promise in the
image-based virtual try-on task over state-of-the-art generative models.
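One way to read "how much detail to utilize ... and where to apply" is as a per-pixel blending mask between the coarse result and the target clothing. The sketch below illustrates that reading only; the function name `compose`, the mask `alpha`, and the toy 2x2 grayscale images are assumptions made for illustration, not details taken from the abstract.

```python
def compose(coarse, warped_cloth, alpha):
    """Blend two images per pixel.

    alpha = 1 keeps the clothing pixel, alpha = 0 keeps the coarse pixel.
    All three arguments are lists of rows with values in [0, 1].
    """
    return [
        [a * c + (1.0 - a) * p for p, c, a in zip(row_p, row_c, row_a)]
        for row_p, row_c, row_a in zip(coarse, warped_cloth, alpha)
    ]


# Hypothetical 2x2 grayscale example.
coarse = [[0.2, 0.2], [0.2, 0.2]]   # blurry coarse synthesis
cloth = [[1.0, 1.0], [1.0, 1.0]]    # target clothing, assumed already aligned
alpha = [[0.0, 1.0], [0.5, 0.25]]   # mask a refinement network would predict

refined = compose(coarse, cloth, alpha)
print(refined)
```

In this toy setting the mask is given by hand; in a learned system it would be an output of the refinement network.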
Discovering fair representations in the data domain
Interpretability and fairness are critical in computer vision and machine learning applications, in particular when dealing with human outcomes, e.g. inviting or not inviting a candidate for a job interview based on application materials that may include photographs. One promising direction to achieve fairness is to learn data representations that remove the semantics of protected characteristics and are therefore able to mitigate unfair outcomes. All available models, however, learn latent embeddings, which comes at the cost of being uninterpretable. We propose to cast this problem as data-to-data translation, i.e. learning a mapping from an input domain to a fair target domain, where a fairness definition is enforced. Here the data domain can be images or any tabular data representation. This task would be straightforward if we had fair target data available, but this is not the case. To overcome this, we learn a highly unconstrained mapping by exploiting statistics of residuals -- the difference between input data and its translated version -- and the protected characteristics. When applied to the CelebA dataset of face images with the gender attribute as the protected characteristic, our model enforces equality of opportunity by adjusting the eye and lip regions. Intriguingly, on the same dataset we arrive at similar conclusions when using semantic attribute representations of images for translation. On face images of the recent DiF dataset, with the same gender attribute, our method adjusts nose regions. In the Adult income dataset, also with gender as the protected attribute, our model achieves equality of opportunity by, among other changes, obfuscating the wife and husband relationship. Analyzing those systematic changes will allow us to scrutinize the interplay of fairness criterion, chosen protected characteristics, and prediction performance.
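Equality of opportunity, the fairness criterion named above, is commonly defined as equal true positive rates across protected groups. The sketch below measures the gap on made-up labels; the function names and the toy data are assumptions for illustration and do not reproduce the paper's model or results.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_positives:
        raise ValueError("group has no positive examples")
    return sum(preds_on_positives) / len(preds_on_positives)


def equal_opportunity_gap(y_true, y_pred, group):
    """Return (largest TPR difference between groups, per-group TPRs)."""
    rates = {}
    for g in sorted(set(group)):
        idx = [i for i, v in enumerate(group) if v == g]
        rates[g] = true_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return max(rates.values()) - min(rates.values()), rates


# Made-up binary labels, predictions, and protected-group membership.
y_true = [1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
group = [0, 0, 0, 0, 0, 1, 1, 1]

gap, rates = equal_opportunity_gap(y_true, y_pred, group)
print(gap, rates)
```

A gap of zero would mean the criterion holds exactly on this sample; a method enforcing equality of opportunity aims to drive this gap toward zero.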
StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer
Our paper seeks to transfer the hairstyle of a reference image to an input
photo for virtual hair try-on. We target a variety of challenging scenarios,
such as transforming a long hairstyle with bangs to a pixie cut, which requires
removing the existing hair and inferring how the forehead would look, or
transferring partially visible hair from a hat-wearing person in a different
pose. Past solutions leverage StyleGAN for hallucinating any missing parts and
producing a seamless face-hair composite through so-called GAN inversion or
projection. However, there remains a challenge in controlling the
hallucinations to accurately transfer hairstyle and preserve the face shape and
identity of the input. To overcome this, we propose a multi-view optimization
framework that uses "two different views" of reference composites to
semantically guide occluded or ambiguous regions. Our optimization shares
information between two poses, which allows us to produce high fidelity and
realistic results from incomplete references. Our framework produces
high-quality results and outperforms prior work in a user study that consists
of significantly more challenging hair transfer scenarios than previously
studied. Project page: https://stylegan-salon.github.io/.
Comment: Accepted to CVPR202
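The idea of sharing information between two views can be illustrated with a deliberately tiny toy: one shared latent vector is fitted by gradient descent to two partially observed targets, so each view fills in what the other cannot see. This is not the paper's StyleGAN pipeline; the identity "renderer", the visibility masks, and every name below are assumptions made for illustration.

```python
def optimize_shared_latent(targets, masks, steps=200, lr=0.1):
    """Minimize sum over views of mask * (z - target)^2 by gradient descent.

    targets: one target vector per view.
    masks:   one 0/1 visibility vector per view (0 = occluded, ignored).
    """
    dim = len(targets[0])
    z = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for target, mask in zip(targets, masks):
            for i in range(dim):
                # Occluded entries (mask 0) contribute no gradient.
                grad[i] += 2.0 * mask[i] * (z[i] - target[i])
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z


# Hypothetical views: A sees entries 0-1, B sees entries 1-2.
view_a, mask_a = [1.0, 2.0, 0.0], [1, 1, 0]
view_b, mask_b = [0.0, 4.0, 5.0], [0, 1, 1]

z = optimize_shared_latent([view_a, view_b], [mask_a, mask_b])
print(z)
```

Entries seen by only one view converge to that view's value, while the entry seen by both settles between the two targets, which is the sense in which a single latent reconciles incomplete references.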
SD-GAN: Semantic Decomposition for Face Image Synthesis with Discrete Attribute
Manipulating latent code in generative adversarial networks (GANs) for facial
image synthesis mainly focuses on continuous attribute synthesis (e.g., age,
pose and emotion), while discrete attribute synthesis (like face mask and
eyeglasses) receives less attention. Directly applying existing works to facial
discrete attributes may cause inaccurate results. In this work, we propose an
innovative framework to tackle challenging facial discrete attribute synthesis
via semantic decomposition, dubbed SD-GAN. Concretely, we explicitly
decompose the discrete attribute representation into two components, i.e. the
semantic prior basis and the offset latent representation. The semantic prior basis
provides an initial direction for manipulating the face representation in the
latent space. The offset latent representation, obtained by a 3D-aware semantic
fusion network, adjusts the prior basis. In addition, the fusion
network integrates 3D embedding for better identity preservation and discrete
attribute synthesis. The combination of prior basis and offset latent
representation enables our method to synthesize photo-realistic face images with
discrete attributes. Notably, we construct a large and valuable dataset, MEGN
(Face Mask and Eyeglasses images crawled from Google and Naver), to compensate
for the lack of discrete attributes in existing datasets. Extensive qualitative
and quantitative experiments demonstrate the state-of-the-art performance of
our method. Our code is available at: https://github.com/MontaEllis/SD-GAN.
Comment: 16 pages, 12 figures, Accepted by ACM MM202
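The decomposition into a prior basis plus an offset can be sketched as simple vector arithmetic in latent space. The abstract does not say how the two components are combined, so the additive form, the `strength` parameter, and all names below are assumptions for illustration rather than the released SD-GAN code.

```python
def edit_latent(w, prior_basis, offset, strength=1.0):
    """Move a latent code along (prior_basis + offset), scaled by strength.

    w:           latent code of the input face.
    prior_basis: coarse, attribute-level direction (e.g. "add eyeglasses").
    offset:      per-image correction that a fusion network would predict.
    """
    return [
        wi + strength * (bi + oi)
        for wi, bi, oi in zip(w, prior_basis, offset)
    ]


# Hypothetical 4-dimensional latent space.
w = [0.5, -1.0, 0.0, 2.0]
prior_basis = [1.0, 0.0, 0.0, -1.0]
offset = [0.25, 0.5, 0.0, 0.0]

w_edit = edit_latent(w, prior_basis, offset)
print(w_edit)
```

Under this reading, the prior basis fixes the general direction of the edit and the offset tailors it to the individual face before the generator decodes the edited code.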