VITON: An Image-based Virtual Try-on Network
We present an image-based Virtual Try-On Network (VITON) without using 3D
information in any form, which seamlessly transfers a desired clothing item
onto the corresponding region of a person using a coarse-to-fine strategy.
Conditioned upon a new clothing-agnostic yet descriptive person representation,
our framework first generates a coarse synthesized image with the target
clothing item overlaid on that same person in the same pose. We further enhance
the initial blurry clothing area with a refinement network. The network is
trained to learn how much detail to utilize from the target clothing item, and
where to apply it to the person in order to synthesize a photo-realistic image in
which the target item deforms naturally with clear visual patterns. Experiments
on our newly collected Zalando dataset demonstrate its promise in the
image-based virtual try-on task over state-of-the-art generative models.
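The coarse-to-fine strategy described above maps naturally onto a two-stage module. The sketch below (PyTorch, with hypothetical CoarseGenerator/RefinementNet sub-networks and an externally warped garment as input) shows one way a refinement stage can learn how much detail to take from the garment via a per-pixel composition mask; it is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of a coarse-to-fine try-on composition, assuming PyTorch
# and hypothetical sub-networks; not the authors' code.
import torch
import torch.nn as nn

class CoarseToFineTryOn(nn.Module):
    def __init__(self, coarse_net: nn.Module, refine_net: nn.Module):
        super().__init__()
        self.coarse_net = coarse_net  # person representation + garment -> coarse image
        self.refine_net = refine_net  # -> single-channel composition mask logits

    def forward(self, person_repr, garment, warped_garment):
        # Stage 1: coarse synthesis conditioned on the clothing-agnostic
        # person representation and the target clothing item.
        coarse = self.coarse_net(torch.cat([person_repr, garment], dim=1))
        # Stage 2: a learned per-pixel mask decides how much detail to take
        # from the (warped) garment and where to apply it on the person.
        mask = torch.sigmoid(
            self.refine_net(torch.cat([coarse, warped_garment], dim=1)))
        return mask * warped_garment + (1 - mask) * coarse
```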
Single Stage Multi-Pose Virtual Try-On
Multi-pose virtual try-on (MPVTON) aims to fit a target garment onto a person
at a target pose. Compared to traditional virtual try-on (VTON) that fits the
garment but keeps the pose unchanged, MPVTON provides a better try-on
experience, but is also more challenging due to the dual garment and pose
editing objectives. Existing MPVTON methods adopt a pipeline comprising three
disjoint modules including a target semantic layout prediction module, a coarse
try-on image generator and a refinement try-on image generator. These models
are trained separately, leading to sub-optimal model training and
unsatisfactory results. In this paper, we propose a novel single stage model
for MPVTON. Key to our model is a parallel flow estimation module that predicts
the flow fields for both person and garment images conditioned on the target
pose. The predicted flows are subsequently used to warp the appearance feature
maps of the person and the garment images to construct a style map. The map is
then used to modulate the target pose's feature map for target try-on image
generation. With the parallel flow estimation design, our model can be trained
end-to-end in a single stage and is more computationally efficient, resulting
in new SOTA performance on existing MPVTON benchmarks. We further introduce
multi-task training and demonstrate that our model can also be applied to
traditional VTON and pose transfer tasks, achieving performance comparable to
SOTA specialized models on both tasks.
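As a rough illustration of the parallel flow estimation design, the sketch below warps person and garment feature maps with predicted flow fields and uses the concatenated result as a style map that modulates the pose features. The shapes, module names, and the simple scale/shift (SPADE-like) modulation are assumptions made for the example, not the paper's architecture.

```python
# Rough sketch of parallel flow warping plus style modulation, assuming
# PyTorch; illustrative only, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(feat, flow):
    """Warp a feature map (N, C, H, W) by a normalized flow field (N, 2, H, W)."""
    n, _, h, w = feat.shape
    # Identity sampling grid in [-1, 1], the convention grid_sample expects.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=feat.device),
        torch.linspace(-1, 1, w, device=feat.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)
    # Offset the identity grid by the predicted flow.
    return F.grid_sample(feat, base + flow.permute(0, 2, 3, 1), align_corners=True)

class ParallelFlowTryOn(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        # 1x1 convs map the style map to scale/shift parameters that
        # modulate the pose features (a SPADE-like choice; an assumption).
        self.to_scale = nn.Conv2d(2 * dim, dim, kernel_size=1)
        self.to_shift = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, person_feat, garment_feat, pose_feat, person_flow, garment_flow):
        # Warp both appearance streams toward the target pose in parallel...
        style = torch.cat(
            [warp(person_feat, person_flow), warp(garment_feat, garment_flow)], dim=1)
        # ...then use the style map to modulate the target pose's feature map.
        return pose_feat * (1 + self.to_scale(style)) + self.to_shift(style)
```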
Virtual Try-On With Generative Adversarial Networks: A Taxonomical Survey
This chapter elaborates on using generative adversarial networks (GANs) for virtual try-on applications and presents the first comprehensive survey on this topic. Virtual try-on is a practical application of GANs and pixel translation, techniques that improve upon earlier virtual try-on methods. The survey details the importance and history of virtual try-on systems; shows how GANs, pixel translation, and perceptual losses have influenced the field; and summarizes the latest research on building virtual try-on systems. Additionally, the authors present future research directions for making virtual try-on systems more usable, faster, and more effective. By walking through the steps of virtual try-on from start to finish, the chapter aims to expose readers to key concepts shared by many GAN applications and to give them a solid foundation for pursuing further topics in GANs.
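Since the chapter credits perceptual losses as a key influence on the field, a minimal example may help make the idea concrete: the sketch below compares images in the feature space of a frozen VGG-19, a common formulation. The layer cutoff and L1 distance are illustrative choices, not taken from the chapter.

```python
# Minimal VGG-based perceptual loss sketch; layer cutoff and L1 distance
# are illustrative assumptions, not taken from the survey.
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

class PerceptualLoss(nn.Module):
    def __init__(self, cutoff: int = 16):
        super().__init__()
        # Frozen VGG-19 features up to a mid-level conv layer.
        self.features = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:cutoff].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)
        self.criterion = nn.L1Loss()

    def forward(self, generated, target):
        # Penalize differences in feature space rather than raw pixel space,
        # which correlates better with perceived visual quality.
        return self.criterion(self.features(generated), self.features(target))
```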
Dual-Branch Collaborative Transformer for Virtual Try-On
Image-based virtual try-on has recently gained considerable attention in both the scientific and fashion industry communities due to its challenging setting and practical real-world applications. While pure convolutional approaches have been explored to solve the task, Transformer-based architectures have not yet received significant attention. Following the intuition that self- and cross-attention operators can handle long-range dependencies and hence improve generation quality, in this paper we extend a Transformer-based virtual try-on model by adding a dual-branch collaborative module that can exploit cross-modal information at generation time. We perform experiments on the VITON dataset, the standard benchmark for the task, and on Dress Code, a recently collected virtual try-on dataset with multi-category clothing. Experimental results demonstrate the effectiveness of our solution over previous methods and show that Transformer-based architectures can be a viable alternative for virtual try-on.
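To make the attention intuition concrete, the sketch below shows one plausible dual-branch block in which person tokens first apply self-attention (long-range dependencies within the image) and then cross-attend to garment tokens (cross-modal information at generation time). Dimensions, the pre-norm layout, and module names are assumptions for illustration, not the paper's exact module.

```python
# Illustrative dual-branch attention block; layout and names are
# assumptions, not the paper's architecture.
import torch.nn as nn

class CrossBranchBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, person_tokens, garment_tokens):
        # Self-attention within the person branch captures long-range
        # dependencies inside the image.
        x = self.norm1(person_tokens)
        person_tokens = person_tokens + self.self_attn(x, x, x)[0]
        # Cross-attention injects garment information into the person
        # branch at generation time.
        x = self.norm2(person_tokens)
        person_tokens = person_tokens + self.cross_attn(x, garment_tokens, garment_tokens)[0]
        return person_tokens
```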
Dress Code: High-Resolution Multi-Category Virtual Try-On
Image-based virtual try-on strives to transfer the appearance of a clothing
item onto the image of a target person. Prior work focuses mainly on upper-body
clothes (e.g. t-shirts, shirts, and tops) and neglects full-body or lower-body
items. This shortcoming arises from a main factor: current publicly available
datasets for image-based virtual try-on do not account for this variety, thus
limiting progress in the field. To address this deficiency, we introduce Dress
Code, which contains images of multi-category clothes. Dress Code is more than
3x larger than publicly available datasets for image-based virtual try-on and
features high-resolution paired images (1024 x 768) with front-view, full-body
reference models. To generate HD try-on images with high visual quality and
rich details, we propose to learn fine-grained discriminating features.
Specifically, we leverage a semantic-aware discriminator that makes predictions
at pixel-level instead of image- or patch-level. Extensive experimental
evaluation demonstrates that the proposed approach surpasses the baselines and
state-of-the-art competitors in terms of visual quality and quantitative
results. The Dress Code dataset is publicly available at
https://github.com/aimagelab/dress-code.
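To illustrate the difference between patch-level and pixel-level discrimination, the sketch below outputs a dense per-pixel map over semantic classes plus one extra "fake" class, so the adversarial signal is semantic-aware at every pixel. The layer widths and class count are illustrative assumptions, not the paper's discriminator.

```python
# Sketch of a pixel-level, semantic-aware discriminator: instead of a single
# real/fake score per image or patch, it predicts a dense per-pixel map.
# Layer widths and the class count are illustrative assumptions.
import torch.nn as nn

class PixelLevelDiscriminator(nn.Module):
    def __init__(self, in_ch: int = 3, num_classes: int = 18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.LeakyReLU(0.2),
            # One logit per semantic class plus a "fake" class, per pixel.
            nn.Conv2d(128, num_classes + 1, kernel_size=1),
        )

    def forward(self, image):
        # (N, num_classes + 1, H, W): per-pixel class logits.
        return self.net(image)
```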