Unsupervised Diverse Colorization via Generative Adversarial Networks
Colorization of grayscale images has been a hot topic in computer vision.
Previous research mainly focuses on producing a colored image that matches the
original one. However, since many colors share the same gray value, an input
grayscale image could be colored in diverse ways while remaining realistic. In
this paper, we design a novel solution for unsupervised diverse colorization.
Specifically, we leverage conditional generative adversarial networks to model
the distribution of real-world item colors, and we develop a fully
convolutional generator with multi-layer noise to enhance diversity,
multi-layer condition concatenation to maintain realism, and stride-1
convolutions to preserve spatial information. With this novel network
architecture, the model yields highly competitive performance on the open LSUN
bedroom dataset. A Turing test with 80 human participants further indicates
that our generated color schemes are highly convincing.
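To make the described generator concrete, below is a minimal PyTorch sketch of
the three architectural ideas in the abstract: stride-1 convolutions that keep
spatial resolution, fresh noise injected at every layer, and the grayscale
condition re-concatenated at every layer. The channel widths, depth, and noise
dimensionality are illustrative assumptions, not the paper's exact
configuration.

```python
import torch
import torch.nn as nn

class DiverseColorGenerator(nn.Module):
    """Fully convolutional generator: stride-1 convs preserve spatial size,
    while per-layer noise and the grayscale condition are concatenated at
    every layer (widths/noise_ch are illustrative, not the paper's values)."""
    def __init__(self, noise_ch=4, widths=(64, 64, 64)):
        super().__init__()
        self.noise_ch = noise_ch
        layers, in_ch = [], 1 + noise_ch  # grayscale + noise at the input
        for w in widths:
            layers.append(nn.Conv2d(in_ch, w, kernel_size=3, stride=1, padding=1))
            in_ch = w + 1 + noise_ch      # re-concatenate condition and noise
        self.convs = nn.ModuleList(layers)
        self.act = nn.ReLU(inplace=True)
        self.to_rgb = nn.Conv2d(in_ch, 3, kernel_size=3, stride=1, padding=1)

    def forward(self, gray):
        # gray: (B, 1, H, W); fresh noise at each layer drives diversity
        x = gray
        for conv in self.convs:
            z = torch.randn(x.size(0), self.noise_ch, x.size(2), x.size(3),
                            device=x.device)
            x = self.act(conv(torch.cat([x, z], dim=1)))
            # multi-layer condition concatenation: keep gray (and noise) visible
            x = torch.cat([x, gray, z], dim=1)
        return torch.tanh(self.to_rgb(x))
```

Because new noise is drawn on every forward pass, calling the generator twice
on the same grayscale input yields two different but plausible colorizations,
which is the diversity mechanism the abstract describes.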
DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models
Image-based fashion design with AI techniques has attracted increasing
attention in recent years. We focus on a new fashion design task, where we aim
to transfer a reference appearance image onto a clothing image while preserving
the structure of the clothing image. It is a challenging task since there are
no reference images available for the newly designed output fashion images.
Although diffusion-based image translation and neural style transfer (NST)
enable flexible style transfer, they often fail to realistically maintain the
original structure of the image during the reverse diffusion, especially when
the reference appearance image differs greatly from common clothing
appearances. To tackle this issue, we present a novel diffusion-model-based
unsupervised structure-aware transfer method that semantically generates new
clothes from a given clothing image and a reference appearance image.
Specifically, we decouple the foreground clothing using semantic masks
automatically generated from conditioned labels, and the mask is further used
as guidance in the denoising process to preserve structure information. Moreover, we
use the pre-trained vision Transformer (ViT) for both appearance and structure
guidance. Our experimental results show that the proposed method outperforms
state-of-the-art baseline models, generating more realistic images in the
fashion design task. Code and demo can be found at
https://github.com/Rem105-210/DiffFashion
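As a rough illustration of how a semantic mask can guide the denoising process
toward structure preservation, here is a minimal DDIM-style PyTorch sketch.
The names `model`, `alphas_cumprod`, `structure_img`, and `mask` are
hypothetical placeholders, and the mask-blending step is a generic
structure-preservation scheme in the spirit of the abstract, not DiffFashion's
exact guidance algorithm (which additionally uses ViT features).

```python
import torch

@torch.no_grad()
def masked_reverse_diffusion(model, x_T, structure_img, mask, alphas_cumprod):
    """At each reverse step, re-impose a noised copy of the source clothing
    image inside the mask, so the masked region keeps its structure while
    the rest is free to follow the reference appearance."""
    x = x_T
    for t in reversed(range(len(alphas_cumprod))):
        a_t = alphas_cumprod[t]
        eps = model(x, t)  # hypothetical noise-prediction network
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        if t == 0:
            return x0_pred
        a_prev = alphas_cumprod[t - 1]
        # deterministic DDIM step toward t-1
        x = a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
        # mask guidance: forward-diffuse the source to level t-1 and blend
        noise = torch.randn_like(x)
        x_src = a_prev.sqrt() * structure_img + (1 - a_prev).sqrt() * noise
        x = mask * x_src + (1 - mask) * x
    return x
```

In this sketch the mask is binary (1 on the clothing structure to preserve, 0
elsewhere), so only the unmasked region evolves freely under the model's
denoising trajectory.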