343 research outputs found
TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
In this work, we propose TediGAN, a novel framework for multi-modal image
generation and manipulation with textual descriptions. The proposed method
consists of three components: StyleGAN inversion module, visual-linguistic
similarity learning, and instance-level optimization. The inversion module maps
real images to the latent space of a well-trained StyleGAN. The
visual-linguistic similarity learns the text-image matching by mapping the
image and text into a common embedding space. The instance-level optimization
is for identity preservation in manipulation. Our model can produce diverse and
high-quality images with an unprecedented resolution at 1024. Using a control
mechanism based on style-mixing, our TediGAN inherently supports image
synthesis with multi-modal inputs, such as sketches or semantic labels, with or
without instance guidance. To facilitate text-guided multi-modal synthesis, we
propose the Multi-Modal CelebA-HQ, a large-scale dataset consisting of real
face images and corresponding semantic segmentation map, sketch, and textual
descriptions. Extensive experiments on the introduced dataset demonstrate the
superior performance of our proposed method. Code and data are available at
https://github.com/weihaox/TediGAN.Comment: CVPR 2021. Code: https://github.com/weihaox/TediGAN Data:
https://github.com/weihaox/Multi-Modal-CelebA-HQ Video:
https://youtu.be/L8Na2f5viA
TextureGAN: Controlling Deep Image Synthesis with Texture Patches
In this paper, we investigate deep image synthesis guided by sketch, color,
and texture. Previous image synthesis methods can be controlled by sketch and
color strokes but we are the first to examine texture control. We allow a user
to place a texture patch on a sketch at arbitrary locations and scales to
control the desired output texture. Our generative network learns to synthesize
objects consistent with these texture suggestions. To achieve this, we develop
a local texture loss in addition to adversarial and content loss to train the
generative network. We conduct experiments using sketches generated from real
images and textures sampled from a separate texture database and results show
that our proposed algorithm is able to generate plausible images that are
faithful to user controls. Ablation studies show that our proposed pipeline can
generate more realistic images than adapting existing methods directly.Comment: CVPR 2018 spotligh
Multimodal Adversarial Learning
Deep Convolutional Neural Networks (DCNN) have proven to be an exceptional tool for object recognition, generative modelling, and multi-modal learning in various computer vision applications. However, recent findings have shown that such state-of-the-art models can be easily deceived by inserting slight imperceptible perturbations to key pixels in the input. A good target detection systems can accurately identify targets by localizing their coordinates on the input image of interest. This is ideally achieved by labeling each pixel in an image as a background or a potential target pixel. However, prior research still confirms that such state of the art targets models are susceptible to adversarial attacks. In the case of generative models, facial sketches drawn by artists mostly used by law enforcement agencies depend on the ability of the artist to clearly replicate all the key facial features that aid in capturing the true identity of a subject. Recent works have attempted to synthesize these sketches into plausible visual images to improve visual recognition and identification. However, synthesizing photo-realistic images from sketches proves to be an even more challenging task, especially for sensitive applications such as suspect identification. However, the incorporation of hybrid discriminators, which perform attribute classification of multiple target attributes, a quality guided encoder that minimizes the perceptual dissimilarity of the latent space embedding of the synthesized and real image at different layers in the network have shown to be powerful tools towards better multi modal learning techniques. In general, our overall approach was aimed at improving target detection systems and the visual appeal of synthesized images while incorporating multiple attribute assignment to the generator without compromising the identity of the synthesized image. We synthesized sketches using XDOG filter for the CelebA, Multi-modal and CelebA-HQ datasets and from an auxiliary generator trained on sketches from CUHK, IIT-D and FERET datasets. Our results overall for different model applications are impressive compared to current state of the art
Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training
Exemplar-based sketch-to-photo synthesis allows users to generate
photo-realistic images based on sketches. Recently, diffusion-based methods
have achieved impressive performance on image generation tasks, enabling
highly-flexible control through text-driven generation or energy functions.
However, generating photo-realistic images with color and texture from sketch
images remains challenging for diffusion models. Sketches typically consist of
only a few strokes, with most regions left blank, making it difficult for
diffusion-based methods to produce photo-realistic images. In this work, we
propose a two-stage method named ``Inversion-by-Inversion" for exemplar-based
sketch-to-photo synthesis. This approach includes shape-enhancing inversion and
full-control inversion. During the shape-enhancing inversion process, an
uncolored photo is generated with the guidance of a shape-energy function. This
step is essential to ensure control over the shape of the generated photo. In
the full-control inversion process, we propose an appearance-energy function to
control the color and texture of the final generated photo.Importantly, our
Inversion-by-Inversion pipeline is training-free and can accept different types
of exemplars for color and texture control. We conducted extensive experiments
to evaluate our proposed method, and the results demonstrate its effectiveness.Comment: 15 pages, preprint versio
GAN Inversion: A Survey
GAN inversion aims to invert a given image back into the latent space of a
pretrained GAN model, for the image to be faithfully reconstructed from the
inverted code by the generator. As an emerging technique to bridge the real and
fake image domains, GAN inversion plays an essential role in enabling the
pretrained GAN models such as StyleGAN and BigGAN to be used for real image
editing applications. Meanwhile, GAN inversion also provides insights on the
interpretation of GAN's latent space and how the realistic images can be
generated. In this paper, we provide an overview of GAN inversion with a focus
on its recent algorithms and applications. We cover important techniques of GAN
inversion and their applications to image restoration and image manipulation.
We further elaborate on some trends and challenges for future directions
Domain Re-Modulation for Few-Shot Generative Domain Adaptation
In this study, we delve into the task of few-shot Generative Domain
Adaptation (GDA), which involves transferring a pre-trained generator from one
domain to a new domain using only a few reference images. Inspired by the way
human brains acquire knowledge in new domains, we present an innovative
generator structure called Domain Re-Modulation (DoRM). DoRM not only meets the
criteria of high quality, large synthesis diversity, and cross-domain
consistency, which were achieved by previous research in GDA, but also
incorporates memory and domain association, akin to how human brains operate.
Specifically, DoRM freezes the source generator and introduces new mapping and
affine modules (M&A modules) to capture the attributes of the target domain
during GDA. This process resembles the formation of new synapses in human
brains. Consequently, a linearly combinable domain shift occurs in the style
space. By incorporating multiple new M&A modules, the generator gains the
capability to perform high-fidelity multi-domain and hybrid-domain generation.
Moreover, to maintain cross-domain consistency more effectively, we introduce a
similarity-based structure loss. This loss aligns the auto-correlation map of
the target image with its corresponding auto-correlation map of the source
image during training. Through extensive experiments, we demonstrate the
superior performance of our DoRM and similarity-based structure loss in
few-shot GDA, both quantitatively and qualitatively. The code will be available
at https://github.com/wuyi2020/DoRM.Comment: Under Revie
Enhancing the Authenticity of Rendered Portraits with Identity-Consistent Transfer Learning
Despite rapid advances in computer graphics, creating high-quality
photo-realistic virtual portraits is prohibitively expensive. Furthermore, the
well-know ''uncanny valley'' effect in rendered portraits has a significant
impact on the user experience, especially when the depiction closely resembles
a human likeness, where any minor artifacts can evoke feelings of eeriness and
repulsiveness. In this paper, we present a novel photo-realistic portrait
generation framework that can effectively mitigate the ''uncanny valley''
effect and improve the overall authenticity of rendered portraits. Our key idea
is to employ transfer learning to learn an identity-consistent mapping from the
latent space of rendered portraits to that of real portraits. During the
inference stage, the input portrait of an avatar can be directly transferred to
a realistic portrait by changing its appearance style while maintaining the
facial identity. To this end, we collect a new dataset, Daz-Rendered-Faces-HQ
(DRFHQ), that is specifically designed for rendering-style portraits. We
leverage this dataset to fine-tune the StyleGAN2 generator, using our carefully
crafted framework, which helps to preserve the geometric and color features
relevant to facial identity. We evaluate our framework using portraits with
diverse gender, age, and race variations. Qualitative and quantitative
evaluations and ablation studies show the advantages of our method compared to
state-of-the-art approaches.Comment: 10 pages, 8 figures, 2 table
- …