Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis
We propose a novel Edge guided Generative Adversarial Network (EdgeGAN) for
photo-realistic image synthesis from semantic layouts. Although considerable
improvement has been achieved, the quality of synthesized images is far from
satisfactory due to two largely unresolved challenges. First, the semantic
labels do not provide detailed structural information, making it difficult to
synthesize local details and structures. Second, the widely adopted CNN
operations such as convolution, down-sampling and normalization usually cause
spatial resolution loss and thus are unable to fully preserve the original
semantic information, leading to semantically inconsistent results (e.g.,
missing small objects). To tackle the first challenge, we propose to use the
edge map as an intermediate representation, which is further adopted to guide
image generation via a proposed attention-guided edge transfer module. The
edge map is produced by a convolutional generator and introduces detailed
structural information. Further, to preserve the semantic information, we design
an effective module to selectively highlight class-dependent feature maps
according to the original semantic layout. Extensive experiments on two
challenging datasets show that the proposed EdgeGAN can generate significantly
better results than state-of-the-art methods. The source code and trained
models are available at https://github.com/Ha0Tang/EdgeGAN.
Comment: 40 pages, 29 figures
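The two modules described above lend themselves to a compact illustration. Below is a minimal PyTorch sketch of an attention-guided edge transfer step and a layout-driven channel gate in the spirit of the semantic-preserving module; all module and variable names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two ideas described above; names are illustrative,
# not the authors' implementation.
import torch
import torch.nn as nn

class AttentionEdgeTransfer(nn.Module):
    """Transfers edge features into image features via a learned attention map."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, edge_feat, img_feat):
        a = self.attn(edge_feat)          # per-pixel attention in [0, 1]
        return img_feat + a * edge_feat   # inject edge structure where attended

class SemanticPreserve(nn.Module):
    """Re-weights class-dependent feature maps using the input semantic layout."""
    def __init__(self, num_classes, channels):
        super().__init__()
        self.proj = nn.Conv2d(num_classes, channels, 1)  # layout -> channel gates

    def forward(self, feat, layout):
        gate = torch.sigmoid(self.proj(layout))          # highlight class channels
        return feat * gate

# toy usage: 35-class layout, 64-channel features at 128x128
feat = torch.randn(1, 64, 128, 128)
edge = torch.randn(1, 64, 128, 128)
layout = torch.randn(1, 35, 128, 128).softmax(1)  # stand-in for a one-hot layout
out = SemanticPreserve(35, 64)(AttentionEdgeTransfer(64)(edge, feat), layout)
```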
Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis
We propose a novel ECGAN for the challenging semantic image synthesis task.
Although considerable improvements have been achieved by the community in
recent years, the quality of synthesized images is far from satisfactory due
to three largely unresolved challenges. 1) The semantic labels do not provide
detailed structural information, making it challenging to synthesize local
details and structures; 2) The widely adopted CNN operations such as
convolution, down-sampling, and normalization usually cause spatial resolution
loss and thus cannot fully preserve the original semantic information, leading
to semantically inconsistent results (e.g., missing small objects); 3) Existing
semantic image synthesis methods focus on modeling 'local' semantic information
from a single input semantic layout. However, they ignore 'global' semantic
information of multiple input semantic layouts, i.e., semantic cross-relations
between pixels across different input layouts. To tackle 1), we propose to use
the edge map as an intermediate representation, which is further adopted to
guide image generation via a proposed attention-guided edge transfer module. To
tackle 2), we design an effective module to selectively highlight
class-dependent feature maps according to the original semantic layout to
preserve the semantic information. To tackle 3), inspired by current methods in
contrastive learning, we propose a novel contrastive learning method, which
aims to enforce pixel embeddings belonging to the same semantic class to
generate more similar image content than those from different classes. We
further propose a novel multi-scale contrastive learning method that pushes
same-class features from different scales closer together, enabling it to
capture more semantic relations by explicitly exploring the structures of
labeled pixels from multiple input semantic layouts at different scales.
Comment: Accepted to TPAMI; an extended version of a paper published in
ICLR 2023. arXiv admin note: substantial text overlap with arXiv:2003.1389
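To make challenge 3) concrete, here is a hedged PyTorch sketch of one plausible multi-scale, class-wise contrastive loss: pixel embeddings are pooled into per-class prototypes at two scales, and an InfoNCE-style objective pulls same-class prototypes together. The function names and the prototype-pooling simplification are assumptions, not the paper's exact formulation.

```python
# One plausible instantiation of multi-scale class-wise contrastive learning;
# not the paper's exact loss.
import torch
import torch.nn.functional as F

def class_prototypes(feat, labels, num_classes):
    """Mean pixel embedding per semantic class -> (num_classes, D)."""
    b, d, h, w = feat.shape
    flat = feat.permute(0, 2, 3, 1).reshape(-1, d)         # (B*H*W, D)
    onehot = F.one_hot(labels.reshape(-1), num_classes).float()
    protos = onehot.t() @ flat                             # per-class sums
    return protos / onehot.sum(0).clamp(min=1).unsqueeze(1)

def multiscale_contrastive(feat_hi, feat_lo, labels, num_classes, tau=0.1):
    """Pull same-class prototypes from two scales together (InfoNCE style)."""
    labels_lo = F.interpolate(labels[:, None].float(),
                              size=feat_lo.shape[-2:], mode='nearest')
    labels_lo = labels_lo.squeeze(1).long()                # labels at low res
    p_hi = F.normalize(class_prototypes(feat_hi, labels, num_classes), dim=1)
    p_lo = F.normalize(class_prototypes(feat_lo, labels_lo, num_classes), dim=1)
    logits = p_hi @ p_lo.t() / tau          # row i should match column i
    target = torch.arange(num_classes)      # classes absent from the batch
    return F.cross_entropy(logits, target)  # should be masked in practice

# toy usage: 8 classes, embeddings at 64x64 (matching labels) and 32x32
loss = multiscale_contrastive(torch.randn(2, 16, 64, 64),
                              torch.randn(2, 16, 32, 32),
                              torch.randint(0, 8, (2, 64, 64)), 8)
```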
Manipulating Attributes of Natural Scenes via Hallucination
In this study, we explore building a two-stage framework for enabling users
to directly manipulate high-level attributes of a natural scene. The key to our
approach is a deep generative network which can hallucinate images of a scene
as if they were taken at a different season (e.g., during winter), weather
condition (e.g., on a cloudy day), or time of day (e.g., at sunset). Once the
scene is hallucinated with the given attributes, the corresponding look is then
transferred to the input image while preserving the semantic details intact,
giving a photo-realistic manipulation result. As the proposed framework
hallucinates what the scene will look like, it does not require any reference
style image as commonly utilized in most of the appearance or style transfer
approaches. Moreover, it allows a given scene to be manipulated simultaneously
according to a diverse set of transient attributes within a single model,
eliminating the need to train a separate network for each translation task.
Our comprehensive set of qualitative and quantitative results demonstrates the
effectiveness of our approach against competing methods.
Comment: Accepted for publication in ACM Transactions on Graphics
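As a rough illustration of the two-stage idea, the PyTorch sketch below conditions a tiny generator on a transient-attribute vector (stage one) and then transfers the hallucinated look to the input by matching per-channel statistics (a crude stand-in for stage two). All names and the statistics-matching transfer are illustrative assumptions, not the authors' architecture.

```python
# Hedged sketch of the two-stage framework; names are illustrative.
import torch
import torch.nn as nn

class AttributeConditionedGenerator(nn.Module):
    """Stage 1: hallucinate the scene under the requested transient attributes."""
    def __init__(self, num_attrs, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + num_attrs, 128, 4, 1, 0), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())

    def forward(self, z, attrs):
        # concatenate noise with an attribute vector (e.g. sunset=0.9, cloudy=0.2)
        code = torch.cat([z, attrs], dim=1)[..., None, None]
        return self.net(code)

def transfer_look(content, hallucination):
    """Stage 2 (crude stand-in): adopt the hallucinated look by matching
    per-channel statistics while keeping the input's own structure."""
    mu_c = content.mean((2, 3), keepdim=True)
    sd_c = content.std((2, 3), keepdim=True)
    mu_h = hallucination.mean((2, 3), keepdim=True)
    sd_h = hallucination.std((2, 3), keepdim=True)
    return (content - mu_c) / (sd_c + 1e-5) * sd_h + mu_h

# toy usage: 40 transient attributes, 16x16 images
g = AttributeConditionedGenerator(num_attrs=40)
hallucination = g(torch.randn(1, 64), torch.rand(1, 40))
result = transfer_look(torch.rand(1, 3, 16, 16), hallucination)
```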
Hierarchy Composition GAN for High-fidelity Image Synthesis
Despite the rapid progress of generative adversarial networks (GANs) in image
synthesis in recent years, existing image synthesis approaches work in either
the geometry domain or the appearance domain alone, which often introduces
various synthesis artifacts. This paper presents an innovative Hierarchical
Composition GAN (HIC-GAN) that incorporates image synthesis in geometry and
appearance domains into an end-to-end trainable network and achieves superior
synthesis realism in both domains simultaneously. We design an innovative
hierarchical composition mechanism that is capable of learning realistic
composition geometry and handling occlusions when multiple foreground objects
are involved in image composition. In addition, we introduce a novel attention
mask mechanism that guides the adaptation of foreground object appearance,
which also provides a better training reference for learning in the geometry
domain. Extensive experiments on scene text image synthesis, portrait editing
and indoor rendering tasks show that the proposed HIC-GAN achieves superior
synthesis performance qualitatively and quantitatively.
Comment: 11 pages, 8 figures
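The sketch below illustrates one way geometry and appearance could be combined in a single composition step: an affine warp places the foreground (geometry domain) and a predicted attention mask blends it into the background (appearance domain). This is a toy, single-object stand-in for the paper's hierarchical mechanism; every name here is an assumption.

```python
# Toy geometry + appearance composition step; not the paper's mechanism.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Composer(nn.Module):
    def __init__(self):
        super().__init__()
        self.theta = nn.Linear(8, 6)      # geometry: predict an affine warp
        self.mask = nn.Sequential(        # appearance: per-pixel blend weight
            nn.Conv2d(6, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, fg, bg, code):
        b = fg.size(0)
        theta = self.theta(code).view(b, 2, 3)               # warp parameters
        grid = F.affine_grid(theta, fg.shape, align_corners=False)
        fg_w = F.grid_sample(fg, grid, align_corners=False)  # place foreground
        a = self.mask(torch.cat([fg_w, bg], dim=1))          # attention/occlusion
        return a * fg_w + (1 - a) * bg                       # alpha composition

# toy usage: one foreground object composed onto a 32x32 background
comp = Composer()
out = comp(torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32), torch.randn(1, 8))
```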
Learning Compositional Visual Concepts with Mutual Consistency
Compositionality of semantic concepts in image synthesis and analysis is
appealing as it can help in decomposing known and generatively recomposing
unknown data. For instance, we may learn concepts of changing illumination,
geometry or albedo of a scene, and try to recombine them to generate physically
meaningful but unseen data for training and testing. In practice, however, we
often do not have samples from the joint concept space available: we may have
data on illumination change in one dataset and on geometric change in another
one without complete overlap. We pose the following question: How can we learn
two or more concepts jointly from different data sets with mutual consistency
where we do not have samples from the full joint space? We present a novel
answer in this paper based on cyclic consistency over multiple concepts,
represented individually by generative adversarial networks (GANs). Our method,
ConceptGAN, can be understood as a drop-in for data augmentation to improve
resilience in real-world applications. Qualitative and quantitative
evaluations demonstrate its efficacy in generating semantically meaningful
images, as well as in one-shot face verification as an example application.
Comment: 10 pages, 8 figures, 4 tables, CVPR 201
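A minimal sketch of the cyclic-consistency idea, assuming two "concept" generators with learned inverses: each concept should be undoable, and applying the two concepts in either order should commute, which couples them without samples from the joint space. The tiny networks and loss weighting are illustrative, not the authors' code.

```python
# Minimal sketch of cyclic consistency over two concepts; illustrative only.
import torch
import torch.nn as nn

def concept_generator():
    """Tiny image-to-image network standing in for one concept GAN."""
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))

g_illum, g_illum_inv = concept_generator(), concept_generator()  # concept 1
g_geom, g_geom_inv = concept_generator(), concept_generator()    # concept 2
l1 = nn.L1Loss()

x = torch.rand(4, 3, 32, 32)
# per-concept cycle: applying a concept and then its inverse recovers the input
loss_cycle = l1(g_illum_inv(g_illum(x)), x) + l1(g_geom_inv(g_geom(x)), x)
# cross-concept commutativity: the order of the two concepts should not matter,
# which ties them together even without samples from the joint concept space
loss_commute = l1(g_geom(g_illum(x)), g_illum(g_geom(x)))
loss = loss_cycle + loss_commute  # add per-concept adversarial terms in practice
```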