XingGAN for Person Image Generation
We propose a novel Generative Adversarial Network (XingGAN or CrossingGAN)
for person image generation tasks, i.e., translating the pose of a given person
to a desired one. The proposed Xing generator consists of two generation
branches that model the person's appearance and shape information,
respectively. Moreover, we propose two novel blocks to effectively transfer and
update the person's shape and appearance embeddings in a crossing way so that
each improves the other, a strategy that has not been explored in existing
GAN-based image generation work. Extensive experiments on two
challenging datasets, i.e., Market-1501 and DeepFashion, demonstrate that the
proposed XingGAN advances the state of the art in terms of both objective
quantitative scores and subjective visual realism. The source code and trained
models are available at https://github.com/Ha0Tang/XingGAN.
Comment: Accepted to ECCV 2020, camera ready (16 pages) + supplementary (6 pages).
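As an illustration of the crossing update, the following minimal PyTorch sketch modulates each branch's features with an attention map computed from the other branch. The module name, the sigmoid-attention form, and the residual update are assumptions made for exposition, not the released XingGAN blocks (see the repository above for those).

```python
# Minimal sketch of a "crossing" block, assuming 4-D feature maps
# (batch, channels, height, width). Illustrative only.
import torch
import torch.nn as nn


class CrossingBlock(nn.Module):
    """Updates appearance and shape features using cues from each other."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions that turn one branch's features into an
        # attention map applied to the other branch.
        self.shape_to_app = nn.Conv2d(channels, channels, kernel_size=1)
        self.app_to_shape = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, app_feat: torch.Tensor, shape_feat: torch.Tensor):
        # Appearance branch is modulated by attention computed from shape.
        app_att = torch.sigmoid(self.shape_to_app(shape_feat))
        new_app = app_feat + app_att * app_feat
        # Shape branch is modulated by attention computed from appearance.
        shape_att = torch.sigmoid(self.app_to_shape(app_feat))
        new_shape = shape_feat + shape_att * shape_feat
        return new_app, new_shape


if __name__ == "__main__":
    block = CrossingBlock(channels=64)
    app = torch.randn(2, 64, 32, 32)    # appearance embedding
    shape = torch.randn(2, 64, 32, 32)  # shape (pose) embedding
    app, shape = block(app, shape)
    print(app.shape, shape.shape)
```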
Dual Attention GANs for Semantic Image Synthesis
In this paper, we focus on the semantic image synthesis task that aims at
translating semantic label maps into photo-realistic images. Existing methods
lack effective semantic constraints to preserve the semantic information and
ignore the structural correlations in both spatial and channel dimensions,
leading to blurry results with noticeable artifacts. To address these
limitations, we propose a novel Dual Attention GAN (DAGAN) to synthesize
photo-realistic and semantically-consistent images with fine details from the
input layouts without imposing extra training overhead or modifying the network
architectures of existing methods. We also propose two novel modules, i.e., a
position-wise Spatial Attention Module (SAM) and a scale-wise Channel Attention
Module (CAM), to capture semantic structure attention in spatial and channel
dimensions, respectively. Specifically, SAM selectively correlates the pixels
at each position by a spatial attention map, leading to pixels with the same
semantic label being related to each other regardless of their spatial
distances. Meanwhile, CAM selectively emphasizes the scale-wise features at
each channel by a channel attention map, which integrates associated features
among all channel maps regardless of their scales. We finally sum the outputs
of SAM and CAM to further improve feature representation. Extensive experiments
on four challenging datasets show that DAGAN achieves remarkably better results
than state-of-the-art methods, while using fewer model parameters. The source
code and trained models are available at https://github.com/Ha0Tang/DAGAN.
Comment: Accepted to ACM MM 2020, camera ready (9 pages) + supplementary (10 pages).
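The two attention modules can be sketched in PyTorch as below: SAM relates all spatial positions through a position-by-position affinity map, CAM re-weights channel maps through a channel-by-channel affinity map, and the two outputs are summed. The exact projections and residual connections here are illustrative assumptions and may differ from the released DAGAN code.

```python
# Minimal sketches of position-wise and channel-wise attention; illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAttention(nn.Module):
    """Position-wise attention: every pixel attends to every other pixel."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c//8)
        k = self.key(x).flatten(2)                     # (b, c//8, hw)
        attn = F.softmax(torch.bmm(q, k), dim=-1)      # (b, hw, hw)
        v = self.value(x).flatten(2)                   # (b, c, hw)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return out + x


class ChannelAttention(nn.Module):
    """Channel-wise attention: channel maps are re-weighted by their affinity."""

    def forward(self, x):
        b, c, h, w = x.shape
        flat = x.flatten(2)                                               # (b, c, hw)
        attn = F.softmax(torch.bmm(flat, flat.transpose(1, 2)), dim=-1)  # (b, c, c)
        out = torch.bmm(attn, flat).view(b, c, h, w)
        return out + x


if __name__ == "__main__":
    x = torch.randn(2, 64, 16, 16)
    fused = SpatialAttention(64)(x) + ChannelAttention()(x)  # sum of SAM and CAM
    print(fused.shape)
```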
Multi-Channel Attention Selection GANs for Guided Image-to-Image Translation
We propose a novel model named Multi-Channel Attention Selection Generative
Adversarial Network (SelectionGAN) for guided image-to-image translation, where
we translate an input image into another image while respecting external semantic
guidance. The proposed SelectionGAN explicitly utilizes the semantic guidance
information and consists of two stages. In the first stage, the input image and
the conditional semantic guidance are fed into a cycled semantic-guided
generation network to produce initial coarse results. In the second stage, we
refine the initial results by using the proposed multi-scale spatial pooling &
channel selection module and the multi-channel attention selection module.
Moreover, uncertainty maps automatically learned from attention maps are used
to guide the pixel loss for better network optimization. Exhaustive experiments
on four challenging guided image-to-image translation tasks (face, hand, body
and street view) demonstrate that our SelectionGAN is able to generate
significantly better results than the state-of-the-art methods. Meanwhile, the
proposed framework and modules are unified solutions and can be applied to
solve other generation tasks, such as semantic image synthesis. The code is
available at https://github.com/Ha0Tang/SelectionGAN.
Comment: An extended version of a paper published in CVPR 2019. arXiv admin note: substantial text overlap with arXiv:1904.0680
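The selection stage can be pictured with a minimal PyTorch sketch: second-stage features are decoded into several candidate images plus per-pixel attention maps that softly select among them, and an uncertainty map re-weights an L1 pixel loss. The candidate count, layer shapes, and loss form are illustrative assumptions, and the uncertainty here is predicted directly from the features for simplicity rather than derived from the attention maps as in the paper.

```python
# Minimal sketch of multi-channel attention selection with an uncertainty-weighted
# pixel loss; illustrative assumptions throughout.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionSelection(nn.Module):
    """Combines several intermediate generations via per-pixel attention."""

    def __init__(self, feat_channels: int, num_candidates: int = 4):
        super().__init__()
        self.num_candidates = num_candidates
        # Produce `num_candidates` RGB candidates and matching attention maps.
        self.to_images = nn.Conv2d(feat_channels, 3 * num_candidates, 3, padding=1)
        self.to_attention = nn.Conv2d(feat_channels, num_candidates, 1)
        # An uncertainty map used to re-weight the pixel loss.
        self.to_uncertainty = nn.Conv2d(feat_channels, 1, 1)

    def forward(self, feat):
        b, _, h, w = feat.shape
        candidates = torch.tanh(self.to_images(feat)).view(b, self.num_candidates, 3, h, w)
        attention = F.softmax(self.to_attention(feat), dim=1).unsqueeze(2)  # (b, n, 1, h, w)
        output = (candidates * attention).sum(dim=1)  # attention-weighted sum
        uncertainty = torch.sigmoid(self.to_uncertainty(feat))
        return output, uncertainty


def uncertainty_weighted_l1(pred, target, uncertainty):
    # Down-weight pixels the model is uncertain about (illustrative form).
    return ((1.0 - uncertainty) * (pred - target).abs()).mean()


if __name__ == "__main__":
    feat = torch.randn(2, 128, 64, 64)
    image, unc = AttentionSelection(feat_channels=128)(feat)
    loss = uncertainty_weighted_l1(image, torch.randn(2, 3, 64, 64), unc)
    print(image.shape, unc.shape, loss.item())
```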
Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis
We propose a novel ECGAN for the challenging semantic image synthesis task.
Although the community has made considerable progress recently, the quality of
synthesized images is far from satisfactory due
to three largely unresolved challenges. 1) The semantic labels do not provide
detailed structural information, making it challenging to synthesize local
details and structures; 2) The widely adopted CNN operations such as
convolution, down-sampling, and normalization usually cause spatial resolution
loss and thus cannot fully preserve the original semantic information, leading
to semantically inconsistent results (e.g., missing small objects); 3) Existing
semantic image synthesis methods focus on modeling 'local' semantic information
from a single input semantic layout. However, they ignore 'global' semantic
information of multiple input semantic layouts, i.e., semantic cross-relations
between pixels across different input layouts. To tackle 1), we propose to use
the edge as an intermediate representation which is further adopted to guide
image generation via a proposed attention guided edge transfer module. To
tackle 2), we design an effective module to selectively highlight
class-dependent feature maps according to the original semantic layout to
preserve the semantic information. To tackle 3), inspired by current methods in
contrastive learning, we propose a novel contrastive learning method, which
aims to enforce pixel embeddings belonging to the same semantic class to
generate more similar image content than those from different classes. We
further propose a novel multi-scale contrastive learning method that pushes
same-class features from different scales closer together, capturing more
semantic relations by explicitly exploring the structures of labeled pixels
from multiple input semantic layouts at different scales.
Comment: Accepted to TPAMI, an extended version of a paper published in ICLR 2023. arXiv admin note: substantial text overlap with arXiv:2003.1389
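The class-aware contrastive objective can be sketched as below: pixel embeddings (which may be sampled from several decoder scales and concatenated into one batch, so that same-class features from different scales are also pulled together) sharing a semantic label are treated as positives in an InfoNCE-style loss. The sampling strategy, temperature, and exact loss form are illustrative assumptions rather than the ECGAN implementation.

```python
# Minimal sketch of a class-aware (optionally multi-scale) pixel contrastive loss.
import torch
import torch.nn.functional as F


def class_contrastive_loss(embeddings, labels, temperature=0.1):
    """Pulls pixel embeddings of the same semantic class together.

    embeddings: (N, D) pixel features, possibly sampled from several scales.
    labels:     (N,) semantic class id of each pixel.
    """
    emb = F.normalize(embeddings, dim=1)
    sim = emb @ emb.t() / temperature                      # pairwise similarities
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
    # Exclude self-similarity from both positives and the denominator.
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    same_class = same_class & ~eye
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float("-inf")), dim=1, keepdim=True)
    # Average log-probability over positive pairs for each anchor that has any.
    pos_counts = same_class.sum(dim=1).clamp(min=1)
    loss = -(log_prob * same_class).sum(dim=1) / pos_counts
    return loss[same_class.any(dim=1)].mean()


if __name__ == "__main__":
    # Toy example: 16 pixel embeddings (e.g. gathered from two scales), 4 classes.
    feats = torch.randn(16, 32)
    classes = torch.randint(0, 4, (16,))
    print(class_contrastive_loss(feats, classes).item())
```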
Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation
In this paper, we address the task of semantic-guided scene generation. One
open challenge in scene generation is generating small objects and detailed
local textures, a difficulty widely observed in global image-level generation
methods. To tackle this issue, in this work we consider
learning scene generation in a local context, and correspondingly design a
local class-specific generative network, guided by semantic maps, that
separately constructs and learns sub-generators for different classes and
provides more scene details. To learn more
discriminative class-specific feature representations for the local generation,
a novel classification module is also proposed. To combine the advantages of
both global image-level and local class-specific generation, a joint
generation network is designed with an attention fusion module and a
dual-discriminator structure embedded. Extensive experiments on two scene image
generation tasks show the superior generation performance of the proposed model,
which establishes new state-of-the-art results by large margins on challenging
public benchmarks for both tasks. The source code and trained models are
available at https://github.com/Ha0Tang/LGGAN.
Comment: Accepted to CVPR 2020, camera ready (10 pages) + supplementary (18 pages).
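The local/global combination can be sketched as follows: one small class-specific head per semantic class produces an image contribution masked by that class's region in the semantic layout, a global head produces a full image, and learned per-pixel weights fuse the two. The head designs and fusion form are illustrative assumptions, not the released LGGAN architecture.

```python
# Minimal sketch of local class-specific heads fused with a global branch.
import torch
import torch.nn as nn


class LocalGlobalFusion(nn.Module):
    """One sub-generator head per class plus a global head, fused by attention."""

    def __init__(self, feat_channels: int, num_classes: int):
        super().__init__()
        # A tiny per-class "sub-generator" head (illustrative).
        self.local_heads = nn.ModuleList(
            [nn.Conv2d(feat_channels, 3, 3, padding=1) for _ in range(num_classes)]
        )
        self.global_head = nn.Conv2d(feat_channels, 3, 3, padding=1)
        # Per-pixel weights that fuse the local and global images.
        self.fusion = nn.Conv2d(feat_channels, 2, 1)

    def forward(self, feat, semantic_map):
        # semantic_map: (b, num_classes, h, w) one-hot layout, used as a mask
        # so each sub-generator only contributes to its own class regions.
        local = sum(
            torch.tanh(head(feat)) * semantic_map[:, k : k + 1]
            for k, head in enumerate(self.local_heads)
        )
        global_img = torch.tanh(self.global_head(feat))
        weights = torch.softmax(self.fusion(feat), dim=1)  # (b, 2, h, w)
        return weights[:, 0:1] * local + weights[:, 1:2] * global_img


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)
    layout = torch.zeros(2, 5, 32, 32)
    layout[:, 0] = 1.0  # toy one-hot semantic layout
    out = LocalGlobalFusion(64, num_classes=5)(feat, layout)
    print(out.shape)  # (2, 3, 32, 32)
```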