Hierarchical Modes Exploring in Generative Adversarial Networks
In conditional Generative Adversarial Networks (cGANs), when two different
initial noise vectors are concatenated with the same conditional information,
the distance between the corresponding outputs tends to be relatively small,
which makes minor modes likely to collapse into larger modes. To prevent this
from happening, we propose a hierarchical mode-exploring method that alleviates
mode collapse in cGANs by introducing a diversity measurement into the
objective function as a regularization term. We also introduce the Expected
Ratios of Expansion (ERE) into the regularization term: by minimizing the sum
of differences between the actual change of distance and the ERE, we can
control the diversity of generated images with respect to specific-level
features. We validate the proposed algorithm on four conditional image
synthesis tasks: categorical generation, paired and unpaired image-to-image
translation, and text-to-image generation. Both qualitative and quantitative
results show that the proposed method is effective in alleviating the mode
collapse problem in cGANs and can control the diversity of output images with
respect to specific-level features.
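A minimal PyTorch sketch of the kind of regularizer the abstract describes: it
compares the realized expansion ratio (feature-space distance over noise-space
distance) against a target ERE at several feature levels. The function name,
the choice of L1 distance, and the per-level ERE targets are illustrative
assumptions, not the authors' implementation.

```python
import torch

def hierarchical_ere_loss(feats1, feats2, z1, z2, eres, eps=1e-8):
    # feats1, feats2: lists of generator feature maps produced with the same
    # condition but two different noise vectors z1, z2.
    # eres: one target Expected Ratio of Expansion per feature level.
    d_z = torch.mean(torch.abs(z1 - z2)) + eps       # noise-space distance
    loss = z1.new_zeros(())
    for f1, f2, ere in zip(feats1, feats2, eres):
        d_f = torch.mean(torch.abs(f1 - f2))         # feature-space distance
        ratio = d_f / d_z                            # realized expansion ratio
        loss = loss + torch.abs(ratio - ere)         # pull ratio toward target
    return loss  # added to the generator objective as a regularization term
```

Minimizing this term keeps each level's expansion ratio near its target, so
diversity can be encouraged or suppressed per feature level rather than with a
single global knob.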
A U-Net Based Discriminator for Generative Adversarial Networks
Among the major remaining challenges for generative adversarial networks
(GANs) is the capacity to synthesize globally and locally coherent images with
object shapes and textures indistinguishable from real images. To target this
issue we propose an alternative U-Net-based discriminator architecture,
borrowing insights from the segmentation literature. The proposed architecture
allows the discriminator to provide detailed per-pixel feedback to the
generator while maintaining the global coherence of synthesized images by also
providing image-level feedback. Empowered by the per-pixel
response of the discriminator, we further propose a per-pixel consistency
regularization technique based on the CutMix data augmentation, encouraging the
U-Net discriminator to focus more on semantic and structural changes between
real and fake images. This improves the U-Net discriminator training, further
enhancing the quality of generated samples. The novel discriminator improves
over the state of the art in terms of the standard distribution and image
quality metrics, enabling the generator to synthesize images with varying
structure, appearance and levels of detail, maintaining global and local
realism. Compared to the BigGAN baseline, we achieve an average improvement of
2.7 FID points across FFHQ, CelebA, and the newly introduced COCO-Animals
dataset. The code is available at https://github.com/boschresearch/unetgan.
Comment: CVPR 2020 (Main Conference).
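To make the two ideas concrete, here is a minimal PyTorch sketch of a
discriminator with both an image-level head and a per-pixel decision map, plus
a CutMix-style consistency term. Channel widths, depth, the fixed mixing box,
and all names are illustrative assumptions; the authors' actual implementation
is in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UNetDiscriminator(nn.Module):
    """Sketch of a U-Net discriminator: an encoder yields a global real/fake
    logit; a skip-connected decoder yields per-pixel logits."""
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.enc1 = nn.Conv2d(in_ch, base, 4, stride=2, padding=1)          # H/2
        self.enc2 = nn.Conv2d(base, base * 2, 4, stride=2, padding=1)       # H/4
        self.enc3 = nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1)   # H/8
        self.global_head = nn.Linear(base * 4, 1)                           # image-level logit
        self.dec3 = nn.ConvTranspose2d(base * 4, base * 2, 4, stride=2, padding=1)
        self.dec2 = nn.ConvTranspose2d(base * 4, base, 4, stride=2, padding=1)
        self.dec1 = nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1)
        self.pixel_head = nn.Conv2d(base, 1, 3, padding=1)                  # per-pixel logits

    def forward(self, x):
        e1 = F.leaky_relu(self.enc1(x), 0.2)
        e2 = F.leaky_relu(self.enc2(e1), 0.2)
        e3 = F.leaky_relu(self.enc3(e2), 0.2)
        global_logit = self.global_head(e3.mean(dim=(2, 3)))   # global feedback
        d3 = F.leaky_relu(self.dec3(e3), 0.2)
        d2 = F.leaky_relu(self.dec2(torch.cat([d3, e2], dim=1)), 0.2)
        d1 = F.leaky_relu(self.dec1(torch.cat([d2, e1], dim=1)), 0.2)
        return global_logit, self.pixel_head(d1)               # per-pixel feedback

def cutmix_consistency_loss(d, real, fake):
    """Sketch of the per-pixel consistency regularization: mix real and fake
    images with a CutMix-style box mask and require the decision map on the
    mix to match the mask-combined decision maps of the originals."""
    b, _, h, w = real.shape
    mask = torch.zeros(b, 1, h, w, device=real.device)
    y0, x0 = h // 4, w // 4                            # fixed box for illustration;
    mask[:, :, y0:y0 + h // 2, x0:x0 + w // 2] = 1.0   # the paper samples boxes randomly
    mixed = mask * real + (1 - mask) * fake
    _, p_mix = d(mixed)
    _, p_real = d(real)
    _, p_fake = d(fake)
    target = mask * p_real + (1 - mask) * p_fake
    return F.mse_loss(p_mix, target.detach())
```

The per-pixel map gives the generator localized feedback everywhere in the
image, while the consistency term ties the decision map on mixed images to the
corresponding mix of the individual maps, pushing the discriminator toward
semantic and structural cues rather than mixing artifacts.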