
    Dual Pyramid Generative Adversarial Networks for Semantic Image Synthesis

    The goal of semantic image synthesis is to generate photo-realistic images from semantic label maps. It is highly relevant for tasks like content generation and image editing. Current state-of-the-art approaches, however, still struggle to generate realistic objects in images at various scales. In particular, small objects tend to fade away and large objects are often generated as collages of patches. To address this issue, we propose a Dual Pyramid Generative Adversarial Network (DP-GAN) that learns the conditioning of spatially-adaptive normalization blocks at all scales jointly, so that scale information is used bi-directionally and supervision is unified across scales. Our qualitative and quantitative results show that the proposed approach generates images in which small and large objects look more realistic than in images generated by state-of-the-art methods. Comment: BMVC202
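The spatially-adaptive normalization the abstract refers to can be sketched as a single block: features are instance-normalized, then modulated per pixel by scale/shift maps predicted from the semantic label map. This is a minimal, illustrative numpy sketch of one such block (the toy 1x1-conv weights `w_gamma`/`w_beta` and all shapes are assumptions; DP-GAN's joint conditioning across all scales is not shown):

```python
import numpy as np

def spatially_adaptive_norm(x, label_map, w_gamma, w_beta, eps=1e-5):
    """Minimal sketch of a spatially-adaptive normalization block.
    x: (C, H, W) feature maps; label_map: (K, H, W) one-hot semantics;
    w_gamma, w_beta: (C, K) toy 1x1-conv weights (assumed, illustrative)."""
    # Instance-normalize each feature channel.
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mu) / (sigma + eps)
    # Predict per-pixel modulation from the label map (1x1 conv as einsum).
    gamma = np.einsum('ck,khw->chw', w_gamma, label_map)
    beta = np.einsum('ck,khw->chw', w_beta, label_map)
    return x_norm * (1 + gamma) + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4, 4))                 # toy feature maps
labels = np.zeros((3, 4, 4)); labels[0] = 1.0  # one-hot map, all class 0
out = spatially_adaptive_norm(x, labels,
                              rng.normal(size=(8, 3)),
                              rng.normal(size=(8, 3)))
print(out.shape)  # (8, 4, 4)
```

Because gamma and beta vary per pixel with the label map, the normalization preserves the semantic layout instead of washing it out, which is what makes this block family a natural fit for label-map-conditioned generation.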

    NeuralFloors: conditional street-level scene generation from BEV semantic maps via neural fields

    Semantic Bird's Eye View (BEV) representations are a popular format, as they are easily interpretable and editable. However, synthesising ground-view images from BEVs is a difficult task: the system must learn both the mapping from BEV to Front View (FV) structure and the synthesis of highly photo-realistic imagery, thus considering the geometry and appearance of the scene simultaneously. We therefore present a factorised approach that tackles the problem in two stages: a first stage that learns a BEV-to-FV transformation in the semantic space through a Neural Field, and a second stage that leverages a Latent Diffusion Model (LDM) to synthesise images conditioned on the output of the first stage. Our experiments show that this approach produces RGB images with high perceptual quality that are also well aligned with their corresponding FV ground truth.
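The two-stage factorisation can be shown as an interface sketch: stage 1 handles geometry in semantic space, stage 2 handles appearance. Both functions below are purely illustrative stand-ins (a real system would use a neural field and a latent diffusion model, not these toy mappings):

```python
import numpy as np

def bev_to_fv_semantics(bev_sem):
    """Stage 1 stand-in: map a BEV semantic grid to a front-view semantic
    image. The paper uses a Neural Field; this toy projection (assumed)
    just repeats the nearest BEV row as vertical structure in FV."""
    H, W = bev_sem.shape
    return np.tile(bev_sem[0:1, :], (H, 1))

def ldm_synthesize(fv_sem, rng):
    """Stage 2 stand-in: an image generator conditioned on FV semantics.
    A real system would run an LDM; here we emit a toy RGB image whose
    channels encode the class ids (assumed), plus a little noise."""
    H, W = fv_sem.shape
    img = np.stack([fv_sem == k for k in range(3)], axis=-1).astype(float)
    return img + 0.01 * rng.normal(size=(H, W, 3))

rng = np.random.default_rng(0)
bev = rng.integers(0, 3, size=(16, 16))
fv_sem = bev_to_fv_semantics(bev)    # stage 1: geometry, in semantic space
image = ldm_synthesize(fv_sem, rng)  # stage 2: appearance
print(fv_sem.shape, image.shape)  # (16, 16) (16, 16, 3)
```

The design choice the abstract argues for is visible in the interface: the hard geometric reasoning and the photo-realism problem are solved by separate models that communicate only through an FV semantic map.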

    Improving quality and controllability in GAN-based image synthesis

    The goal of the field of deep learning-based image generation is to synthesize images that are indistinguishable from real ones, and to precisely control the content of these images. Generative adversarial networks (GANs) have been the most popular image synthesis framework in recent years due to their unrivaled image quality. They consist of a generator and discriminator network, where the discriminator is trained to detect synthetic images, while the generator is trained to outsmart the discriminator by synthesizing more realistic images. Much progress has been made in the development of GANs, but there is still a lot of work to be done to further improve the synthesis quality and control. To this end, this work proposes methods to improve the synthesis quality of GANs and increase the control over the image content. First, we propose the idea of segmentation-based adversarial losses to increase the quality of synthetic images. In particular, we redesign the GAN discriminator as a segmentation network that classifies image pixels as real or fake. Further, we propose a regularization made possible by the new discriminator design. The new method improves image quality in unconditional and conditional GANs. Second, we show that segmentation-based adversarial losses are naturally well-suited for semantic image synthesis. Semantic image synthesis is the task of generating images from semantic layouts, which offers precise control over the content. We adapt the approach of a segmentation-based GAN loss to semantic image synthesis and thereby make previously used extra supervision superfluous. In addition, we introduce a noise injection method to increase the synthesis diversity significantly. 
The effects of the proposed techniques are improved image quality, new possibilities for global and local image editing, better modeling of long-tailed data, the ability to generate images from sparsely-annotated label maps, and a substantial increase in the multi-modality of the synthesized images. In doing so, our model is also conceptually simpler and more parameter-efficient than previous models. Third, we show that our improvement in multi-modality in semantic image synthesis opens the door for controlling the image content via the latent space of the GAN generator. Therefore, we are the first to introduce a method for finding interpretable directions in the latent space of semantic image synthesis GANs. Consequently, we enable additional control of the image content via discovered latent controls, next to the semantic layouts. In summary, this work advances the state of the art in image synthesis for several types of GANs, including GANs for semantic image synthesis. We also enable a new form of control over the image content for the latter.
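The core idea of the segmentation-based adversarial loss, as the abstract describes it, is that the discriminator labels every pixel as real or fake rather than emitting one scalar per image. A minimal sketch of such a per-pixel discriminator loss, assuming a simplified two-class variant with binary cross-entropy on per-pixel logit maps (the thesis's full formulation may differ):

```python
import numpy as np

def per_pixel_d_loss(logits_real, logits_fake):
    """Sketch of a segmentation-based discriminator loss: every pixel of a
    real image should be classified 'real', every pixel of a generated
    image 'fake'. logits_*: (H, W) per-pixel logit maps (assumed shapes)."""
    def bce_with_logits(z, target):
        # Numerically stable binary cross-entropy on raw logits.
        return np.mean(np.maximum(z, 0) - z * target
                       + np.log1p(np.exp(-np.abs(z))))
    return bce_with_logits(logits_real, 1.0) + bce_with_logits(logits_fake, 0.0)

rng = np.random.default_rng(0)
loss = per_pixel_d_loss(rng.normal(size=(8, 8)),   # D's map on a real image
                        rng.normal(size=(8, 8)))   # D's map on a fake image
print(loss > 0)  # True
```

Because the loss is spatial, the generator receives a dense, per-region training signal, which is what makes this discriminator design a natural match for semantic image synthesis, where the target layout is itself a per-pixel map.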