
    Dual Pyramid Generative Adversarial Networks for Semantic Image Synthesis

    The goal of semantic image synthesis is to generate photo-realistic images from semantic label maps. It is highly relevant for tasks like content generation and image editing. Current state-of-the-art approaches, however, still struggle to generate realistic objects in images at various scales. In particular, small objects tend to fade away and large objects are often generated as collages of patches. To address this issue, we propose a Dual Pyramid Generative Adversarial Network (DP-GAN) that learns the conditioning of spatially-adaptive normalization blocks at all scales jointly, so that scale information is used bi-directionally and supervision is unified across scales. Our qualitative and quantitative results show that the proposed approach generates images in which small and large objects look more realistic than in images generated by state-of-the-art methods. Comment: BMVC202
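The spatially-adaptive normalization the abstract refers to can be sketched as a single block: features are instance-normalized, then modulated per pixel by scale/shift maps predicted from the semantic label map. This is a minimal, illustrative numpy sketch of one such block (the toy 1x1-conv weights `w_gamma`/`w_beta` and all shapes are assumptions; DP-GAN's joint conditioning across all scales is not shown):

```python
import numpy as np

def spatially_adaptive_norm(x, label_map, w_gamma, w_beta, eps=1e-5):
    """Minimal sketch of a spatially-adaptive normalization block.
    x: (C, H, W) feature maps; label_map: (K, H, W) one-hot semantics;
    w_gamma, w_beta: (C, K) toy 1x1-conv weights (assumed, illustrative)."""
    # Instance-normalize each feature channel.
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mu) / (sigma + eps)
    # Predict per-pixel modulation from the label map (1x1 conv as einsum).
    gamma = np.einsum('ck,khw->chw', w_gamma, label_map)
    beta = np.einsum('ck,khw->chw', w_beta, label_map)
    return x_norm * (1 + gamma) + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4, 4))                 # toy feature maps
labels = np.zeros((3, 4, 4)); labels[0] = 1.0  # one-hot map, all class 0
out = spatially_adaptive_norm(x, labels,
                              rng.normal(size=(8, 3)),
                              rng.normal(size=(8, 3)))
print(out.shape)  # (8, 4, 4)
```

Because gamma and beta vary per pixel with the label map, the normalization preserves the semantic layout instead of washing it out, which is what makes this block family a natural fit for label-map-conditioned generation.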

    NeuralFloors: conditional street-level scene generation from BEV semantic maps via neural fields

    Semantic Bird's Eye View (BEV) representations are a popular format, as they are easily interpretable and editable. However, synthesising ground-view images from BEVs is a difficult task: the system must learn both the mapping from BEV to Front View (FV) structure and the synthesis of highly photo-realistic imagery, thus considering the geometry and appearance of the scene simultaneously. We therefore present a factorised approach that tackles the problem in two stages: a first stage that learns a BEV-to-FV transformation in the semantic space through a Neural Field, and a second stage that leverages a Latent Diffusion Model (LDM) to synthesise images conditioned on the output of the first stage. Our experiments show that this approach produces RGB images with high perceptual quality that are also well aligned with their corresponding FV ground truth.
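The two-stage factorisation can be shown as an interface sketch: stage 1 handles geometry in semantic space, stage 2 handles appearance. Both functions below are purely illustrative stand-ins (a real system would use a neural field and a latent diffusion model, not these toy mappings):

```python
import numpy as np

def bev_to_fv_semantics(bev_sem):
    """Stage 1 stand-in: map a BEV semantic grid to a front-view semantic
    image. The paper uses a Neural Field; this toy projection (assumed)
    just repeats the nearest BEV row as vertical structure in FV."""
    H, W = bev_sem.shape
    return np.tile(bev_sem[0:1, :], (H, 1))

def ldm_synthesize(fv_sem, rng):
    """Stage 2 stand-in: an image generator conditioned on FV semantics.
    A real system would run an LDM; here we emit a toy RGB image whose
    channels encode the class ids (assumed), plus a little noise."""
    H, W = fv_sem.shape
    img = np.stack([fv_sem == k for k in range(3)], axis=-1).astype(float)
    return img + 0.01 * rng.normal(size=(H, W, 3))

rng = np.random.default_rng(0)
bev = rng.integers(0, 3, size=(16, 16))
fv_sem = bev_to_fv_semantics(bev)    # stage 1: geometry, in semantic space
image = ldm_synthesize(fv_sem, rng)  # stage 2: appearance
print(fv_sem.shape, image.shape)  # (16, 16) (16, 16, 3)
```

The design choice the abstract argues for is visible in the interface: the hard geometric reasoning and the photo-realism problem are solved by separate models that communicate only through an FV semantic map.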

    Improving quality and controllability in GAN-based image synthesis

    The goal of the field of deep learning-based image generation is to synthesize images that are indistinguishable from real ones, and to precisely control the content of these images. Generative adversarial networks (GANs) have been the most popular image synthesis framework in recent years due to their unrivaled image quality. They consist of a generator and discriminator network, where the discriminator is trained to detect synthetic images, while the generator is trained to outsmart the discriminator by synthesizing more realistic images. Much progress has been made in the development of GANs, but there is still a lot of work to be done to further improve the synthesis quality and control. To this end, this work proposes methods to improve the synthesis quality of GANs and increase the control over the image content. First, we propose the idea of segmentation-based adversarial losses to increase the quality of synthetic images. In particular, we redesign the GAN discriminator as a segmentation network that classifies image pixels as real or fake. Further, we propose a regularization made possible by the new discriminator design. The new method improves image quality in unconditional and conditional GANs. Second, we show that segmentation-based adversarial losses are naturally well-suited for semantic image synthesis. Semantic image synthesis is the task of generating images from semantic layouts, which offers precise control over the content. We adapt the approach of a segmentation-based GAN loss to semantic image synthesis and thereby make previously used extra supervision superfluous. In addition, we introduce a noise injection method to increase the synthesis diversity significantly. 
The effects of the proposed techniques are improved image quality, new possibilities for global and local image editing, better modeling of long-tailed data, the ability to generate images from sparsely-annotated label maps, and a substantial increase in the multi-modality of the synthesized images. In doing so, our model is also conceptually simpler and more parameter-efficient than previous models. Third, we show that our improvement in multi-modality in semantic image synthesis opens the door for controlling the image content via the latent space of the GAN generator. Therefore, we are the first to introduce a method for finding interpretable directions in the latent space of semantic image synthesis GANs. Consequently, we enable additional control of the image content via discovered latent controls, next to the semantic layouts. In summary, this work advances the state of the art in image synthesis for several types of GANs, including GANs for semantic image synthesis. We also enable a new form of control over the image content for the latter.
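The core idea of the segmentation-based adversarial loss, as the abstract describes it, is that the discriminator labels every pixel as real or fake rather than emitting one scalar per image. A minimal sketch of such a per-pixel discriminator loss, assuming a simplified two-class variant with binary cross-entropy on per-pixel logit maps (the thesis's full formulation may differ):

```python
import numpy as np

def per_pixel_d_loss(logits_real, logits_fake):
    """Sketch of a segmentation-based discriminator loss: every pixel of a
    real image should be classified 'real', every pixel of a generated
    image 'fake'. logits_*: (H, W) per-pixel logit maps (assumed shapes)."""
    def bce_with_logits(z, target):
        # Numerically stable binary cross-entropy on raw logits.
        return np.mean(np.maximum(z, 0) - z * target
                       + np.log1p(np.exp(-np.abs(z))))
    return bce_with_logits(logits_real, 1.0) + bce_with_logits(logits_fake, 0.0)

rng = np.random.default_rng(0)
loss = per_pixel_d_loss(rng.normal(size=(8, 8)),   # D's map on a real image
                        rng.normal(size=(8, 8)))   # D's map on a fake image
print(loss > 0)  # True
```

Because the loss is spatial, the generator receives a dense, per-region training signal, which is what makes this discriminator design a natural match for semantic image synthesis, where the target layout is itself a per-pixel map.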