1,010 research outputs found
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
We present a new method for synthesizing high-resolution photo-realistic
images from semantic label maps using conditional generative adversarial
networks (conditional GANs). Conditional GANs have enabled a variety of
applications, but the results are often limited to low-resolution and still far
from realistic. In this work, we generate 2048x1024 visually appealing results
with a novel adversarial loss, as well as new multi-scale generator and
discriminator architectures. Furthermore, we extend our framework to
interactive visual manipulation with two additional features. First, we
incorporate object instance segmentation information, which enables object
manipulations such as removing/adding objects and changing the object category.
Second, we propose a method to generate diverse results given the same input,
allowing users to edit the object appearance interactively. Human opinion
studies demonstrate that our method significantly outperforms existing methods,
advancing both the quality and the resolution of deep image synthesis and
editing.Comment: v2: CVPR camera ready, adding more results for edge-to-photo example
Manipulating Attributes of Natural Scenes via Hallucination
In this study, we explore building a two-stage framework for enabling users
to directly manipulate high-level attributes of a natural scene. The key to our
approach is a deep generative network which can hallucinate images of a scene
as if they were taken at a different season (e.g. during winter), weather
condition (e.g. in a cloudy day) or time of the day (e.g. at sunset). Once the
scene is hallucinated with the given attributes, the corresponding look is then
transferred to the input image while preserving the semantic details intact,
giving a photo-realistic manipulation result. As the proposed framework
hallucinates what the scene will look like, it does not require any reference
style image as commonly utilized in most of the appearance or style transfer
approaches. Moreover, it allows to simultaneously manipulate a given scene
according to a diverse set of transient attributes within a single model,
eliminating the need of training multiple networks per each translation task.
Our comprehensive set of qualitative and quantitative results demonstrate the
effectiveness of our approach against the competing methods.Comment: Accepted for publication in ACM Transactions on Graphic
SG-VAE: Scene Grammar Variational Autoencoder to Generate New Indoor Scenes
Deep generative models have been used in recent years to learn coherent latent representations in order to synthesize high-quality images. In this work, we propose a neural network to learn a generative model for sampling consistent indoor scene layouts. Our method learns the co-occurrences, and appearance parameters such as shape and pose, for different objects categories through a grammar-based auto-encoder, resulting in a compact and accurate representation for scene layouts. In contrast to existing grammar-based methods with a user-specified grammar, we construct the grammar automatically by extracting a set of production rules on reasoning about object co-occurrences in training data. The extracted grammar is able to represent a scene by an augmented parse tree. The proposed auto-encoder encodes these parse trees to a latent code, and decodes the latent code to a parse tree, thereby ensuring the generated scene is always valid. We experimentally demonstrate that the proposed auto-encoder learns not only to generate valid scenes (i.e. the arrangements and appearances of objects), but it also learns coherent latent representations where nearby latent samples decode to similar scene outputs. The obtained generative model is applicable to several computer vision tasks such as 3D pose and layout estimation from RGB-D data
- …