Semantic Image Synthesis via Adversarial Learning
In this paper, we propose a way of synthesizing realistic images directly
from a natural language description, which has many useful applications, e.g.
intelligent image manipulation. We attempt to accomplish such synthesis: given
a source image and a target text description, our model synthesizes images to
meet two requirements: 1) being realistic while matching the target text
description; 2) maintaining other image features that are irrelevant to the
text description. The model should be able to disentangle the semantic
information from the two modalities (image and text), and generate new images
from the combined semantics. To achieve this, we proposed an end-to-end neural
architecture that leverages adversarial learning to automatically learn
implicit loss functions, which are optimized to fulfill the aforementioned two
requirements. We have evaluated our model by conducting experiments on
Caltech-200 bird dataset and Oxford-102 flower dataset, and have demonstrated
that our model is capable of synthesizing realistic images that match the given
descriptions, while still maintain other features of original images.Comment: Accepted to ICCV 201
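The core mechanism described here, a generator conditioned jointly on an image and a text embedding, can be illustrated with a short sketch. The PyTorch code below is a minimal stand-in, not the authors' architecture: the layer sizes, the `TextConditionedGenerator` name, and the broadcast-and-concatenate fusion are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class TextConditionedGenerator(nn.Module):
    """Toy generator fusing an image encoding with a text embedding."""
    def __init__(self, txt_dim=128, img_channels=3):
        super().__init__()
        # Encode the source image down to a spatial feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(img_channels, 64, 4, stride=2, padding=1),  # 64x64 -> 32x32
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),           # 32x32 -> 16x16
            nn.ReLU(inplace=True),
        )
        # Decode the image features fused with the broadcast text embedding.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128 + txt_dim, 64, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, img_channels, 4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, image, txt_emb):
        feat = self.encoder(image)                        # (B, 128, 16, 16)
        b, _, h, w = feat.shape
        # Tile the text embedding over the spatial grid and concatenate.
        txt = txt_emb.view(b, -1, 1, 1).expand(-1, -1, h, w)
        return self.decoder(torch.cat([feat, txt], dim=1))

g = TextConditionedGenerator()
fake = g(torch.randn(2, 3, 64, 64), torch.randn(2, 128))
print(fake.shape)  # torch.Size([2, 3, 64, 64])
```

In an adversarial setup, a discriminator scoring (image, text) pairs would supply the implicit loss that drives both realism and text matching.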
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
We present a new method for synthesizing high-resolution photo-realistic
images from semantic label maps using conditional generative adversarial
networks (conditional GANs). Conditional GANs have enabled a variety of
applications, but the results are often limited to low-resolution and still far
from realistic. In this work, we generate 2048x1024 visually appealing results
with a novel adversarial loss, as well as new multi-scale generator and
discriminator architectures. Furthermore, we extend our framework to
interactive visual manipulation with two additional features. First, we
incorporate object instance segmentation information, which enables object
manipulations such as removing/adding objects and changing the object category.
Second, we propose a method to generate diverse results given the same input,
allowing users to edit the object appearance interactively. Human opinion
studies demonstrate that our method significantly outperforms existing methods,
advancing both the quality and the resolution of deep image synthesis and
editing.
Comment: v2: CVPR camera ready, adding more results for edge-to-photo examples
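One ingredient named here, multi-scale discriminator architectures, is easy to sketch: the same patch-based discriminator is applied to the image at several resolutions, so coarse scales judge global structure while fine scales judge texture. The PyTorch sketch below is a simplified illustration; the layer counts, the `patch_discriminator` helper, and the pooling choices are assumptions, not the authors' published architecture.

```python
import torch
import torch.nn as nn

def patch_discriminator(in_channels=3):
    # A small PatchGAN-style discriminator that outputs a grid of
    # real/fake scores rather than a single scalar.
    return nn.Sequential(
        nn.Conv2d(in_channels, 64, 4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(64, 128, 4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(128, 1, 4, stride=1, padding=1),
    )

class MultiScaleDiscriminator(nn.Module):
    """Run identical discriminators on progressively downsampled inputs."""
    def __init__(self, num_scales=3):
        super().__init__()
        self.discriminators = nn.ModuleList(
            patch_discriminator() for _ in range(num_scales)
        )
        self.downsample = nn.AvgPool2d(3, stride=2, padding=1)

    def forward(self, x):
        scores = []
        for d in self.discriminators:
            scores.append(d(x))     # patch scores at the current scale
            x = self.downsample(x)  # halve resolution for the next one
        return scores

d = MultiScaleDiscriminator()
outs = d(torch.randn(1, 3, 256, 256))
print([o.shape for o in outs])
```

The adversarial loss would then sum the discriminator losses over all scales, which is one way such a setup pushes both resolution and realism.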
Discriminative Region Proposal Adversarial Networks for High-Quality Image-to-Image Translation
Image-to-image translation has made much progress with the adoption of
Generative Adversarial Networks (GANs). However, translation tasks that demand
high quality, especially high resolution and photorealism, remain very
challenging. In this paper, we present Discriminative Region Proposal
Adversarial Networks (DRPAN) for high-quality image-to-image translation. We
decompose the image-to-image translation procedure into three iterated steps:
first, generate an image with correct global structure but some local
artifacts (via GAN); second, use our DRPnet to propose the most fake region in
the generated image; and third, perform "image inpainting" on that region
through a reviser to obtain a more realistic result. The system (DRPAN) is
thereby gradually optimized to synthesize images with more attention paid to
the most artifact-prone local parts. Experiments on a variety of
image-to-image translation tasks and datasets validate that our method
outperforms the state of the art in producing high-quality translation
results, in terms of both human perceptual studies and automatic quantitative
measures.
Comment: ECCV 2018
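The "propose the most fake region" step of the three-step loop can be made concrete with a small sketch: given a patch score map from a discriminator, pick the lowest-scoring window and build a mask for the reviser. The argmin-based proposal, the `most_fake_mask` helper, and the fixed window size are illustrative assumptions, not the paper's DRPnet.

```python
import torch

def most_fake_mask(score_map, image_size, window=64):
    """Mask the lowest-scoring (most fake-looking) window of a patch score map."""
    b, _, h, w = score_map.shape
    flat_idx = score_map.view(b, -1).argmin(dim=1)
    ys, xs = flat_idx // w, flat_idx % w
    scale = image_size // h  # patch-grid cell size in image pixels
    mask = torch.zeros(b, 1, image_size, image_size)
    for i in range(b):
        y0 = min(int(ys[i]) * scale, image_size - window)
        x0 = min(int(xs[i]) * scale, image_size - window)
        mask[i, :, y0:y0 + window, x0:x0 + window] = 1.0
    return mask

# Toy usage with random tensors standing in for the networks' outputs;
# a real system would iterate generate -> propose -> revise.
fake_image = torch.rand(2, 3, 256, 256)      # generator output
patch_scores = torch.rand(2, 1, 16, 16)      # patch-based critic scores
reviser_output = torch.rand(2, 3, 256, 256)  # stand-in for the reviser net
mask = most_fake_mask(patch_scores, 256)
revised = fake_image * (1 - mask) + reviser_output * mask
print(mask.sum(dim=(1, 2, 3)))  # one 64x64 window per sample
```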
Manipulating Attributes of Natural Scenes via Hallucination
In this study, we explore building a two-stage framework for enabling users
to directly manipulate high-level attributes of a natural scene. The key to our
approach is a deep generative network which can hallucinate images of a scene
as if they were taken in a different season (e.g. during winter), weather
condition (e.g. on a cloudy day), or time of day (e.g. at sunset). Once the
scene is hallucinated with the given attributes, the corresponding look is then
transferred to the input image while keeping the semantic details intact,
giving a photo-realistic manipulation result. As the proposed framework
hallucinates what the scene will look like, it does not require any reference
style image, as is commonly required by appearance and style transfer
approaches. Moreover, it can simultaneously manipulate a given scene according
to a diverse set of transient attributes within a single model, eliminating
the need to train a separate network for each translation task. Our
comprehensive set of qualitative and quantitative results demonstrates the
effectiveness of our approach against competing methods.
Comment: Accepted for publication in ACM Transactions on Graphics
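The second stage, transferring the hallucinated look onto the input image, can be approximated in a few lines by matching per-channel statistics (an AdaIN-style operation). This is a deliberately simplified stand-in for illustration only; the paper's actual transfer preserves semantic details far more carefully than raw statistics matching does.

```python
import torch

def match_channel_stats(content, style, eps=1e-5):
    """Re-normalize content to the per-channel mean/std of style (AdaIN-style)."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return (content - c_mean) / c_std * s_std + s_mean

input_image = torch.rand(1, 3, 256, 256)   # original scene
hallucinated = torch.rand(1, 3, 256, 256)  # e.g. the same scene "at sunset"
stylized = match_channel_stats(input_image, hallucinated)
print(stylized.shape)  # torch.Size([1, 3, 256, 256])
```

Because the hallucinated image is produced by the model itself, the user never has to supply a reference style image, which is the framework's main departure from standard style transfer.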