Image-to-Image Translation with Conditional Adversarial Networks
We investigate conditional adversarial networks as a general-purpose solution
to image-to-image translation problems. These networks not only learn the
mapping from input image to output image, but also learn a loss function to
train this mapping. This makes it possible to apply the same generic approach
to problems that traditionally would require very different loss formulations.
We demonstrate that this approach is effective at synthesizing photos from
label maps, reconstructing objects from edge maps, and colorizing images, among
other tasks. Indeed, since the release of the pix2pix software associated with
this paper, a large number of internet users (many of them artists) have posted
their own experiments with our system, further demonstrating its wide
applicability and ease of adoption without the need for parameter tweaking. As
a community, we no longer hand-engineer our mapping functions, and this work
suggests we can achieve reasonable results without hand-engineering our loss
functions either.
Comment: Website: https://phillipi.github.io/pix2pix/, CVPR 2017
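The combined objective described above can be sketched numerically. This is a minimal pure-Python illustration, not the paper's implementation: the networks are replaced by scalar stand-ins, and the L1 weight (here `LAMBDA = 100.0`) is an assumed common setting rather than a value taken from this abstract.

```python
# Hedged sketch of a pix2pix-style objective: adversarial loss plus an
# L1 reconstruction term. Real training uses image tensors and networks;
# here fakes/reals are flat lists and D's output is a single probability.
import math

LAMBDA = 100.0  # assumed L1 weight; illustrative, not prescribed by the abstract

def bce(p, target):
    """Binary cross-entropy of one probability p against a {0,1} target."""
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def l1(fake, real):
    """Mean absolute error between flattened images (lists of pixel values)."""
    return sum(abs(f - r) for f, r in zip(fake, real)) / len(fake)

def generator_loss(d_score_on_fake, fake_img, real_img):
    # The generator wants D to say "real" (1.0) on its fakes, while also
    # staying close to the ground-truth image via the L1 term.
    return bce(d_score_on_fake, 1.0) + LAMBDA * l1(fake_img, real_img)

def discriminator_loss(d_score_on_real, d_score_on_fake):
    # The discriminator wants 1 on real pairs and 0 on generated pairs.
    return bce(d_score_on_real, 1.0) + bce(d_score_on_fake, 0.0)

fake = [0.2, 0.5, 0.9]
real = [0.25, 0.5, 0.8]
print(generator_loss(0.4, fake, real))
print(discriminator_loss(0.9, 0.4))
```

The point of the learned adversarial term is exactly what the abstract claims: the same generic objective adapts across tasks, with only the L1 term hand-specified.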
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Referring expressions are natural language constructions used to identify
particular objects within a scene. In this paper, we propose a unified
framework for the tasks of referring expression comprehension and generation.
Our model is composed of three modules: speaker, listener, and reinforcer. The
speaker generates referring expressions, the listener comprehends referring
expressions, and the reinforcer introduces a reward function to guide sampling
of more discriminative expressions. The listener-speaker modules are trained
jointly in an end-to-end learning framework, allowing the modules to be aware
of one another during learning while also benefiting from the discriminative
reinforcer's feedback. We demonstrate that this unified framework and training
achieves state-of-the-art results for both comprehension and generation on
three referring expression datasets. Project and demo page:
https://vision.cs.unc.edu/refer
Comment: Some typos fixed; comprehension results on RefCOCOg updated; more human evaluation results added
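The joint training described above can be sketched as a single combined loss. This is a hedged, illustrative stand-in: the per-module losses, the weights, and the REINFORCE-style reward term below are assumptions for the sketch, not the paper's exact formulation.

```python
# Toy sketch of jointly training speaker, listener, and reinforcer.
# The speaker contributes a generation loss (negative log-likelihood),
# the listener a comprehension/ranking loss, and the reinforcer a
# policy-gradient-style term that rewards discriminative expressions.
def joint_loss(speaker_nll, listener_rank_loss, log_prob_sampled, reward,
               w_listener=1.0, w_reinforce=1.0):
    # REINFORCE-style term: increase the log-probability of sampled
    # expressions in proportion to the reward they received.
    reinforce_term = -reward * log_prob_sampled
    return (speaker_nll
            + w_listener * listener_rank_loss
            + w_reinforce * reinforce_term)

# Example: a sampled expression with log-prob -2.0 earned reward 0.8.
loss = joint_loss(speaker_nll=1.5, listener_rank_loss=0.3,
                  log_prob_sampled=-2.0, reward=0.8)
print(loss)
```

Because all three terms share the speaker's parameters, minimizing this sum is what lets the modules "be aware of one another" during learning, as the abstract puts it.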
Learning Diverse Image Colorization
Colorization is an ambiguous problem, with multiple viable colorizations for
a single grey-level image. However, previous methods only produce the single
most probable colorization. Our goal is to model the diversity intrinsic to the
problem of colorization and produce multiple colorizations that display
long-scale spatial coordination. We learn a low-dimensional embedding of color
fields using a variational autoencoder (VAE). We construct loss terms for the
VAE decoder that avoid blurry outputs and take into account the uneven
distribution of pixel colors. Finally, we build a conditional model for the
multi-modal distribution between the grey-level image and the color field
embeddings. Samples from this conditional model yield diverse colorizations.
We demonstrate that our method obtains better diverse colorizations than a
standard conditional variational autoencoder (CVAE) model, as well as a
recently proposed conditional generative adversarial network (cGAN).
Comment: This revision to appear in CVPR 2017
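The sampling procedure implied by the abstract, drawing several embeddings from the grey-conditioned model and decoding each, can be sketched with tiny stand-ins. Both `sample_embedding` and `decode` below are hypothetical toys standing in for the conditional model and the VAE decoder; no real network is involved.

```python
# Hedged sketch of diverse colorization: sample multiple low-dimensional
# embeddings z from a conditional distribution, decode each into a color
# field. Different z draws give different colorizations of one grey image.
import random

def sample_embedding(rng, mean, std, dim=4):
    # Stand-in for sampling z from the grey-conditioned model p(z | grey).
    return [rng.gauss(mean, std) for _ in range(dim)]

def decode(z, grey):
    # Stand-in for the VAE decoder: maps an embedding plus the grey level
    # to a "color field" (here, one chroma value per pixel).
    return [g + 0.1 * sum(z) for g in grey]

rng = random.Random(0)  # seeded for reproducibility
grey = [0.2, 0.5, 0.7]
# Multiple draws of z yield multiple distinct colorizations of one image.
colorizations = [decode(sample_embedding(rng, 0.0, 1.0), grey) for _ in range(3)]
for c in colorizations:
    print([round(v, 3) for v in c])
```

This is the structural idea behind the diversity claim: the one-to-many ambiguity lives in the embedding distribution, so sampling it, rather than taking a single mode, produces the multiple viable colorizations.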