Fast Face-swap Using Convolutional Neural Networks
We consider the problem of face swapping in images, where an input identity
is transformed into a target identity while preserving pose, facial expression,
and lighting. To perform this mapping, we use convolutional neural networks
trained to capture the appearance of the target identity from an unstructured
collection of his/her photographs. This approach is enabled by framing the face
swapping problem in terms of style transfer, where the goal is to render an
image in the style of another one. Building on recent advances in this area, we
devise a new loss function that enables the network to produce highly
photorealistic results. By combining neural networks with simple pre- and
post-processing steps, we aim to make face swapping work in real time with no
input from the user.
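The style-transfer framing the abstract mentions can be illustrated with a minimal content-plus-style objective in the spirit of classic neural style transfer (Gram-matrix statistics). This is a hedged sketch with placeholder feature maps, not the authors' actual loss function:

```python
import numpy as np

def gram_matrix(feats):
    """Channel-wise Gram matrix of a (C, H*W) feature map,
    normalized by the number of spatial positions."""
    c, n = feats.shape
    return feats @ feats.T / n

def style_content_loss(gen, content_ref, style_ref, alpha=1.0, beta=1e3):
    """Weighted sum of a content term (feature MSE) and a style term
    (Gram-matrix MSE), as in classic neural style transfer. The paper
    builds a more specialized loss on top of this general idea."""
    content = np.mean((gen - content_ref) ** 2)
    style = np.mean((gram_matrix(gen) - gram_matrix(style_ref)) ** 2)
    return alpha * content + beta * style

rng = np.random.default_rng(0)
f = rng.standard_normal((8, 64))  # fake 8-channel feature map
print(style_content_loss(f, f, f))  # identical features -> 0.0
```

When the generated features match both references exactly, both terms vanish, which is the sanity check the print statement exercises.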
PixColor: Pixel Recursive Colorization
We propose a novel approach to automatically produce multiple colorized
versions of a grayscale image. Our method results from the observation that the
task of automated colorization is relatively easy given a low-resolution
version of the color image. We first train a conditional PixelCNN to generate a
low-resolution color image for a given grayscale image. Then, given the generated
low-resolution color image and the original grayscale image as inputs, we train
a second CNN to generate a high-resolution colorization of an image. We
demonstrate that our approach produces more diverse and plausible colorizations
than existing methods, as judged by human raters in a "Visual Turing Test".
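The two-stage structure described above can be sketched with stand-in functions: stage 1 produces a coarse chrominance map, stage 2 upsamples it and attaches the original luminance. Both stages here are placeholders for the paper's conditional PixelCNN and refinement CNN, and the color model is only Lab-like by analogy:

```python
import numpy as np

def stage1_low_res_color(gray, factor=8):
    """Placeholder for the conditional PixelCNN: returns a coarse
    (H/factor, W/factor, 2) chrominance map derived from the input."""
    lo = gray[::factor, ::factor]
    return np.stack([lo * 0.5, lo * 0.25], axis=-1)  # fake a/b channels

def stage2_refine(gray, lo_color):
    """Placeholder for the refinement CNN: upsample coarse color with
    nearest-neighbour repetition and attach the original luminance."""
    h, w = gray.shape
    fh = h // lo_color.shape[0]
    fw = w // lo_color.shape[1]
    up = lo_color.repeat(fh, axis=0).repeat(fw, axis=1)
    return np.concatenate([gray[..., None], up], axis=-1)  # (H, W, 3)

gray = np.linspace(0.0, 1.0, 64 * 64).reshape(64, 64)
out = stage2_refine(gray, stage1_low_res_color(gray))
print(out.shape)  # (64, 64, 3)
```

The point of the decomposition is that the hard, multimodal decision (which colors to use) is made cheaply at low resolution, while the second stage only has to solve an easier conditional upsampling problem.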
End-to-end Projector Photometric Compensation
Projector photometric compensation aims to modify a projector input image
such that it can compensate for the disturbance caused by the appearance of the
projection surface. In this paper, for the first time, we formulate the compensation
problem as an end-to-end learning problem and propose a convolutional neural
network, named CompenNet, to implicitly learn the complex compensation
function. CompenNet consists of a UNet-like backbone network and an autoencoder
subnet. Such architecture encourages rich multi-level interactions between the
camera-captured projection surface image and the input image, and thus captures
both photometric and environment information of the projection surface. In
addition, the visual details and interaction information are carried to deeper
layers along the multi-level skip convolution layers. The architecture is of
particular importance for the projector compensation task, for which only a
small training dataset is allowed in practice. Another contribution we make is
a novel evaluation benchmark, which is independent of the system setup and thus
quantitatively verifiable. To the best of our knowledge, such a benchmark was
not previously available, because conventional evaluation requires the hardware
system to actually project the final results. Our key idea, motivated by our
end-to-end problem formulation, is to use a reasonable surrogate that avoids
the projection process and is therefore setup-independent. Our method is
evaluated carefully on the benchmark, and the results show that our end-to-end
learning solution outperforms state-of-the-art methods both qualitatively and
quantitatively by a significant margin.
Comment: To appear in the 2019 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR). Source code and dataset are available at
https://github.com/BingyaoHuang/compenne
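The compensation problem can be made concrete with a toy model that is much simpler than CompenNet: assume the camera-observed image is the projected image modulated per-pixel by the surface reflectance, so compensation amounts to pre-dividing the target by that reflectance. The multiplicative model and both functions below are illustrative assumptions, not the paper's learned formulation:

```python
import numpy as np

def compensate(target, surface, eps=1e-6):
    """Invert a purely multiplicative surface disturbance, clipping to
    the displayable range [0, 1]. CompenNet instead learns this mapping
    end-to-end, which also captures non-multiplicative effects."""
    return np.clip(target / (surface + eps), 0.0, 1.0)

def simulate_projection(inp, surface):
    """Simplified camera model: per-pixel modulation by the surface."""
    return inp * surface

surface = np.full((4, 4), 0.8)   # uniform gray projection surface
target = np.full((4, 4), 0.4)    # image we want the camera to observe
captured = simulate_projection(compensate(target, surface), surface)
print(np.allclose(captured, target, atol=1e-4))  # True
```

The clipping step also hints at why the real problem is hard: wherever the surface is too dark or saturated, no admissible input can fully compensate, so a learned model must trade off errors perceptually.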
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
We present a new method for synthesizing high-resolution photo-realistic
images from semantic label maps using conditional generative adversarial
networks (conditional GANs). Conditional GANs have enabled a variety of
applications, but the results are often limited to low resolution and still far
from realistic. In this work, we generate 2048x1024 visually appealing results
with a novel adversarial loss, as well as new multi-scale generator and
discriminator architectures. Furthermore, we extend our framework to
interactive visual manipulation with two additional features. First, we
incorporate object instance segmentation information, which enables object
manipulations such as removing/adding objects and changing the object category.
Second, we propose a method to generate diverse results given the same input,
allowing users to edit the object appearance interactively. Human opinion
studies demonstrate that our method significantly outperforms existing methods,
advancing both the quality and the resolution of deep image synthesis and
editing.
Comment: v2: CVPR camera ready, adding more results for edge-to-photo example
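The multi-scale discriminator idea mentioned above can be sketched by applying the same (placeholder) discriminator to an image pyramid, so coarse scales judge global structure while fine scales judge texture. The stub discriminator and scale count are illustrative, not the paper's architecture:

```python
import numpy as np

def downsample(img):
    """2x average-pool downsampling of a square 2D image."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def disc_stub(img):
    """Stand-in discriminator: returns a single 'realness' score.
    A real implementation would be a convolutional network."""
    return float(img.mean())

def multiscale_scores(img, num_scales=3):
    """Score the image at several resolutions; an adversarial loss
    would aggregate all of these, pressuring the generator to be
    consistent at every scale."""
    scores = []
    for _ in range(num_scales):
        scores.append(disc_stub(img))
        img = downsample(img)
    return scores

img = np.ones((64, 64))
print(multiscale_scores(img))  # [1.0, 1.0, 1.0]
```

Operating the discriminators on a pyramid keeps each one's receptive field manageable even at 2048x1024 output resolution, which is the design motivation behind the multi-scale arrangement.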
WESPE: Weakly Supervised Photo Enhancer for Digital Cameras
Low-end and compact mobile cameras demonstrate limited photo quality mainly
due to space, hardware and budget constraints. In this work, we propose a deep
learning solution that translates photos taken by cameras with limited
capabilities into DSLR-quality photos automatically. We tackle this problem by
introducing a weakly supervised photo enhancer (WESPE) - a novel image-to-image
Generative Adversarial Network-based architecture. The proposed model is
trained under weak supervision: unlike previous works, there is no need for
strong supervision in the form of a large annotated dataset of aligned
original/enhanced photo pairs. The sole requirement is two distinct datasets:
one from the source camera, and one composed of arbitrary high-quality images
that can be generally crawled from the Internet - the visual content they
exhibit may be unrelated. Hence, our solution is repeatable for any camera:
collecting the data and training can be achieved in a couple of hours. In this
work, we emphasize extensive evaluation of the obtained results. Besides
standard objective metrics and subjective user study, we train a virtual rater
in the form of a separate CNN that mimics human raters on Flickr data and use
this network to get reference scores for both original and enhanced photos. Our
experiments on the DPED, KITTI and Cityscapes datasets as well as pictures from
several generations of smartphones demonstrate that WESPE produces qualitative
results comparable to or better than those of state-of-the-art strongly
supervised methods.
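The weak-supervision idea can be sketched with a cycle-style content term: with unpaired data there is no aligned target, so a learned inverse mapping back to the source domain supplies the reconstruction signal, while adversarial terms (omitted here) push outputs toward the high-quality image set. Both mappings below are placeholder linear operations, not WESPE's CNN generators:

```python
import numpy as np

def enhance(x):
    """Forward generator stub (source-camera photo -> enhanced)."""
    return x * 1.2

def degrade(y):
    """Inverse generator stub (enhanced -> source domain)."""
    return y / 1.2

def content_loss(x):
    """Cycle reconstruction error ||G_inv(G(x)) - x||^2, which
    constrains the enhancer to preserve image content even though no
    aligned ground-truth enhanced photo exists."""
    return float(np.mean((degrade(enhance(x)) - x) ** 2))

x = np.random.default_rng(1).random((8, 8))
print(content_loss(x) < 1e-12)  # True: the stubs invert each other
```

In the full system this content term alone would be trivially minimized by an identity mapping; it is the adversarial color and texture losses, trained against the crawled high-quality dataset, that force actual enhancement.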