CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training
We present variational generative adversarial networks, a general learning
framework that combines a variational auto-encoder with a generative
adversarial network, for synthesizing images in fine-grained categories, such
as faces of a specific person or objects in a category. Our approach models an
image as a composition of label and latent attributes in a probabilistic model.
By varying the fine-grained category label fed into the resulting generative
model, we can generate images in a specific category with randomly drawn values
on a latent attribute vector. Our approach has two novel aspects. First, we
adopt a cross-entropy loss for the discriminator and classifier networks, but a
mean discrepancy objective for the generative network. This kind of asymmetric
loss function makes the GAN training more stable. Second, we adopt an encoder
network to learn the relationship between the latent space and the real image
space, and use pairwise feature matching to keep the structure of generated
images. We experiment with natural images of faces, flowers, and birds, and
demonstrate that the proposed models are capable of generating realistic and
diverse samples with fine-grained category labels. We further show that our
models can be applied to other tasks, such as image inpainting,
super-resolution, and data augmentation for training better face recognition
models.
Comment: to appear in ICCV 2017
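The asymmetric objective described above can be illustrated with a minimal PyTorch-style sketch: the discriminator/classifier side trains with cross-entropy, while the generator minimizes a mean feature-matching discrepancy instead of a fooling loss. All function and tensor names here are hypothetical, and the binary case stands in for the paper's multi-class classifier.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real_logits, d_fake_logits):
    # Cross-entropy objective for the discriminator: label real samples 1,
    # generated samples 0.
    real = F.binary_cross_entropy_with_logits(
        d_real_logits, torch.ones_like(d_real_logits))
    fake = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.zeros_like(d_fake_logits))
    return real + fake

def generator_loss(feat_real, feat_fake):
    # Mean feature-matching discrepancy for the generator: match the
    # batch-mean of an intermediate discriminator/classifier feature map
    # rather than directly maximizing the discriminator's error -- the
    # asymmetry that the abstract credits with stabilizing training.
    return F.mse_loss(feat_fake.mean(dim=0), feat_real.mean(dim=0).detach())
```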
Unpaired Image Super-Resolution using Pseudo-Supervision
In most studies on learning-based image super-resolution (SR), the paired
training dataset is created by downscaling high-resolution (HR) images with a
predetermined operation (e.g., bicubic). However, these methods fail to
super-resolve real-world low-resolution (LR) images, for which the degradation
process is much more complicated and unknown. In this paper, we propose an
unpaired SR method using a generative adversarial network that does not require
a paired/aligned training dataset. Our network consists of an unpaired
kernel/noise correction network and a pseudo-paired SR network. The correction
network removes noise and adjusts the kernel of the input LR image; then,
the corrected clean LR image is upscaled by the SR network. In the training
phase, the correction network also produces a pseudo-clean LR image from the
input HR image, and a mapping from this pseudo-clean LR image to the input HR
image is learned by the SR network in a paired manner. Because our
SR network is independent of the correction network, well-studied existing
network architectures and pixel-wise loss functions can be integrated with the
proposed framework. Experiments on diverse datasets show that the proposed
method is superior to existing solutions to the unpaired SR problem.
Comment: 10 pages, 10 figures
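A minimal sketch of the pseudo-supervised training step described above, assuming a PyTorch setup; `correction_net` and `sr_net` are hypothetical stand-ins for the paper's two networks, and plain bicubic downscaling stands in for however the pseudo LR input is derived from the HR image.

```python
import torch
import torch.nn.functional as F

def pseudo_supervised_step(correction_net, sr_net, lr_real, hr_real, scale=4):
    # Inference path: correct the real-world LR image, then super-resolve it.
    lr_clean = correction_net(lr_real)
    sr_pred = sr_net(lr_clean)

    # Pseudo-pair path: downscale the HR image, run it through the same
    # correction network to obtain a pseudo-clean LR image, and supervise
    # the SR network against the original HR image in a paired manner.
    lr_down = F.interpolate(hr_real, scale_factor=1 / scale,
                            mode='bicubic', align_corners=False)
    lr_pseudo = correction_net(lr_down)
    paired_loss = F.l1_loss(sr_net(lr_pseudo), hr_real)
    return sr_pred, paired_loss
```

Because the SR network only ever sees (pseudo-)clean LR inputs, any off-the-shelf SR architecture and pixel-wise loss can be dropped into this slot, which is the modularity the abstract emphasizes.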
Bandwidth Extension on Raw Audio via Generative Adversarial Networks
Neural network-based methods have recently demonstrated state-of-the-art
results on image synthesis and super-resolution tasks, in particular by using
variants of generative adversarial networks (GANs) with supervised feature
losses. Nevertheless, previous feature loss formulations rely on the
availability of large auxiliary classifier networks, and labeled datasets that
enable such classifiers to be trained. Furthermore, there has been
comparatively little work to explore the applicability of GAN-based methods to
domains other than images and video. In this work we explore a GAN-based method
for audio processing, and develop a convolutional neural network architecture
to perform audio super-resolution. In addition to several new architectural
building blocks for audio processing, a key component of our approach is the
use of an autoencoder-based loss that enables training in the GAN framework,
with feature losses derived from unlabeled data. We explore the impact of our
architectural choices, and demonstrate significant improvements over previous
works in terms of both objective and perceptual quality.
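The autoencoder-based feature loss can be sketched as follows: features come from an encoder pretrained on unlabeled audio rather than from a large labeled classifier. This is a hedged sketch; `encoder` and the choice of L1 distance are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def autoencoder_feature_loss(encoder, audio_pred, audio_target):
    # Feature loss computed in the latent space of an autoencoder that was
    # pretrained on unlabeled audio; this replaces classifier-based feature
    # losses, which require labeled datasets.
    with torch.no_grad():
        feat_target = encoder(audio_target)
    return F.l1_loss(encoder(audio_pred), feat_target)
```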
Towards the Automatic Anime Characters Creation with Generative Adversarial Networks
Automatic generation of facial images has been widely studied since the
Generative Adversarial Network (GAN) was introduced. There have been some
attempts to apply GAN models to the problem of generating facial images of
anime characters, but none of the existing work gives a promising result. In
this work, we explore the training of GAN models specialized on an anime
facial image dataset. We address the issue from both the data and the model
aspects, by collecting a cleaner, well-suited dataset and leveraging proper,
empirical application of DRAGAN. With quantitative analysis and case studies,
we demonstrate that our efforts lead to a stable and high-quality model. Moreover,
to assist people with anime character design, we build a website
(http://make.girls.moe) with our pre-trained model available online, which
makes the model easily accessible to the general public.
Comment: 16 pages, 15 figures. This paper is presented as a Doujinshi in
Comiket 92, summer 2017, with the booth number 05a, East-U, Third Day
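The DRAGAN regularizer leveraged above penalizes the discriminator's gradient norm on noisy perturbations around the real data. A minimal PyTorch sketch follows; the penalty coefficient and noise scale are the commonly used defaults, not necessarily this paper's settings.

```python
import torch

def dragan_penalty(discriminator, x_real, lambda_gp=10.0):
    # Perturb real samples with noise proportional to the batch's standard
    # deviation, then penalize deviations of the discriminator's gradient
    # norm from 1 around that perturbed manifold.
    noise = 0.5 * x_real.std() * torch.rand_like(x_real)
    x_hat = (x_real + noise).detach().requires_grad_(True)
    d_out = discriminator(x_hat)
    grads, = torch.autograd.grad(d_out.sum(), x_hat, create_graph=True)
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```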
When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey
With widespread applications of artificial intelligence (AI), the
capabilities of the perception, understanding, decision-making and control for
autonomous systems have improved significantly in recent years. When
autonomous systems are assessed for accuracy and transferability, several AI
methods, such as adversarial learning, reinforcement learning (RL) and
meta-learning, demonstrate strong performance. Here, we review learning-based
approaches in autonomous systems from the perspectives of accuracy and
transferability. Accuracy means that a well-trained model performs well during
the testing phase, in which the testing set shares the same task or data
distribution as the training set. Transferability means that the accuracy
remains good when a well-trained model is transferred to other testing
domains. Firstly, we introduce some basic concepts of transfer learning
and then present some preliminaries of adversarial learning, RL and
meta-learning. Secondly, we review accuracy, transferability, or both, to show
the advantages of adversarial learning, such as generative adversarial networks
(GANs), in typical computer vision tasks in autonomous systems, including image
style transfer, image super-resolution, image deblurring/dehazing/rain removal,
semantic segmentation, depth estimation, pedestrian detection and person
re-identification (re-ID). Then, we further review the accuracy and
transferability of RL and meta-learning in autonomous systems, involving
pedestrian tracking, robot navigation and robotic manipulation. Finally, we
discuss several challenges and future topics for using adversarial learning,
RL and meta-learning in autonomous systems.
A Deep Journey into Super-resolution: A survey
Deep convolutional network-based super-resolution is a fast-growing field
with numerous practical applications. In this exposition, we extensively
compare 30+ state-of-the-art super-resolution Convolutional Neural Networks
(CNNs) over three classical and three recently introduced challenging datasets
to benchmark single image super-resolution. We introduce a taxonomy for
deep-learning based super-resolution networks that groups existing methods into
nine categories including linear, residual, multi-branch, recursive,
progressive, attention-based and adversarial designs. We also provide
comparisons between the models in terms of network complexity, memory
footprint, model input and output, learning details, the type of network losses
and important architectural differences (e.g., depth, skip-connections,
filters). The extensive evaluation shows consistent and rapid growth in
accuracy over the past few years, along with a corresponding boost
in model complexity and the availability of large-scale datasets. It is also
observed that the pioneering methods identified as the benchmark have been
significantly outperformed by the current contenders. Despite the progress in
recent years, we identify several shortcomings of existing techniques and
provide future research directions towards the solution of these open problems.
Comment: Accepted in ACM Computing Surveys
SinGAN: Learning a Generative Model from a Single Natural Image
We introduce SinGAN, an unconditional generative model that can be learned
from a single natural image. Our model is trained to capture the internal
distribution of patches within the image, and is then able to generate high
quality, diverse samples that carry the same visual content as the image.
SinGAN contains a pyramid of fully convolutional GANs, each responsible for
learning the patch distribution at a different scale of the image. This allows
generating new samples of arbitrary size and aspect ratio that have
significant variability, yet maintain both the global structure and the fine
textures of the training image. In contrast to previous single image GAN
schemes, our approach is not limited to texture images, and is not conditional
(i.e., it generates samples from noise). User studies confirm that the generated
samples are commonly confused with real images. We illustrate the utility of
SinGAN in a wide range of image manipulation tasks.
Comment: ICCV 2019
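The pyramid described above can be sketched schematically: the coarsest generator maps pure noise to an image, and each finer generator adds a residual refinement on top of an upsampled version of the previous scale's output. This is a rough approximation, not the published architecture; `generators` and `noise_shapes` are hypothetical.

```python
import torch
import torch.nn.functional as F

def singan_sample(generators, noise_shapes):
    # Coarse-to-fine sampling through the pyramid of per-scale generators.
    sample = None
    for gen, shape in zip(generators, noise_shapes):
        z = torch.randn(shape)
        if sample is None:
            sample = gen(z)  # coarsest scale: generate from noise alone
        else:
            up = F.interpolate(sample, size=shape[-2:], mode='bilinear',
                               align_corners=False)
            sample = up + gen(up + z)  # residual refinement at this scale
    return sample
```

Because every generator is fully convolutional, changing the spatial sizes in `noise_shapes` yields samples of arbitrary size and aspect ratio, as the abstract notes.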
Thermal Infrared Colorization via Conditional Generative Adversarial Network
Transforming a thermal infrared image into a realistic RGB image is a
challenging task. In this paper we propose a deep learning method to bridge
this gap. We propose learning the transformation mapping using a coarse-to-fine
generator that preserves the details. Since the standard mean squared loss
cannot penalize the distance between colorized and ground truth images well, we
propose a composite loss function that combines content, adversarial,
perceptual and total variation losses. The content loss is used to recover
global image information while the latter three losses are used to synthesize
local realistic textures. Quantitative and qualitative experiments demonstrate
that our approach significantly outperforms existing approaches.
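A sketch of such a composite loss, assuming a PyTorch setup; the loss weights and the L1/MSE distance choices are illustrative assumptions, not the paper's reported values.

```python
import torch
import torch.nn.functional as F

def total_variation(img):
    # Total variation term: encourages locally smooth colorization.
    dh = (img[..., 1:, :] - img[..., :-1, :]).abs().mean()
    dw = (img[..., :, 1:] - img[..., :, :-1]).abs().mean()
    return dh + dw

def composite_loss(pred_rgb, gt_rgb, d_fake_logits, feat_pred, feat_gt,
                   w_content=1.0, w_adv=1e-3, w_perc=1.0, w_tv=1e-4):
    content = F.l1_loss(pred_rgb, gt_rgb)              # global image content
    adv = F.binary_cross_entropy_with_logits(          # local realism
        d_fake_logits, torch.ones_like(d_fake_logits))
    perceptual = F.mse_loss(feat_pred, feat_gt)        # texture via features
    return (w_content * content + w_adv * adv
            + w_perc * perceptual + w_tv * total_variation(pred_rgb))
```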
TransGaGa: Geometry-Aware Unsupervised Image-to-Image Translation
Unsupervised image-to-image translation aims at learning a mapping between
two visual domains. However, learning a translation across large geometry
variations typically fails. In this work, we present a novel
disentangle-and-translate framework to tackle image-to-image translation for
complex objects. Instead of learning the mapping on the image space directly,
we disentangle the image space into a Cartesian product of the appearance and
the geometry latent spaces. Specifically, we first introduce a
geometry prior loss and a conditional VAE loss to encourage the network to
learn independent but complementary representations. The translation is then
built on appearance and geometry space separately. Extensive experiments
demonstrate the superior performance of our method to other state-of-the-art
approaches, especially in the challenging near-rigid and non-rigid objects
translation tasks. In addition, by taking different exemplars as the appearance
references, our method also supports multimodal translation. Project page:
https://wywu.github.io/projects/TGaGa/TGaGa.html
Comment: Accepted to CVPR 2019.
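The disentangle-and-translate pipeline can be summarized in a short sketch: appearance and geometry are encoded separately, only the geometry code is translated across domains, and a target-domain decoder recombines the two. All module names are hypothetical placeholders for the paper's networks.

```python
def translate(enc_appearance, enc_geometry, geo_translator, decoder_b, x_a):
    # Factor the source image into independent appearance and geometry
    # codes (the Cartesian-product decomposition), translate only the
    # geometry code across domains, then decode in the target domain.
    app_code = enc_appearance(x_a)
    geo_code = enc_geometry(x_a)
    geo_code_b = geo_translator(geo_code)
    return decoder_b(app_code, geo_code_b)
```

Swapping `app_code` for the code of a different exemplar image is what enables the multimodal translation the abstract mentions.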
Generative Creativity: Adversarial Learning for Bionic Design
Bionic design refers to an approach of generative creativity in which a
target object (e.g. a floor lamp) is designed to contain features of biological
source objects (e.g. flowers), resulting in creative biologically-inspired
design. In this work, we attempt to model the process of shape-oriented bionic
design as follows: given an input image of a design target object, the model
generates images that 1) maintain shape features of the input design target
image, 2) contain shape features of images from the specified biological source
domain, 3) are plausible and diverse. We propose DesignGAN, a novel
unsupervised deep generative approach to realising bionic design. Specifically,
we employ a conditional Generative Adversarial Network architecture with
several designated losses (an adversarial loss, a regression loss, a cycle loss
and a latent loss) that respectively constrain our model to meet the
corresponding aforementioned requirements of bionic design modelling. We
perform qualitative and quantitative experiments to evaluate our method, and
demonstrate that our proposed approach successfully generates creative images
of bionic design.
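The four designated losses can be combined as in the following sketch; the weights and distance functions are illustrative assumptions rather than DesignGAN's reported configuration.

```python
import torch
import torch.nn.functional as F

def designgan_generator_loss(x_target, x_gen, x_cycled, z, z_rec,
                             d_fake_logits, w_adv=1.0, w_reg=10.0,
                             w_cyc=10.0, w_lat=1.0):
    # Adversarial loss: generated images should match the biological domain.
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    # Regression loss: keep the shape features of the design target input.
    reg = F.l1_loss(x_gen, x_target)
    # Cycle loss: translating back should recover the input image.
    cyc = F.l1_loss(x_cycled, x_target)
    # Latent loss: the sampled code should be recoverable from the output,
    # encouraging diverse generations.
    lat = F.l1_loss(z_rec, z)
    return w_adv * adv + w_reg * reg + w_cyc * cyc + w_lat * lat
```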