Efficient Super Resolution For Large-Scale Images Using Attentional GAN
Single Image Super Resolution (SISR) is a well-researched problem with broad
commercial relevance. However, most of the SISR literature focuses on
small-size images under 500px, whereas business needs can mandate the
generation of very high resolution images. At Expedia Group, we were tasked
with generating images of at least 2000px for display on the website, four
times greater than the sizes typically reported in the literature. This
requirement poses a challenge that state-of-the-art models, validated on small
images, have not been proven to handle. In this paper, we investigate solutions
to the problem of generating high-quality images for large-scale super
resolution in a commercial setting. We find that training a generative
adversarial network (GAN) with attention from scratch using a large-scale
lodging image data set generates images with high PSNR and SSIM scores. We
describe a novel attentional SISR model for large-scale images, A-SRGAN, that
uses a Flexible Self Attention layer to enable processing of large-scale
images. We also describe a distributed algorithm which speeds up training by
around a factor of five.
Comment: Accepted by IEEE International Conference on Big Data, 201
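The abstract only names the Flexible Self Attention layer without defining it. As an illustrative assumption (not the paper's actual layer), the sketch below shows one common way to make self-attention tractable on large inputs: queries stay at full resolution while keys and values come from a spatially pooled copy, so the attention map scales with the coarse grid rather than the full image.

```python
import numpy as np

def self_attention_2d(feat, down=4):
    """Sketch of an attention layer for large feature maps: attention is
    computed against a pooled copy, so cost grows with H*W * (H*W/down^2)
    rather than (H*W)^2; the result is added back residually."""
    C, H, W = feat.shape
    # Pool keys/values to a coarse grid to keep the attention map small.
    pooled = feat.reshape(C, H // down, down, W // down, down).mean(axis=(2, 4))
    q = feat.reshape(C, H * W).T                  # (HW, C) queries, full res
    k = pooled.reshape(C, -1)                     # (C, hw) coarse keys
    v = pooled.reshape(C, -1).T                   # (hw, C) coarse values
    logits = q @ k / np.sqrt(C)                   # (HW, hw) similarity scores
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over coarse grid
    out = (attn @ v).T.reshape(C, H, W)
    return feat + out                             # residual connection

feat = np.random.rand(8, 32, 32)
out = self_attention_2d(feat)
```

The residual form means the layer can be dropped into an SRGAN-style generator without changing feature shapes.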
Face Hallucination by Attentive Sequence Optimization with Reinforcement Learning
Face hallucination is a domain-specific super-resolution problem that aims to
generate a high-resolution (HR) face image from a low-resolution (LR) input. In
contrast to the existing patch-wise super-resolution models that divide a face
image into regular patches and independently apply LR to HR mapping to each
patch, we implement deep reinforcement learning and develop a novel
attention-aware face hallucination (Attention-FH) framework, which recurrently
learns to attend to a sequence of patches and performs facial part enhancement by
fully exploiting the global interdependency of the image. Specifically, our
proposed framework incorporates two components: a recurrent policy network for
dynamically specifying a new attended region at each time step based on the
status of the super-resolved image and the past attended region sequence, and a
local enhancement network for selected patch hallucination and global state
updating. The Attention-FH model jointly learns the recurrent policy network
and local enhancement network through maximizing a long-term reward that
reflects the hallucination result with respect to the whole HR image. Extensive
experiments demonstrate that our Attention-FH significantly outperforms the
state-of-the-art methods on in-the-wild face images with large pose and
illumination variations.
Comment: To be published in TPAM
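The two components described above (a policy that picks the next region from the current image state and attention history, and an enhancement step that updates the global state) can be sketched as a toy loop. The scoring and enhancement functions here are hypothetical stand-ins, not the paper's learned networks, which would be trained jointly via a long-term reward.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy(state, history, patches):
    """Stand-in policy: score each candidate 8x8 patch from the current
    image state; a real policy network would be learned with RL."""
    scores = np.array([state[r:r + 8, c:c + 8].var() for r, c in patches])
    scores[list(history)] = -np.inf      # do not revisit attended patches
    return int(np.argmax(scores))

def enhance(state, r, c):
    """Stand-in local enhancement: smooth the selected patch in place
    (a real model would run a learned super-resolution sub-network)."""
    patch = state[r:r + 8, c:c + 8]
    state[r:r + 8, c:c + 8] = 0.5 * patch + 0.5 * patch.mean()
    return state

state = rng.random((32, 32))             # toy "super-resolved image" state
patches = [(r, c) for r in range(0, 32, 8) for c in range(0, 32, 8)]
history = set()
for _ in range(len(patches)):            # recurrently attend a patch sequence
    idx = policy(state, history, patches)
    history.add(idx)
    state = enhance(state, *patches[idx])
```

The key structural point is that each step conditions on both the evolving global state and the past attended sequence, matching the recurrent formulation in the abstract.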
TGAN: Deep Tensor Generative Adversarial Nets for Large Image Generation
Deep generative models have been successfully applied to many applications.
However, existing works experience limitations when generating large images
(the literature usually generates small images, e.g. 32 * 32 or 128 * 128). In
this paper, we propose a novel scheme, called deep tensor generative
adversarial nets (TGAN), that generates large high-quality images by exploring
tensor structures. Essentially, the adversarial process of TGAN takes place in
a tensor space. First, we impose tensor structures for concise image
representation, which is superior to the vectorization preprocessing in
existing works at capturing pixel proximity information and the spatial
patterns of elementary objects in images. Secondly, we propose TGAN, which
integrates deep convolutional generative adversarial networks and tensor
super-resolution in a cascading manner, to generate high-quality images from
random distributions. More specifically, we design a tensor super-resolution
process that consists of tensor dictionary learning and tensor coefficients
learning. Finally, on three datasets, the proposed TGAN generates images with
more realistic textures, compared with state-of-the-art adversarial
autoencoders. The size of the generated images is increased by over 8.5 times,
namely 374 * 374 in PASCAL2
Progressive Pose Attention Transfer for Person Image Generation
This paper proposes a new generative adversarial network for pose transfer,
i.e., transferring the pose of a given person to a target pose. The generator
of the network comprises a sequence of Pose-Attentional Transfer Blocks that
each transfers certain regions it attends to, generating the person image
progressively. Compared with those in previous works, our generated person
images possess better appearance consistency and shape consistency with the
input images and thus look significantly more realistic. The efficacy and
efficiency of the proposed network are validated both qualitatively and
quantitatively on Market-1501 and DeepFashion. Furthermore, the proposed
architecture can generate training images for person re-identification,
alleviating data insufficiency. Codes and models are available at:
https://github.com/tengteng95/Pose-Transfer.git.
Comment: To appear in CVPR 2019, oral presentation (21 pages, 15 figures
including the supplementary materials)
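The core idea of a Pose-Attentional Transfer Block can be sketched as pose features producing a soft mask that gates which image-feature regions are updated, applied repeatedly in sequence. This is an illustrative simplification under assumed shapes, not the paper's exact block, which uses learned convolutions on both streams.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pose_attentional_transfer(img_feat, pose_feat):
    """Sketch of one transfer block: pose features yield a soft attention
    mask in (0, 1) selecting which image-feature regions to update."""
    mask = sigmoid(pose_feat.mean(axis=0, keepdims=True))  # (1, H, W)
    update = img_feat * mask                               # attended update
    return img_feat + update                               # residual update

img_feat = np.random.rand(16, 24, 24)   # appearance stream features
pose_feat = np.random.rand(16, 24, 24)  # target-pose stream features
out = img_feat
for _ in range(3):       # a sequence of blocks generates the image progressively
    out = pose_attentional_transfer(out, pose_feat)
```

Chaining several such blocks is what lets each step transfer only the regions it attends to, rather than regenerating the whole person at once.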
Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
In this paper we introduce a generative parametric model capable of producing
high quality samples of natural images. Our approach uses a cascade of
convolutional networks within a Laplacian pyramid framework to generate images
in a coarse-to-fine fashion. At each level of the pyramid, a separate
generative convnet model is trained using the Generative Adversarial Nets (GAN)
approach (Goodfellow et al.). Samples drawn from our model are of significantly
higher quality than alternate approaches. In a quantitative assessment by human
evaluators, our CIFAR10 samples were mistaken for real images around 40% of the
time, compared to 10% for samples drawn from a GAN baseline model. We also show
samples from models trained on the higher resolution images of the LSUN scene
dataset.
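The Laplacian pyramid decomposition underlying this approach is concrete enough to sketch: each level stores the band-pass residual between an image and an upsampled coarse copy, and LAPGAN trains one conditional GAN per level to predict that residual. The nearest-neighbor up/downsampling below is a simplification of the blur-based operators typically used.

```python
import numpy as np

def downsample(img):
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels=3):
    """Decompose an image into band-pass residuals plus a coarse base;
    each residual is what a level's generator must learn to predict."""
    residuals = []
    for _ in range(levels):
        small = downsample(img)
        residuals.append(img - upsample(small))
        img = small
    return residuals, img

def reconstruct(residuals, base):
    """Coarse-to-fine generation: upsample, then add the level's residual."""
    img = base
    for res in reversed(residuals):
        img = upsample(img) + res
    return img

img = np.random.rand(32, 32)
residuals, base = laplacian_pyramid(img)
rec = reconstruct(residuals, base)
```

At sampling time, each residual comes from a generator conditioned on the upsampled image from the level below, so generation proceeds coarse-to-fine exactly as `reconstruct` does.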
Difficulty-aware Image Super Resolution via Deep Adaptive Dual-Network
Recently, deep learning based single image super-resolution (SR) approaches
have achieved great progress. The state-of-the-art SR methods usually adopt
a feed-forward pipeline to establish a non-linear mapping between low-res (LR)
and high-res (HR) images. However, because they treat all image regions equally
without considering their varying difficulty, these approaches hit an upper
bound for optimization. To address this issue, we propose a novel SR approach
that processes each region within an image according to its
difficulty. Specifically, we propose a dual-way SR network in which one way is
trained to focus on easy image regions and the other is trained to handle hard
image regions. To identify whether a region is easy or hard, we propose a novel
image difficulty recognition network based on PSNR prior. Our SR approach that
uses the region mask to adaptively enforce the dual-way SR network yields
superior results. Extensive experiments on several standard benchmarks (e.g.,
Set5, Set14, BSD100, and Urban100) show that our approach achieves
state-of-the-art performance.
Comment: ICME2019 (Oral), code and results are available at:
https://github.com/xzwlx/Difficulty-S
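The routing idea (a difficulty mask dispatching regions to an easy or hard branch, then merging the outputs) can be sketched as follows. The variance-based difficulty score and the two branch functions are hypothetical stand-ins; the paper learns a PSNR-prior difficulty network and two SR branches.

```python
import numpy as np

def difficulty_mask(lr, thresh=0.02):
    """Stand-in for the difficulty recognition network: mark an 8x8 region
    hard when its local variance is high (texture-rich regions are harder)."""
    H, W = lr.shape
    var = lr.reshape(H // 8, 8, W // 8, 8).var(axis=(1, 3))     # per-block
    return (var > thresh).repeat(8, axis=0).repeat(8, axis=1)   # True = hard

def dual_way_sr(lr, easy_branch, hard_branch):
    """Route each region to the branch trained for its difficulty, then
    merge the two branch outputs with the region mask."""
    mask = difficulty_mask(lr)
    return np.where(mask, hard_branch(lr), easy_branch(lr))

easy = lambda x: x + 0.1      # stand-ins for the two learned SR branches
hard = lambda x: x * 2.0
lr = np.random.rand(32, 32)
sr = dual_way_sr(lr, easy, hard)
```

The merge step is where the "adaptive enforcement" happens: every output pixel comes from exactly one branch, selected per region rather than per image.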
When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey
With widespread applications of artificial intelligence (AI), the
capabilities of the perception, understanding, decision-making and control for
autonomous systems have improved significantly in the past years. When
autonomous systems are evaluated on accuracy and transferability,
several AI methods, like adversarial learning, reinforcement learning (RL) and
meta-learning, demonstrate strong performance. Here, we review the
learning-based approaches in autonomous systems from the perspectives of
accuracy and transferability. Accuracy means that a well-trained model shows
good results during the testing phase, in which the testing set shares the
same task or data distribution as the training set. Transferability means that
when a well-trained model is transferred to other testing domains, the accuracy
is still good. Firstly, we introduce some basic concepts of transfer learning
and then present some preliminaries of adversarial learning, RL and
meta-learning. Secondly, we focus on reviewing the accuracy or transferability
or both of them to show the advantages of adversarial learning, like generative
adversarial networks (GANs), in typical computer vision tasks in autonomous
systems, including image style transfer, image super-resolution, image
deblurring/dehazing/rain removal, semantic segmentation, depth estimation,
pedestrian detection and person re-identification (re-ID). Then, we further
review the performance of RL and meta-learning from the aspects of accuracy or
transferability or both of them in autonomous systems, involving pedestrian
tracking, robot navigation and robotic manipulation. Finally, we discuss
several challenges and future topics for using adversarial learning, RL and
meta-learning in autonomous systems.
Learning to Globally Edit Images with Textual Description
We show how we can globally edit images using textual instructions: given a
source image and a textual instruction for the edit, generate a new image
transformed under this instruction. To tackle this novel problem, we develop
three different trainable models based on RNN and Generative Adversarial
Network (GAN). The models (bucket, filter bank, and end-to-end) differ in how
much expert knowledge is encoded, with the most general version being purely
end-to-end. To train these systems, we use Amazon Mechanical Turk to collect
textual descriptions for around 2000 image pairs sampled from several datasets.
Experimental results evaluated on our dataset validate our approaches. In
addition, given that the filter bank model is a good compromise between
generality and performance, we investigate it further by replacing RNN with
Graph RNN, and show that Graph RNN improves performance. To the best of our
knowledge, this is the first computational photography work on global image
editing that is purely based on free-form textual instructions.
Efficient Neural Architecture for Text-to-Image Synthesis
Text-to-image synthesis is the task of generating images from text
descriptions. Image generation, by itself, is a challenging task. When we
combine image generation and text, we bring complexity to a new level: we need
to combine data from two different modalities. Most recent works in
text-to-image synthesis follow a similar approach when it comes to neural
architectures. Due to the aforementioned difficulties, plus the inherent difficulty
of training GANs at high resolutions, most methods have adopted a multi-stage
training strategy. In this paper we shift the architectural paradigm currently
used in text-to-image methods and show that an effective neural architecture
can achieve state-of-the-art performance using a single stage training with a
single generator and a single discriminator. We do so by applying deep residual
networks along with a novel sentence interpolation strategy that enables
learning a smooth conditional space. Finally, our work points to a new
direction for text-to-image research, which has seen little recent
experimentation with novel neural architectures.
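The abstract does not spell out the sentence interpolation strategy; the common form of such a strategy, assumed here purely for illustration, is to condition the generator on convex combinations of two caption embeddings so that the conditional space between training captions is also covered.

```python
import numpy as np

rng = np.random.default_rng(0)

def interpolate_sentences(emb_a, emb_b, alpha):
    """Convex combination of two sentence embeddings; training on such
    mixtures encourages a smooth conditional space for the generator."""
    return alpha * emb_a + (1.0 - alpha) * emb_b

emb_a, emb_b = rng.random(128), rng.random(128)  # assumed 128-dim embeddings
alpha = rng.uniform()                            # resampled each training step
cond = interpolate_sentences(emb_a, emb_b, alpha)
```

The generator never sees a conditioning vector far from the data manifold of caption embeddings, which is what makes single-stage training at high resolution more stable under this scheme.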
Generative Adversarial Network in Medical Imaging: A Review
Generative adversarial networks have gained a lot of attention in the
computer vision community due to their capability of data generation without
explicitly modelling the probability density function. The adversarial loss
brought by the discriminator provides a clever way of incorporating unlabeled
samples into training and imposing higher order consistency. This has proven to
be useful in many cases, such as domain adaptation, data augmentation, and
image-to-image translation. These properties have attracted researchers in the
medical imaging community, and we have seen rapid adoption in many traditional
and novel applications, such as image reconstruction, segmentation, detection,
classification, and cross-modality synthesis. Based on our observations, this
trend will continue and we therefore conducted a review of recent advances in
medical imaging using the adversarial training scheme with the hope of
benefiting researchers interested in this technique.
Comment: 24 pages; v4; added missing references from before Jan 1st 2019;
accepted to MedI
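The "adversarial loss brought by the discriminator" referenced above is, in its standard non-saturating form, a pair of cross-entropy objectives over the discriminator's probabilities on real and generated samples. A minimal numeric sketch:

```python
import numpy as np

def adversarial_losses(d_real, d_fake, eps=1e-7):
    """Standard GAN losses given discriminator probabilities on real
    and generated samples; the discriminator is rewarded for telling
    them apart, the generator for fooling it (non-saturating form)."""
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss

# Confident discriminator: real samples scored high, fakes scored low.
d_loss, g_loss = adversarial_losses(np.array([0.9, 0.8]),
                                    np.array([0.2, 0.1]))
```

Because `d_loss` needs no class labels on the real samples, unlabeled medical images can contribute directly, which is the property the review highlights for semi-supervised use.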