Super-Identity Convolutional Neural Network for Face Hallucination
Face hallucination is a generative task that super-resolves low-resolution
facial images, while human perception of faces heavily relies on identity
information. However, previous face hallucination approaches largely ignore
facial identity recovery. This paper proposes Super-Identity Convolutional
Neural Network (SICNN) to recover identity information for generating faces
close to the real identity. Specifically, we define a super-identity loss to
measure the identity difference between a hallucinated face and its
corresponding high-resolution face within the hypersphere identity metric
space. However, directly using this loss will lead to a Dynamic Domain
Divergence problem, which is caused by the large margin between the
high-resolution domain and the hallucination domain. To overcome this
challenge, we present a domain-integrated training approach by constructing a
robust identity metric for faces from these two domains. Extensive experimental
evaluations demonstrate that the proposed SICNN achieves superior visual
quality over the state-of-the-art methods on a challenging task to
super-resolve 12×14 faces with an 8× upscaling factor. In addition, SICNN
significantly improves the recognizability of ultra-low-resolution faces.
Comment: Published in ECCV 201
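The super-identity loss above can be sketched as a distance between identity embeddings projected onto the unit hypersphere. This is a minimal numpy illustration; the embedding network, the exact distance, and any loss weighting are assumptions, not the paper's precise formulation:

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Project an identity embedding onto the unit hypersphere."""
    return v / (np.linalg.norm(v) + eps)

def super_identity_loss(emb_sr, emb_hr):
    """Squared Euclidean distance between hypersphere-normalized identity
    embeddings of the hallucinated (SR) face and its real HR counterpart."""
    return float(np.sum((l2_normalize(emb_sr) - l2_normalize(emb_hr)) ** 2))

# Embeddings pointing in the same direction share an identity: zero loss.
e = np.array([1.0, 2.0, 3.0])
print(super_identity_loss(e, 2.0 * e))  # 0.0
```

Because both embeddings are normalized first, the loss depends only on the angular difference between identities, which matches the hypersphere metric space mentioned in the abstract.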
CT Image Enhancement Using Stacked Generative Adversarial Networks and Transfer Learning for Lesion Segmentation Improvement
Automated lesion segmentation from computed tomography (CT) is an important
and challenging task in medical image analysis. While many advancements have
been made, there is room for continued improvements. One hurdle is that CT
images can exhibit high noise and low contrast, particularly in lower dosages.
To address this, we focus on a preprocessing method for CT images that uses a
stacked generative adversarial network (SGAN) approach. The first GAN reduces
the noise in the CT image and the second GAN generates a higher resolution
image with enhanced boundaries and high contrast. To make up for the absence of
high quality CT images, we detail how to synthesize a large number of low- and
high-quality natural images and use transfer learning with progressively larger
amounts of CT images. We apply both the classic GrabCut method and the modern
holistically nested network (HNN) to lesion segmentation, testing whether SGAN
can yield improved lesion segmentation. Experimental results on the DeepLesion
dataset demonstrate that the SGAN enhancements alone can push GrabCut
performance over HNN trained on original images. We also demonstrate that HNN +
SGAN performs best compared against four other enhancement methods, including
when using only a single GAN.
Comment: Accepted by MLMI 201
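The two-stage SGAN idea reduces to composing two generators: denoise first, then super-resolve. A toy sketch of that pipeline (the lambdas below are hypothetical stand-ins for the trained CNN generators):

```python
import numpy as np

def stacked_enhance(ct_image, denoise_g, sr_g):
    """Two-stage enhancement: the first generator suppresses noise, the
    second super-resolves with sharper boundaries and higher contrast."""
    return sr_g(denoise_g(ct_image))

# Toy stand-ins for the learned generators:
denoise = lambda x: np.clip(x, 0.0, 1.0)                  # clamp outliers
upscale = lambda x: np.repeat(np.repeat(x, 2, 0), 2, 1)   # nearest 2x SR

img = np.random.rand(8, 8) * 1.2   # noisy slice with out-of-range values
out = stacked_enhance(img, denoise, upscale)
print(out.shape)  # (16, 16)
```

The composition makes the stacking explicit: either stage can be swapped out, which is how the paper compares against using only a single GAN.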
Yes, we GAN: Applying Adversarial Techniques for Autonomous Driving
Generative Adversarial Networks (GAN) have gained a lot of popularity since
their introduction in 2014. Research on GAN is rapidly growing and
there are many variants of the original GAN focusing on various aspects of deep
learning. GAN are perceived as the most impactful direction of machine learning
in the last decade. This paper focuses on the application of GAN in autonomous
driving including topics such as advanced data augmentation, loss function
learning, semi-supervised learning, etc. We formalize and review key
applications of adversarial techniques and discuss challenges and open problems
to be addressed.
Comment: Accepted for publication in Electronic Imaging, Autonomous Vehicles
and Machines 2019. arXiv admin note: text overlap with arXiv:1606.05908 by
other authors
Fast Underwater Image Enhancement for Improved Visual Perception
In this paper, we present a conditional generative adversarial network-based
model for real-time underwater image enhancement. To supervise the adversarial
training, we formulate an objective function that evaluates the perceptual
image quality based on its global content, color, local texture, and style
information. We also present EUVP, a large-scale dataset of a paired and
unpaired collection of underwater images (of `poor' and `good' quality) that
are captured using seven different cameras under various visibility conditions
during oceanic explorations and human-robot collaborative experiments. In
addition, we perform several qualitative and quantitative evaluations which
suggest that the proposed model can learn to enhance underwater image quality
from both paired and unpaired training. More importantly, the enhanced images
improve the performance of standard models for underwater object detection,
human pose estimation, and saliency prediction. These results validate that
the proposed model is suitable for real-time preprocessing in the autonomy
pipeline of visually-guided underwater robots. The model and associated
training pipelines are available at https://github.com/xahidbuffon/funie-gan
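The objective described above combines several perceptual terms into one weighted loss. The sketch below uses simple stand-ins for each term (mean-squared content error, per-channel color means, local gradients); the weights and the paper's exact style/texture losses are not given here and are assumptions:

```python
import numpy as np

def perceptual_objective(pred, target, w_content=1.0, w_color=0.5,
                         w_texture=0.5):
    """Toy multi-term objective: weighted sum of global-content, color,
    and local-texture discrepancies between prediction and target."""
    content = np.mean((pred - target) ** 2)                  # global content
    color = np.mean(np.abs(pred.mean((0, 1)) - target.mean((0, 1))))
    texture = np.mean(np.abs(np.diff(pred, axis=0)           # local gradients
                             - np.diff(target, axis=0)))
    return float(w_content * content + w_color * color + w_texture * texture)

a = np.random.rand(16, 16, 3)
print(perceptual_objective(a, a))  # identical images -> 0.0
```

Each term is differentiable, so a sum like this can supervise the generator alongside the adversarial loss during training.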
Chinese Typeface Transformation with Hierarchical Adversarial Network
In this paper, we explore automated typeface generation through image style
transfer which has shown great promise in natural image generation. Existing
style transfer methods for natural images generally assume that the source and
target images share similar high-frequency features. However, this assumption
is no longer true in typeface transformation. Inspired by the recent
advancement in Generative Adversarial Networks (GANs), we propose a
Hierarchical Adversarial Network (HAN) for typeface transformation. The
proposed HAN consists of two sub-networks: a transfer network and a
hierarchical adversarial discriminator. The transfer network maps characters
from one typeface to another. A unique characteristic of typefaces is that the
same radicals may have quite different appearances in different characters even
under the same typeface. Hence, a stage-decoder is employed by the transfer
network to leverage multiple feature layers, aiming to capture both the global
and local features. The hierarchical adversarial discriminator implicitly
measures data discrepancy between the generated domain and the target domain.
To leverage the complementary discriminating capability of different feature
layers, a hierarchical structure is proposed for the discriminator. We have
experimentally demonstrated that HAN is an effective framework for typeface
transfer and character restoration.
Comment: 8 pages (excluding references), 6 figures
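A hierarchical adversarial discriminator can be read as combining adversarial scores produced at several feature layers. A minimal sketch, assuming per-layer real/fake probabilities and hypothetical equal weights (HAN's exact combination scheme may differ):

```python
import numpy as np

def hierarchical_adv_loss(layer_scores, weights=None):
    """Weighted sum of non-saturating generator losses, one per
    discriminator feature layer, so every layer contributes its own
    discriminating signal."""
    if weights is None:
        weights = [1.0] * len(layer_scores)
    losses = [-np.mean(np.log(np.clip(s, 1e-7, 1.0))) for s in layer_scores]
    return float(sum(w * l for w, l in zip(weights, losses)))

# Two layers' probabilities that generated characters look real:
scores = [np.array([0.9, 0.8]), np.array([0.6, 0.7])]
print(hierarchical_adv_loss(scores))
```

Summing per-layer losses is what lets shallow layers penalize local stroke artifacts while deep layers penalize global structure, matching the complementary-capability argument in the abstract.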
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey
Large-scale labeled data are generally required to train deep neural networks
in order to obtain better performance in visual feature learning from images or
videos for computer vision applications. To avoid extensive cost of collecting
and annotating large-scale datasets, as a subset of unsupervised learning
methods, self-supervised learning methods are proposed to learn general image
and video features from large-scale unlabeled data without using any
human-annotated labels. This paper provides an extensive review of deep
learning-based self-supervised general visual feature learning methods from
images or videos. First, the motivation, general pipeline, and terminologies of
this field are described. Then the common deep neural network architectures
that are used for self-supervised learning are summarized. Next, the main
components and evaluation metrics of self-supervised learning methods are
reviewed followed by the commonly used image and video datasets and the
existing self-supervised visual feature learning methods. Finally, quantitative
performance comparisons of the reviewed methods on benchmark datasets are
summarized and discussed for both image and video feature learning. Lastly,
this paper concludes with a set of promising future directions for
self-supervised visual feature learning.
Salient Object Detection in the Deep Learning Era: An In-Depth Survey
As an essential problem in computer vision, salient object detection (SOD)
has attracted an increasing amount of research attention over the years. Recent
advances in SOD are predominantly led by deep learning-based solutions (named
deep SOD). To enable in-depth understanding of deep SOD, in this paper, we
provide a comprehensive survey covering various aspects, ranging from algorithm
taxonomy to unsolved issues. In particular, we first review deep SOD algorithms
from different perspectives, including network architecture, level of
supervision, learning paradigm, and object-/instance-level detection. Following
that, we summarize and analyze existing SOD datasets and evaluation metrics.
Then, we benchmark a large group of representative SOD models, and provide
detailed analyses of the comparison results. Moreover, we study the performance
of SOD algorithms under different attribute settings, which has not been
thoroughly explored previously, by constructing a novel SOD dataset with rich
attribute annotations covering various salient object types, challenging
factors, and scene categories. We further analyze, for the first time in the
field, the robustness of SOD models to random input perturbations and
adversarial attacks. We also look into the generalization and difficulty of
existing SOD datasets. Finally, we discuss several open issues of SOD and
outline future research directions.
Comment: Published in IEEE TPAMI. All the saliency prediction maps, our
constructed dataset with annotations, and codes for evaluation are publicly
available at \url{https://github.com/wenguanwang/SODsurvey}
RAN4IQA: Restorative Adversarial Nets for No-Reference Image Quality Assessment
Inspired by the free-energy brain theory, which implies that the human visual
system (HVS) tends to reduce uncertainty and restore perceptual details upon
seeing a distorted image, we propose restorative adversarial net (RAN), a
GAN-based model for no-reference image quality assessment (NR-IQA). RAN, which
mimics the process of HVS, consists of three components: a restorator, a
discriminator and an evaluator. The restorator restores and reconstructs input
distorted image patches, while the discriminator distinguishes the
reconstructed patches from the pristine distortion-free patches. After
restoration, we observe that the perceptual distance between the restored and
the distorted patches is monotonic with respect to the distortion level. We
further define Gain of Restoration (GoR) based on this phenomenon. The
evaluator predicts perceptual score by extracting feature representations from
the distorted and restored patches to measure GoR. Eventually, the quality
score of an input image is estimated by weighted sum of the patch scores.
Experimental results on Waterloo Exploration, LIVE and TID2013 show the
effectiveness and generalization ability of RAN compared to the
state-of-the-art NR-IQA models.
Comment: AAAI'1
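The Gain of Restoration and the patch-weighted image score can be sketched in a few lines. The Euclidean feature distance and the uniform weighting here are simple stand-ins for the learned metric and weighting described in the abstract:

```python
import numpy as np

def gain_of_restoration(feat_distorted, feat_restored):
    """GoR proxy: perceptual distance between feature representations of a
    distorted patch and its restored counterpart (larger distance suggests
    heavier distortion, per the monotonicity observed in the paper)."""
    return float(np.linalg.norm(feat_distorted - feat_restored))

def image_score(patch_scores, patch_weights):
    """Image-level quality as a normalized weighted sum of patch scores."""
    w = np.asarray(patch_weights, dtype=float)
    return float(np.dot(patch_scores, w / w.sum()))

d = np.array([0.2, 0.4])
r = np.array([0.9, 0.9])
print(gain_of_restoration(d, r))
print(image_score(np.array([3.0, 5.0]), [1.0, 1.0]))  # 4.0
```

The key property being illustrated is the monotonic relationship: the further restoration has to move a patch in feature space, the worse the input quality, which is what the evaluator learns to score.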
Single Image Reflection Removal Exploiting Misaligned Training Data and Network Enhancements
Removing undesirable reflections from a single image captured through a glass
window is of practical importance to visual computing systems. Although
state-of-the-art methods can obtain decent results in certain situations,
performance declines significantly when tackling more general real-world cases.
These failures stem from the intrinsic difficulty of single image reflection
removal -- the fundamental ill-posedness of the problem, and the insufficiency
of densely-labeled training data needed for resolving this ambiguity within
learning-based neural network pipelines. In this paper, we address these issues
by exploiting targeted network enhancements and the novel use of misaligned
data. For the former, we augment a baseline network architecture by embedding
context encoding modules that are capable of leveraging high-level contextual
clues to reduce indeterminacy within areas containing strong reflections. For
the latter, we introduce an alignment-invariant loss function that facilitates
exploiting misaligned real-world training data that is much easier to collect.
Experimental results collectively show that our method outperforms the
state-of-the-art with aligned data, and that significant improvements are
possible when using additional misaligned data.
Comment: Accepted to CVPR 2019; code is available at
https://github.com/Vandermode/ERRNe
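One simple way to make a loss tolerant of misaligned ground truth is to take the minimum pixel error over small translations of the target. This is an illustrative sketch only; the paper's actual alignment-invariant loss is defined differently (on deep features), so treat the construction below as an assumption:

```python
import numpy as np

def shift_tolerant_loss(pred, target, max_shift=2):
    """Minimum mean absolute error over small integer translations of the
    target, so a few pixels of misalignment are not penalized."""
    h, w = pred.shape
    m = max_shift
    best = np.inf
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            t = np.roll(np.roll(target, dy, axis=0), dx, axis=1)
            # Compare only the interior so rolled borders do not bias the error.
            err = np.mean(np.abs(pred[m:h - m, m:w - m] - t[m:h - m, m:w - m]))
            best = min(best, err)
    return float(best)

img = np.random.rand(16, 16)
shifted = np.roll(img, 1, axis=1)        # misaligned copy of the same scene
print(shift_tolerant_loss(img, shifted))  # ~0.0 despite the misalignment
```

A plain pixel loss would heavily penalize the shifted copy; the minimum over translations does not, which is the property that lets easier-to-collect misaligned pairs supervise training.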
Perception Consistency Ultrasound Image Super-resolution via Self-supervised CycleGAN
Due to the limitations of sensors, the transmission medium, and the intrinsic
properties of ultrasound, the quality of ultrasound imaging is often not
ideal, especially its low spatial resolution. To remedy this situation, deep
learning networks have recently been developed for ultrasound image
super-resolution (SR) because of their powerful approximation capability.
However, most current supervised SR methods are not suitable for ultrasound
medical images because medical image samples are scarce, and usually,
there are no low-resolution (LR) and high-resolution (HR) training pairs in
reality. In this work, based on self-supervision and cycle generative
adversarial network (CycleGAN), we propose a new perception consistency
ultrasound image super-resolution (SR) method, which requires only the LR
ultrasound data and ensures that the re-degraded version of the generated SR
image is consistent with the original LR image, and vice versa. We first generate
the HR fathers and the LR sons of the test ultrasound LR image through image
enhancement, and then make full use of the cycle loss of LR-SR-LR and HR-LR-SR
and the adversarial characteristics of the discriminator to promote the
generator to produce better perceptually consistent SR results. The evaluation
of PSNR/IFC/SSIM, inference efficiency, and visual effects on the benchmark
CCA-US and CCA-US datasets illustrates that our proposed approach is effective
and superior to other state-of-the-art methods.
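The LR-SR-LR cycle loss described above can be sketched directly: super-resolve, re-degrade, and penalize any difference from the original LR input. The toy upsampler/degrader pair below are hypothetical stand-ins for the learned generator and the degradation model:

```python
import numpy as np

def cycle_consistency_loss(lr, generator_sr, degrade):
    """LR -> SR -> LR cycle: the re-degraded SR image should match the
    original LR input (L1 penalty, as is common in CycleGAN-style training;
    the paper also uses the symmetric HR-LR-SR cycle)."""
    sr = generator_sr(lr)
    lr_back = degrade(sr)
    return float(np.mean(np.abs(lr_back - lr)))

# Toy 2x generator/degrader pair (stand-ins for the learned networks):
up = lambda x: np.repeat(np.repeat(x, 2, 0), 2, 1)   # nearest-neighbor 2x
down = lambda x: x[::2, ::2]                         # 2x subsampling

lr = np.random.rand(8, 8)
print(cycle_consistency_loss(lr, up, down))  # 0.0 for this exact pair
```

Because the loss needs only the LR image itself, it can be computed without any paired HR ground truth, which is what makes the self-supervised setup possible.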