FH-GAN: Face Hallucination and Recognition using Generative Adversarial Network
There are many factors affecting visual face recognition, such as low
resolution, aging, illumination, and pose variation. One of the most
important problems is low-resolution face images, which can result in poor
face recognition performance. Most general face recognition algorithms
assume a sufficient resolution for the face images. However, in practice
many applications often do not have sufficient image resolution. Modern
face hallucination models demonstrate reasonable performance in
reconstructing high-resolution images from their corresponding
low-resolution inputs. However, they do not consider identity-level
information during hallucination, which directly affects recognition of low
resolution faces. To address this issue, we propose a Face Hallucination
Generative Adversarial Network (FH-GAN) which improves the quality of low
resolution face images and accurately recognizes those low-quality images.
Concretely, we make the following contributions: 1) we propose the FH-GAN
network, an end-to-end system that improves both face hallucination and face
recognition simultaneously. The novelty of the proposed network lies in
incorporating identity information into a GAN-based face hallucination
algorithm by combining it with a face recognition network for identity
preservation. 2) We also propose a new face hallucination network, namely the
Dense Sparse Network (DSNet), which improves upon the state of the art in face
hallucination. 3) We demonstrate the benefits of training the face recognition
network and the GAN-based DSNet jointly by reporting good results on face
hallucination and recognition. Comment: 9 pages
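The identity-preserving idea in the abstract above can be illustrated with a toy loss that couples pixel reconstruction with an embedding-distance term. This is only a sketch of the concept, not FH-GAN's actual loss: the function names, the cosine-distance choice, and the weight `lam` are our own illustrative assumptions.

```python
import math

def cosine_distance(a, b):
    # 1 minus cosine similarity between two face-embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return 1.0 - dot / (na * nb)

def fh_gan_style_loss(sr_pixels, hr_pixels, sr_embedding, hr_embedding, lam=0.1):
    # pixel reconstruction term: mean squared error between the
    # hallucinated (SR) and ground-truth (HR) images
    mse = sum((s - h) ** 2 for s, h in zip(sr_pixels, hr_pixels)) / len(hr_pixels)
    # identity term: the recognition network's embeddings of the hallucinated
    # and ground-truth faces should stay close, preserving identity
    identity = cosine_distance(sr_embedding, hr_embedding)
    return mse + lam * identity
```

With identical embeddings the identity term vanishes and the loss reduces to the pixel term, which is the intuition behind training hallucination and recognition jointly.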
Face Hallucination by Attentive Sequence Optimization with Reinforcement Learning
Face hallucination is a domain-specific super-resolution problem that aims to
generate a high-resolution (HR) face image from a low-resolution (LR) input. In
contrast to the existing patch-wise super-resolution models that divide a face
image into regular patches and independently apply LR to HR mapping to each
patch, we leverage deep reinforcement learning and develop a novel
attention-aware face hallucination (Attention-FH) framework, which recurrently
learns to attend a sequence of patches and performs facial part enhancement by
fully exploiting the global interdependency of the image. Specifically, our
proposed framework incorporates two components: a recurrent policy network for
dynamically specifying a new attended region at each time step based on the
status of the super-resolved image and the past attended region sequence, and a
local enhancement network for selected patch hallucination and global state
updating. The Attention-FH model jointly learns the recurrent policy network
and local enhancement network through maximizing a long-term reward that
reflects the hallucination result with respect to the whole HR image. Extensive
experiments demonstrate that our Attention-FH significantly outperforms the
state-of-the-art methods on in-the-wild face images with large pose and
illumination variations. Comment: To be published in TPAMI
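The attend-then-enhance loop described above can be caricatured in a few lines. The greedy "attend the worst patch" policy and the fixed quality increment below are stand-ins of our own invention; the paper learns both the policy and the enhancement networks end-to-end with a long-term reward.

```python
def attend_and_enhance(quality, steps=3):
    # quality: per-patch quality scores in [0, 1] for the current image state
    # policy stand-in: always attend the currently worst patch, a crude
    # proxy for the learned recurrent policy network
    attended = []
    for _ in range(steps):
        idx = min(range(len(quality)), key=lambda i: quality[i])
        attended.append(idx)
        # local enhancement network stand-in: improve the attended patch
        quality[idx] = min(1.0, quality[idx] + 0.5)
    # the long-term reward reflects the whole image after the full sequence,
    # mirroring the paper's whole-HR-image reward
    reward = sum(quality) / len(quality)
    return attended, reward
```

Note that the same patch can be attended more than once, which is precisely what a sequential, state-dependent policy allows and a fixed patch grid does not.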
Image Super-Resolution via Deterministic-Stochastic Synthesis and Local Statistical Rectification
Single image super-resolution has been a popular research topic in the last
two decades and has recently received a new wave of interest due to deep neural
networks. In this paper, we approach this problem from a different perspective.
With respect to a downsampled low resolution image, we model a high resolution
image as a combination of two components, a deterministic component and a
stochastic component. The deterministic component can be recovered from the
low-frequency signals in the downsampled image. The stochastic component, on
the other hand, contains the signals that have little correlation with the low
resolution image. We adopt two complementary methods for generating these two
components. While generative adversarial networks are used for the stochastic
component, deterministic component reconstruction is formulated as a regression
problem solved using deep neural networks. Since the deterministic component
exhibits clearer local orientations, we design novel loss functions tailored
to such properties for training the deep regression network. These two methods
are first applied to the entire input image to produce two distinct
high-resolution images. Afterwards, these two images are fused together using
another deep neural network that also performs local statistical rectification,
which tries to make the local statistics of the fused image match those of the
ground-truth image. Quantitative results and a user study indicate that the
proposed method outperforms existing state-of-the-art algorithms by a clear
margin. Comment: to appear in SIGGRAPH Asia 2018
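The local statistical rectification step above can be sketched on a one-dimensional patch: shift and scale the fused values so their mean and standard deviation match target statistics. The function name and the simple affine normalization are our own illustrative simplification of what the paper's rectification network learns.

```python
import math

def rectify_local_stats(patch, target_mean, target_std):
    # normalize a fused patch, then re-impose the target local statistics
    # (in the paper the targets come from the ground-truth image's
    # local statistics; here they are plain arguments)
    n = len(patch)
    mean = sum(patch) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in patch) / n) or 1.0
    return [(x - mean) / std * target_std + target_mean for x in patch]
```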
CaricatureShop: Personalized and Photorealistic Caricature Sketching
In this paper, we propose the first sketching system for interactively
personalized and photorealistic face caricaturing. Given an image of a human
face, users can create caricature photos by manipulating its facial feature
curves. Our system first performs exaggeration on the recovered 3D face model
according to the edited sketches, which is done by assigning the Laplacian
of each vertex a scaling factor. To construct the mapping between 2D sketches
and a vertex-wise scaling field, a novel deep learning architecture is
developed. With the obtained 3D caricature model, two images are generated, one
obtained by applying 2D warping guided by the underlying 3D mesh deformation
and the other obtained by re-rendering the deformed 3D textured model. These
two images are then seamlessly integrated to produce our final output. Because
the meshes are severely stretched, the rendered texture appears blurry. A deep
learning approach is exploited to infer the missing details and enhance these
blurry regions. Moreover, a relighting operation is introduced to further
improve the photorealism of the result. Both quantitative and qualitative
experimental results validate the efficiency of our sketching system and the
superiority of our proposed techniques over existing methods. Comment: 12 pages, 16 figures, submitted to IEEE TVCG
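Scaling the Laplacian of each vertex, as described above, can be illustrated on a 1D polyline: the Laplacian coordinate of an interior vertex is its offset from the average of its neighbours, and multiplying that offset exaggerates the local detail. This is a toy analogue of the paper's 3D mesh operation, with names of our own choosing.

```python
def exaggerate(vertices, factors):
    # rescale each interior vertex's discrete Laplacian coordinate
    # (its offset from the neighbour average) to exaggerate local shape,
    # the 1D analogue of caricature exaggeration on a 3D mesh
    out = list(vertices)
    for i in range(1, len(vertices) - 1):
        avg = (vertices[i - 1] + vertices[i + 1]) / 2.0
        lap = vertices[i] - avg          # detail component
        out[i] = avg + factors[i] * lap  # scaled detail re-applied
    return out
```

A factor of 1 leaves a vertex unchanged; a factor of 2 doubles its protrusion, which is the per-vertex control the sketch-to-scaling-field network predicts.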
Deep CNN Denoiser and Multi-layer Neighbor Component Embedding for Face Hallucination
Most current face hallucination methods, whether shallow learning-based or
deep learning-based, try to learn a relationship model
between Low-Resolution (LR) and High-Resolution (HR) spaces with the help of a
training set. They mainly focus on modeling image prior through either
model-based optimization or discriminative inference learning. However, when
the input LR face is tiny, the learned prior knowledge is no longer effective
and their performance will drop sharply. To solve this problem, in this paper
we propose a general face hallucination method that can integrate model-based
optimization and discriminative inference. In particular, to exploit the
model-based prior, a deep convolutional neural network (CNN) denoiser prior is
plugged into the super-resolution optimization model with the aid of
image-adaptive Laplacian regularization. Additionally, we further develop a
high-frequency detail compensation method by dividing the face image into
facial components and performing face hallucination in a multi-layer neighbor
embedding manner. Experiments demonstrate that the proposed method can achieve
promising super-resolution results for tiny input LR faces. Comment: Accepted by IJCAI 2018
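The plug-and-play structure described above alternates a data-fidelity update with a call to an off-the-shelf denoiser. The toy below uses an identity degradation model and accepts any callable as the denoiser (a CNN in the paper); the step size `mu` and the function names are our own assumptions.

```python
def plug_and_play_sr(y, denoise, steps=10, mu=0.5):
    # toy plug-and-play iteration: alternate a gradient step on the
    # data-fidelity term ||x - y||^2 (identity degradation for simplicity)
    # with a call to a plugged-in denoiser, which acts as the image prior
    x = list(y)
    for _ in range(steps):
        x = [xi - mu * (xi - yi) for xi, yi in zip(x, y)]  # data fidelity
        x = denoise(x)                                     # prior step
    return x
```

With an identity denoiser the iteration is a fixed point at the observation itself; a real denoiser pulls the iterate toward the learned image prior at every step.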
Face Recognition in Low Quality Images: A Survey
Low-resolution face recognition (LRFR) has received increasing attention over
the past few years. Its applications lie widely in the real-world environment
when high-resolution or high-quality images are hard to capture. One of the
biggest demands for LRFR technologies is video surveillance. As the number
of surveillance cameras in cities increases, the captured videos will
need to be processed automatically. However, those videos or images are usually
captured with large standoffs, arbitrary illumination conditions, and diverse
viewing angles. Faces in these images are generally small in size. Several
studies have addressed this problem by employing techniques such as super
resolution, deblurring, or learning a relationship between different resolution
domains. In this paper, we provide a comprehensive review of approaches to
low-resolution face recognition over the past five years. First, a general
problem definition is given. Then, a systematic analysis of the works on this
topic is presented by category. In addition to describing the methods, we also
focus on datasets and experimental settings. We further address related work
on unconstrained low-resolution face recognition and compare it with results
that use synthetic low-resolution data. Finally, we summarize the general
limitations and speculate on priorities for future effort. Comment: There are
some mistakes in this paper which may mislead the reader, and we will not
have a new version in the short term. We will resubmit once it is corrected.
Attentive Crowd Flow Machines
Traffic flow prediction is crucial for urban traffic management and public
safety. Its key challenges lie in how to adaptively integrate the various
factors that affect the flow changes. In this paper, we propose a unified
neural network module to address this problem, called Attentive Crowd Flow
Machine (ACFM), which is able to infer the evolution of the crowd flow by
learning dynamic representations of temporally-varying data with an attention
mechanism. Specifically, the ACFM is composed of two progressive ConvLSTM units
connected with a convolutional layer for spatial weight prediction. The first
LSTM takes the sequential flow density representation as input and generates a
hidden state at each time-step for attention map inference, while the second
LSTM aims at learning the effective spatial-temporal feature expression from
attentionally weighted crowd flow features. Based on the ACFM, we further build
a deep architecture with the application to citywide crowd flow prediction,
which naturally incorporates the sequential and periodic data as well as other
external influences. Extensive experiments on two standard benchmarks (i.e.,
crowd flow in Beijing and New York City) show that the proposed method achieves
significant improvements over the state-of-the-art methods. Comment: ACM MM, full paper
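The attention step described above, where an inferred map reweights the crowd-flow features before the second ConvLSTM, can be sketched with a softmax over per-location scores. The softmax choice and function names are our own illustrative assumptions, not ACFM's exact formulation.

```python
import math

def attention_weighted_flow(flow, scores):
    # turn per-location attention scores (e.g. produced from the first
    # LSTM's hidden state) into a normalized spatial weight map, then
    # reweight the crowd-flow features with it
    exp = [math.exp(s) for s in scores]
    total = sum(exp)
    weights = [e / total for e in exp]
    weighted = [w * f for w, f in zip(weights, flow)]
    return weighted, weights
```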
Learning Spatial Attention for Face Super-Resolution
General image super-resolution techniques have difficulty recovering
detailed face structures when applied to low-resolution face images. Recent
deep learning based methods tailored for face images have achieved improved
performance through joint training with additional tasks such as face parsing
and landmark prediction. However, multi-task learning requires extra manually
labeled data. Besides, most existing works can only generate relatively
low-resolution face images, and their applications are
therefore limited. In this paper, we introduce a novel SPatial Attention
Residual Network (SPARNet) built on our newly proposed Face Attention Units
(FAUs) for face super-resolution. Specifically, we introduce a spatial
attention mechanism to the vanilla residual blocks. This enables the
convolutional layers to adaptively bootstrap features related to the key face
structures and pay less attention to those less feature-rich regions. This
makes the training more effective and efficient as the key face structures only
account for a very small portion of the face image. Visualization of the
attention maps shows that our spatial attention network can capture the key
face structures well even for very low-resolution faces.
Quantitative comparisons on various metrics (including PSNR, SSIM,
identity similarity, and landmark detection) demonstrate the superiority of our
method over the current state of the art. We further extend SPARNet with
multi-scale discriminators, named SPARNetHD, to produce high-resolution
results. We show that SPARNetHD trained with synthetic
data can not only produce high-quality, high-resolution outputs for
synthetically degraded face images but also generalize well
to real-world low-quality face images. Comment: TIP 2020. Code is available at
https://github.com/chaofengc/Face-SPARNet
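The Face Attention Unit idea above, a residual branch modulated by a spatial attention map before being added back to the input, can be sketched per-element. The branch callables and names below are our own stand-ins for the unit's convolutional layers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fau_block(features, residual_fn, attention_fn):
    # Face Attention Unit sketch: compute a residual branch and a spatial
    # attention map (squashed to [0, 1] by a sigmoid), then add the
    # attention-gated residual back to the input features
    res = residual_fn(features)
    attn = [sigmoid(a) for a in attention_fn(features)]
    return [x + a * r for x, a, r in zip(features, attn, res)]
```

When the attention logits are strongly negative the gate closes and the block passes its input through almost unchanged, which is how the network can focus capacity on key face structures.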
Generative Image Inpainting with Contextual Attention
Recent deep learning based approaches have shown promising results for the
challenging task of inpainting large missing regions in an image. These methods
can generate visually plausible image structures and textures, but often create
distorted structures or blurry textures inconsistent with surrounding areas.
This is mainly due to the ineffectiveness of convolutional neural networks in
explicitly borrowing or copying information from distant spatial locations. On
the other hand, traditional texture and patch synthesis approaches are
particularly suitable when textures need to be borrowed from the surrounding
regions. Motivated by these observations, we propose a new deep generative
model-based approach which can not only synthesize novel image structures but
also explicitly utilize surrounding image features as references during network
training to make better predictions. The model is a feed-forward, fully
convolutional neural network which can process images with multiple holes at
arbitrary locations and with variable sizes during the test time. Experiments
on multiple datasets including faces (CelebA, CelebA-HQ), textures (DTD) and
natural images (ImageNet, Places2) demonstrate that our proposed approach
generates higher-quality inpainting results than existing ones. Code, demo and
models are available at: https://github.com/JiahuiYu/generative_inpainting. Comment: Accepted in CVPR 2018; added CelebA-HQ results; open sourced;
interactive demo available: http://jhyu.me/demo
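The core of contextual attention, borrowing features from distant background locations by similarity matching, can be sketched as a nearest-neighbour search under cosine similarity. Real contextual attention does this softly with learned features and convolutions; the hard argmax and names below are our simplification.

```python
import math

def contextual_attention_fill(missing, background):
    # for each feature vector inside the hole, copy the most similar
    # background feature (cosine similarity), echoing how contextual
    # attention explicitly borrows information from distant locations
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(x * x for x in b)) or 1.0
        return dot / (na * nb)
    return [max(background, key=lambda p: cos(q, p)) for q in missing]
```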
200x Low-dose PET Reconstruction using Deep Learning
Positron emission tomography (PET) is widely used in various clinical
applications, including cancer diagnosis, heart disease, and neurological disorders.
The use of radioactive tracer in PET imaging raises concerns due to the risk of
radiation exposure. To minimize this potential risk in PET imaging, efforts
have been made to reduce the amount of radio-tracer usage. However, lowering
the dose results in a low Signal-to-Noise Ratio (SNR) and loss of information, both of
which will heavily affect clinical diagnosis. Besides, the ill-conditioning of
low-dose PET image reconstruction makes it a difficult problem for iterative
reconstruction algorithms. Previously proposed methods are typically complicated
and slow, yet still cannot yield satisfactory results at significantly low
dose. Here, we propose a deep learning method to resolve this issue using an
encoder-decoder residual deep network with concatenated skip connections.
Experiments show that the proposed method can reconstruct a low-dose PET image
to standard-dose quality with only one two-hundredth of the dose. Different
cost functions for training the model are explored. A multi-slice input
strategy is introduced to provide the network with more structural information
and make it more robust to noise. Evaluation on ultra-low-dose clinical data
shows that the proposed method achieves better results than the
state-of-the-art methods and reconstructs images of comparable quality using
only 0.5% of the original regular dose.
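The residual scheme implied above, where the network predicts the difference between the low-dose input and a standard-dose image rather than the full mapping, can be sketched as follows. The callable stands in for the encoder-decoder network; names are our own.

```python
def residual_reconstruct(low_dose, predict_residual):
    # residual learning sketch: the network only has to predict the
    # correction between the low-dose input and the standard-dose target,
    # which is generally easier to learn than the full image mapping
    residual = predict_residual(low_dose)
    return [x + r for x, r in zip(low_dose, residual)]
```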