Face Recognition: A Novel Multi-Level Taxonomy based Survey
In a world where security issues have been gaining growing importance, face
recognition systems have attracted increasing attention in multiple application
areas, ranging from forensics and surveillance to commerce and entertainment.
To aid understanding of the landscape and abstraction levels relevant for face
recognition systems, face recognition taxonomies allow a deeper dissection and
comparison of the existing solutions. This paper proposes a new, more
encompassing and richer multi-level face recognition taxonomy, facilitating the
organization and categorization of available and emerging face recognition
solutions; this taxonomy may also guide researchers in the development of more
efficient face recognition solutions. The proposed multi-level taxonomy
considers levels related to the face structure, feature support and feature
extraction approach. Following the proposed taxonomy, a comprehensive survey of
representative face recognition solutions is presented. The paper concludes
with a discussion on current algorithmic and application related challenges
which may define future research directions for face recognition.
Comment: This paper is a preprint of a paper submitted to IET Biometrics. If
accepted, the copy of record will be available at the IET Digital Library.
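The abstract names three taxonomy levels: face structure, feature support, and feature extraction approach. A minimal sketch of how such a multi-level taxonomy could be represented and used to place a solution; the example categories under each level are illustrative assumptions, not the paper's exact terms:

```python
# The three levels come from the abstract; the categories are illustrative.
taxonomy = {
    "face structure": ["2D", "3D"],
    "feature support": ["global (holistic)", "local (component-based)"],
    "feature extraction approach": ["hand-crafted", "learned (deep)"],
}

def classify(solution):
    """Place a face recognition solution (a dict of level -> category)
    into the taxonomy, returning its path of matched categories."""
    path = []
    for level, categories in taxonomy.items():
        choice = solution.get(level)
        if choice not in categories:
            raise ValueError(f"unknown category {choice!r} for level {level!r}")
        path.append(choice)
    return path

# Example: a typical CNN-based recognizer operating on 2D images.
cnn_solution = {
    "face structure": "2D",
    "feature support": "global (holistic)",
    "feature extraction approach": "learned (deep)",
}
print(classify(cnn_solution))
```

Such a structure makes it easy to compare solutions level by level, which is the stated purpose of the taxonomy.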
Face Hallucination by Attentive Sequence Optimization with Reinforcement Learning
Face hallucination is a domain-specific super-resolution problem that aims to
generate a high-resolution (HR) face image from a low-resolution~(LR) input. In
contrast to the existing patch-wise super-resolution models that divide a face
image into regular patches and independently apply LR to HR mapping to each
patch, we implement deep reinforcement learning and develop a novel
attention-aware face hallucination (Attention-FH) framework, which recurrently
learns to attend a sequence of patches and performs facial part enhancement by
fully exploiting the global interdependency of the image. Specifically, our
proposed framework incorporates two components: a recurrent policy network for
dynamically specifying a new attended region at each time step based on the
status of the super-resolved image and the past attended region sequence, and a
local enhancement network for selected patch hallucination and global state
updating. The Attention-FH model jointly learns the recurrent policy network
and local enhancement network through maximizing a long-term reward that
reflects the hallucination result with respect to the whole HR image. Extensive
experiments demonstrate that our Attention-FH significantly outperforms the
state-of-the-art methods on in-the-wild face images with large pose and
illumination variations.
Comment: To be published in TPAMI.
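The attend-then-enhance loop described above can be sketched in miniature. Here a greedy heuristic (attend the currently worst patch) and a scalar per-patch quality stand in for the learned recurrent policy network and the local enhancement network; both stand-ins are assumptions for illustration only:

```python
# Toy sketch of the Attention-FH loop: a policy picks a patch to attend at
# each step, a local enhancement step improves it, and a long-term reward
# scores the whole "HR image" at the end.

def attention_fh(patch_quality, steps):
    history = []
    for _ in range(steps):
        # "Policy": attend the currently worst patch, given the image state
        # and (implicitly) the sequence of past attended regions.
        idx = min(range(len(patch_quality)), key=lambda i: patch_quality[i])
        history.append(idx)
        # "Local enhancement": improve the attended patch.
        patch_quality[idx] += 1.0
    # Long-term reward reflecting the whole-image result.
    reward = sum(patch_quality)
    return history, reward

history, reward = attention_fh([0.2, 0.9, 0.5, 0.1], steps=4)
print(history, reward)
```

In the actual framework the attention order is learned via reinforcement learning rather than chosen greedily, precisely because the best enhancement order depends on global interdependencies the greedy rule cannot see.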
Face Super-Resolution Guided by 3D Facial Priors
State-of-the-art face super-resolution methods employ deep convolutional
neural networks to learn a mapping between low- and high- resolution facial
patterns by exploring local appearance knowledge. However, most of these
methods do not well exploit facial structures and identity information, and
struggle to deal with facial images that exhibit large pose variations. In this
paper, we propose a novel face super-resolution method that explicitly
incorporates 3D facial priors that capture sharp facial structures. Our work
is the first to explore 3D morphable knowledge based on the fusion of
parametric descriptions of face attributes (e.g., identity, facial expression,
texture, illumination, and face pose). Furthermore, the priors can easily be
incorporated into any network and are extremely efficient in improving the
performance and accelerating the convergence speed. Firstly, a 3D face
rendering branch is set up to obtain 3D priors of salient facial structures and
identity knowledge. Secondly, the Spatial Attention Module is used to better
exploit this hierarchical information (i.e., intensity similarity, 3D facial
structure, and identity content) for the super-resolution problem. Extensive
experiments demonstrate that the proposed 3D priors achieve superior face
super-resolution results over the state of the art.
Comment: Accepted as a spotlight paper at the European Conference on Computer
Vision 2020 (ECCV).
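The spatial attention step described above can be sketched as a rendered 3D prior gating the super-resolution features. The sigmoid gating and the single scalar weight below are illustrative assumptions, not the paper's exact module:

```python
import numpy as np

# Minimal sketch: the rendered 3D prior highlights salient facial structure,
# and the SR feature map is reweighted accordingly.

def spatial_attention(features, prior, weight=1.0):
    """features: (C, H, W) SR feature map; prior: (H, W) rendered 3D prior."""
    attn = 1.0 / (1.0 + np.exp(-weight * prior))   # sigmoid attention map
    return features * attn[None, :, :]             # broadcast over channels

feats = np.ones((2, 2, 2))
prior = np.array([[10.0, -10.0], [0.0, 10.0]])     # strong structure at 3 pixels
out = spatial_attention(feats, prior)
print(out.round(2))
```

Pixels the prior marks as structurally salient pass through nearly unchanged, while unsupported pixels are suppressed, which is the intuition behind letting the 3D branch guide the SR branch.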
Face Restoration via Plug-and-Play 3D Facial Priors
State-of-the-art face restoration methods employ deep convolutional neural networks (CNNs) to learn a mapping between degraded and sharp facial patterns by exploring local appearance knowledge. However, most of these methods do not well exploit facial structures and identity information, and only deal with task-specific face restoration (e.g., face super-resolution or deblurring). In this paper, we propose cross-task and cross-model plug-and-play 3D facial priors to explicitly embed the network with the sharp facial structures for general face restoration tasks. Our 3D priors are the first to explore 3D morphable knowledge based on the fusion of parametric descriptions of face attributes (e.g., identity, facial expression, texture, illumination, and face pose). Furthermore, the priors can easily be incorporated into any network and are very efficient in improving the performance and accelerating the convergence speed. Firstly, a 3D face rendering branch is set up to obtain 3D priors of salient facial structures and identity knowledge. Secondly, for better exploiting this hierarchical information (i.e., intensity similarity, 3D facial structure, and identity content), a spatial attention module is designed for image restoration problems. Extensive face restoration experiments including face super-resolution and deblurring demonstrate that the proposed 3D priors achieve superior face restoration results over state-of-the-art algorithms.
Deep Learning for Image Super-resolution: A Survey
Image Super-Resolution (SR) is an important class of image processing
techniques to enhance the resolution of images and videos in computer vision.
Recent years have witnessed remarkable progress of image super-resolution using
deep learning techniques. This article aims to provide a comprehensive survey
on recent advances of image super-resolution using deep learning approaches. In
general, we can roughly group the existing studies of SR techniques into three
major categories: supervised SR, unsupervised SR, and domain-specific SR. In
addition, we also cover some other important issues, such as publicly available
benchmark datasets and performance evaluation metrics. Finally, we conclude
this survey by highlighting several future directions and open issues which
should be further addressed by the community in the future.
Comment: Accepted by IEEE TPAMI.
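Among the performance evaluation metrics the survey covers, peak signal-to-noise ratio (PSNR) is the most common for super-resolution. A minimal sketch for 8-bit images:

```python
import numpy as np

# PSNR = 10 * log10(peak^2 / MSE), with peak = 255 for 8-bit images.

def psnr(reference, estimate, peak=255.0):
    mse = np.mean((reference.astype(np.float64)
                   - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((4, 4), 100, dtype=np.uint8)
est = ref.copy()
est[0, 0] = 110                      # one pixel off by 10 -> MSE = 100/16
print(round(psnr(ref, est), 2))
```

PSNR correlates imperfectly with perceived quality, which is why surveys such as this one also discuss perceptual metrics alongside it.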
PixelNN: Example-based Image Synthesis
We present a simple nearest-neighbor (NN) approach that synthesizes
high-frequency photorealistic images from an "incomplete" signal such as a
low-resolution image, a surface normal map, or edges. Current state-of-the-art
deep generative models designed for such conditional image synthesis lack two
important things: (1) they are unable to generate a large set of diverse
outputs, due to the mode-collapse problem; and (2) they are not interpretable,
making it difficult to control the synthesized output. We demonstrate that NN
approaches potentially address such limitations, but suffer in accuracy on
small datasets. We design a simple pipeline that combines the best of both
worlds: the first stage uses a convolutional neural network (CNN) to map the
input to an (overly smoothed) image, and the second stage uses a pixel-wise
nearest neighbor method to map the smoothed output to multiple high-quality,
high-frequency outputs in a controllable manner. We demonstrate our approach
for various input modalities, and for various domains ranging from human faces
to cats-and-dogs to shoes and handbags.
Comment: Project page: http://www.cs.cmu.edu/~aayushb/pixelNN
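The second stage of the pipeline above can be sketched as follows. A single-pixel intensity stands in for the local CNN descriptor the paper matches on, and the two-image "training set" is an illustrative assumption:

```python
import numpy as np

# Pixel-wise nearest neighbor: each pixel of the over-smoothed stage-one
# output is replaced by the pixel of a high-quality training image whose
# smoothed counterpart is its nearest neighbor.

def pixelwise_nn(smoothed, train_smoothed, train_sharp):
    flat_smooth = train_smoothed.ravel()
    flat_sharp = train_sharp.ravel()
    out = np.empty_like(smoothed)
    for idx, v in np.ndenumerate(smoothed):
        nn = np.argmin(np.abs(flat_smooth - v))   # nearest-neighbor match
        out[idx] = flat_sharp[nn]                 # copy high-frequency detail
    return out

train_smoothed = np.array([[0.0, 0.5], [1.0, 0.2]])
train_sharp    = np.array([[0.1, 0.6], [0.9, 0.3]])
smoothed       = np.array([[0.49, 0.95]])
print(pixelwise_nn(smoothed, train_smoothed, train_sharp))
```

Because the match is an explicit lookup rather than a generative sample, swapping the training set changes the output in a directly interpretable and controllable way, which is the paper's stated advantage over conditional GANs.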
Transform recipes for efficient cloud photo enhancement
Cloud image processing is often proposed as a solution to the limited computing power and battery life of mobile devices: it allows complex algorithms to run on powerful servers with virtually unlimited energy supply. Unfortunately, this overlooks the time and energy cost of uploading the input and downloading the output images. When transfer overhead is accounted for, processing images on a remote server becomes less attractive and many applications do not benefit from cloud offloading. We aim to change this in the case of image enhancements that preserve the overall content of an image. Our key insight is that, in this case, the server can compute and transmit a description of the transformation from input to output, which we call a transform recipe. At equivalent quality, our recipes are much more compact than JPEG images: this reduces the client's download. Furthermore, recipes can be computed from highly compressed inputs, which significantly reduces the data uploaded to the server. The client reconstructs a high-fidelity approximation of the output by applying the recipe to its local high-quality input. We demonstrate our results on 168 images and 10 image processing applications, showing that our recipes form a compact representation for a diverse set of image filters. With an equivalent transmission budget, they provide higher-quality results than JPEG-compressed input/output images, with a gain of the order of 10 dB in many cases. We demonstrate the utility of recipes on a mobile phone by profiling the energy consumption and latency for both local and cloud computation: a transform recipe-based pipeline runs 2--4x faster and uses 2--7x less energy than local or naive cloud computation.
Funding: Qatar Computing Research Institute; United States Defense Advanced Research Projects Agency (Agreement FA8750-14-2-0009); Stanford University Pervasive Parallelism Laboratory; Adobe Systems.
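The recipe idea above can be sketched with a drastically simplified model: the server fits a compact affine model from its compressed input to its enhanced output, and the client replays the fitted coefficients on its full-quality local copy. A single global affine fit stands in for the richer per-tile, multiscale recipes in the paper (an illustrative simplification):

```python
import numpy as np

# Server side: fit output ~= a * input + b from the compressed upload.
def fit_recipe(compressed_input, server_output):
    x = compressed_input.ravel()
    y = server_output.ravel()
    a, b = np.polyfit(x, y, 1)          # least-squares affine fit
    return a, b                          # the "recipe": just two numbers

# Client side: replay the recipe on the local high-quality input.
def apply_recipe(recipe, high_quality_input):
    a, b = recipe
    return a * high_quality_input + b

compressed = np.array([0.0, 0.5, 1.0])
enhanced   = 1.5 * compressed + 0.1      # server's enhancement (e.g. contrast)
recipe = fit_recipe(compressed, enhanced)
client_input = np.array([0.2, 0.8])
print(apply_recipe(recipe, client_input))
```

The transmitted recipe here is two floats instead of a full image, which is the essence of why recipes beat shipping JPEG outputs at an equal transmission budget.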
Blind Face Restoration via Deep Multi-scale Component Dictionaries
Recent reference-based face restoration methods have received considerable
attention due to their great capability in recovering high-frequency details on
real low-quality images. However, most of these methods require a high-quality
reference image of the same identity, making them only applicable in limited
scenes. To address this issue, this paper suggests a deep face dictionary
network (termed as DFDNet) to guide the restoration process of degraded
observations. To begin with, we use K-means to generate deep dictionaries for
perceptually significant face components (i.e., left/right eyes, nose, and mouth)
from high-quality images. Next, with the degraded input, we match and select
the most similar component features from their corresponding dictionaries and
transfer the high-quality details to the input via the proposed dictionary
feature transfer (DFT) block. In particular, component AdaIN is leveraged to
eliminate the style diversity between the input and dictionary features (e.g.,
illumination), and a confidence score is proposed to adaptively fuse the
dictionary feature to the input. Finally, multi-scale dictionaries are adopted
in a progressive manner to enable the coarse-to-fine restoration. Experiments
show that our proposed method can achieve plausible performance in both
quantitative and qualitative evaluation, and more importantly, can generate
realistic and promising results on real degraded images without requiring an
identity-belonging reference. The source code and models are available at
https://github.com/csxmli2016/DFDNet.
Comment: In ECCV 2020. Code is available at:
https://github.com/csxmli2016/DFDNet
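The match-and-fuse step described above can be sketched in miniature. Hand-picked centroids stand in for the K-means component dictionaries, and the 2-D features and distance-based confidence score are illustrative assumptions, not DFDNet's actual formulation:

```python
import numpy as np

# For a degraded component feature, select the most similar dictionary
# atom and fuse its high-quality detail into the input, weighted by a
# confidence score.

def restore_component(degraded_feat, dictionary):
    dists = np.linalg.norm(dictionary - degraded_feat, axis=1)
    nearest = dictionary[np.argmin(dists)]          # most similar atom
    conf = 1.0 / (1.0 + dists.min())                # confidence score
    # Adaptively fuse the dictionary feature into the input.
    return conf * nearest + (1.0 - conf) * degraded_feat

# Toy "eye" dictionary of three high-quality component features.
eye_dictionary = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
degraded = np.array([0.9, 0.1])
print(restore_component(degraded, eye_dictionary))
```

Because the dictionary is built offline from many high-quality faces, no reference image of the same identity is needed at test time, which is the key practical advantage the abstract emphasizes.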
Towards Real-World Blind Face Restoration with Generative Facial Prior
Blind face restoration usually relies on facial priors, such as facial
geometry prior or reference prior, to restore realistic and faithful details.
However, very low-quality inputs cannot offer accurate geometric prior while
high-quality references are inaccessible, limiting the applicability in
real-world scenarios. In this work, we propose GFP-GAN that leverages rich and
diverse priors encapsulated in a pretrained face GAN for blind face
restoration. This Generative Facial Prior (GFP) is incorporated into the face
restoration process via novel channel-split spatial feature transform layers,
which allow our method to achieve a good balance of realness and fidelity.
Thanks to the powerful generative facial prior and delicate designs, our
GFP-GAN could jointly restore facial details and enhance colors with just a
single forward pass, while GAN inversion methods require expensive
image-specific optimization at inference. Extensive experiments show that our
method achieves superior performance to prior art on both synthetic and
real-world datasets.
Comment: CVPR 2021. Code: https://github.com/TencentARC/GFPGAN
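The channel-split spatial feature transform mentioned above can be sketched as follows: scale and shift maps derived from the GAN prior modulate only half of the channels, leaving the rest untouched to preserve fidelity to the input. The even halving rule and the given scale/shift maps are illustrative assumptions:

```python
import numpy as np

# Channel-split SFT: identity path keeps fidelity, modulated path injects
# the generative prior's realness.

def channel_split_sft(features, scale, shift):
    """features: (C, H, W); scale/shift: (C//2, H, W) from the GAN prior."""
    c = features.shape[0] // 2
    identity, modulated = features[:c], features[c:]
    modulated = modulated * (1.0 + scale) + shift   # spatial feature transform
    return np.concatenate([identity, modulated], axis=0)

feats = np.ones((4, 2, 2))
scale = np.full((2, 2, 2), 0.5)
shift = np.full((2, 2, 2), 0.1)
out = channel_split_sft(feats, scale, shift)
print(out[0, 0, 0], out[2, 0, 0])
```

Splitting the channels gives the network an explicit knob between realness (modulated half) and fidelity (identity half), which is the balance the abstract highlights.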
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
The paper presents the futuristic challenges discussed in the
cvpaper.challenge. In 2015 and 2016, we thoroughly studied 1,600+ papers in
several conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV
- …