Verification of Very Low-Resolution Faces Using An Identity-Preserving Deep Face Super-Resolution Network
Face super-resolution methods usually aim at producing visually appealing
results rather than preserving distinctive features for further face
identification. In this work, we propose a deep learning method for face
verification on very low-resolution face images that involves
identity-preserving face super-resolution. Our framework includes a
super-resolution network and a feature extraction network. We train a VGG-based
deep face recognition network (Parkhi et al., 2015) to be used as a feature
extractor. Our super-resolution network is trained to minimize the feature
distance between the high resolution ground truth image and the super-resolved
image, where features are extracted using our pre-trained feature extraction
network. We carry out experiments on FRGC, Multi-PIE, LFW-a, and MegaFace
datasets to evaluate our method in controlled and uncontrolled settings. The
results show that the presented method outperforms conventional
super-resolution methods in low-resolution face verification.
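To make the training objective concrete, here is a minimal PyTorch sketch of such an identity-preserving loss; `sr_net` and `feat_net` are hypothetical stand-ins for the super-resolution network and the pre-trained VGG-based feature extractor, and the auxiliary pixel term with weight `alpha` is an assumption rather than a detail taken from the abstract.

```python
import torch
import torch.nn.functional as F

def identity_preserving_loss(sr_net, feat_net, lr_batch, hr_batch, alpha=1.0):
    """Pixel reconstruction plus a feature-distance (identity) term."""
    sr = sr_net(lr_batch)                    # super-resolved output
    pixel_loss = F.mse_loss(sr, hr_batch)    # assumed auxiliary pixel term
    with torch.no_grad():
        target_feat = feat_net(hr_batch)     # features of the HR ground truth
    feat_loss = F.mse_loss(feat_net(sr), target_feat)  # feature distance
    return pixel_loss + alpha * feat_loss

# Only the super-resolution network is updated; the extractor stays frozen:
# for p in feat_net.parameters():
#     p.requires_grad_(False)
# identity_preserving_loss(sr_net, feat_net, lr, hr).backward()
```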
Global-Local Face Upsampling Network
Face hallucination, which is the task of generating a high-resolution face
image from a low-resolution input image, is a well-studied problem that is
useful in widespread application areas. Face hallucination is particularly
challenging when the input face resolution is very low (e.g., 10 x 12 pixels)
and/or the image is captured in an uncontrolled setting with large pose and
illumination variations. In this paper, we revisit the algorithm introduced in
[1] and present a deep interpretation of this framework that achieves
state-of-the-art results under such challenging scenarios. In our deep network
architecture the global and local constraints that define a face can be
efficiently modeled and learned end-to-end using training data. Conceptually,
our network design can be partitioned into two sub-networks: the first one
implements the holistic face reconstruction according to global constraints,
and the second one enhances face-specific details and enforces local patch
statistics. We optimize the deep network using a new loss function for
super-resolution that combines reconstruction error with a learned face quality
measure in an adversarial setting, producing improved visual results. We conduct
extensive experiments in both controlled and uncontrolled setups and show that
our algorithm improves the state of the art both numerically and visually.
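A rough sketch of the two-sub-network idea, under assumed layer sizes (the paper's exact architecture is not reproduced here): a fully connected global branch reconstructs the holistic face, and a convolutional local branch adds patch-level detail as a residual.

```python
import torch
import torch.nn as nn

class GlobalLocalUpsampler(nn.Module):
    """Global branch for holistic structure, local branch for patch detail."""

    def __init__(self, in_size=16, out_size=64):
        super().__init__()
        self.out_size = out_size
        # Global constraints: fully connected layers see the whole face.
        self.global_fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_size * in_size, out_size * out_size),
            nn.ReLU(),
            nn.Linear(out_size * out_size, out_size * out_size),
        )
        # Local patch statistics: convolutions refine face-specific details.
        self.local_conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, lr):
        b = lr.size(0)
        coarse = self.global_fc(lr).view(b, 1, self.out_size, self.out_size)
        return coarse + self.local_conv(coarse)  # residual local refinement
```

In the abstract's setup the whole network is trained end-to-end with a reconstruction term plus an adversarially learned face-quality term; the sketch above shows only the architectural split.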
Deep Generative Adversarial Compression Artifact Removal
Compression artifacts arise in images whenever a lossy compression algorithm
is applied. These artifacts eliminate details present in the original image, or
add noise and small structures; these effects make images less pleasant to the
human eye and may also degrade the performance of
computer vision algorithms such as object detectors. To eliminate such
artifacts, when decompressing an image, it is required to recover the original
image from a disturbed version. To this end, we present a feed-forward fully
convolutional residual network model trained using a generative adversarial
framework. To provide a baseline, we show that our model can also be trained to
optimize the Structural Similarity (SSIM) index, a better loss than the simpler
Mean Squared Error (MSE). Our GAN is able to produce
images with more photorealistic details than MSE- or SSIM-based networks.
Moreover, we show that our approach can be used as a pre-processing step for
object detection when images are degraded by compression to the point that
state-of-the-art detectors fail. In this task, our GAN method obtains better
performance than MSE- or SSIM-trained networks.
Comment: ICCV 2017 Camera Ready + Acknowledgement
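For reference, here is a simplified differentiable SSIM loss of the kind such a baseline could optimize; this sketch uses a uniform window via average pooling instead of the customary Gaussian window, so it approximates rather than reproduces the standard SSIM.

```python
import torch
import torch.nn.functional as F

def ssim_loss(x, y, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """Return 1 - SSIM(x, y) for image batches scaled to [0, 1]."""
    pad = window // 2
    mu_x = F.avg_pool2d(x, window, stride=1, padding=pad)    # local means
    mu_y = F.avg_pool2d(y, window, stride=1, padding=pad)
    var_x = F.avg_pool2d(x * x, window, 1, pad) - mu_x ** 2  # local variances
    var_y = F.avg_pool2d(y * y, window, 1, pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, window, 1, pad) - mu_x * mu_y  # local covariance
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim.mean()  # higher SSIM is better, so minimize 1 - SSIM
```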
Bridging the Gap Between Computational Photography and Visual Recognition
What is the current state-of-the-art for image restoration and enhancement
applied to degraded images acquired under less than ideal circumstances? Can
the application of such algorithms as a pre-processing step improve image
interpretability for manual analysis or automatic visual recognition of
scene content? While there have been important advances in the area of
computational photography to restore or enhance the visual quality of an image,
the capabilities of such techniques have not always translated in a useful way
to visual recognition tasks. Consequently, there is a pressing need for the
development of algorithms that are designed for the joint problem of improving
visual appearance and recognition, which will be an enabling factor for the
deployment of visual recognition tools in many real-world scenarios. To address
this, we introduce the UG^2 dataset as a large-scale benchmark composed of
video imagery captured under challenging conditions, and two enhancement tasks
designed to test algorithmic impact on visual quality and automatic object
recognition. Furthermore, we propose a set of metrics to evaluate the joint
improvement of such tasks as well as individual algorithmic advances, including
a novel psychophysics-based evaluation regime for human assessment and a
realistic set of quantitative measures for object recognition performance. We
introduce six new algorithms for image restoration or enhancement, which were
created as part of the IARPA-sponsored UG^2 Challenge workshop held at CVPR
2018. Under the proposed evaluation regime, we present an in-depth analysis of
these algorithms and a host of deep learning-based and classic baseline
approaches. From the observed results, it is evident that we are in the early
days of building a bridge between computational photography and visual
recognition, leaving many opportunities for innovation in this area.
Comment: CVPR Prize Challenge: http://www.ug2challenge.or
von Mises-Fisher Mixture Model-based Deep learning: Application to Face Verification
A number of pattern recognition tasks, e.g., face verification, can
be boiled down to classification or clustering of unit length directional
feature vectors whose distance can be simply computed by their angle. In this
paper, we propose the von Mises-Fisher (vMF) mixture model as the theoretical
foundation for effective deep learning of such directional features and
derive a novel vMF Mixture Loss and its corresponding vMF deep features. The
proposed vMF feature learning achieves the characteristics of discriminative
learning, i.e., compacting the instances of the same class while
increasing the distance of instances from different classes. Moreover, it
subsumes a number of popular loss functions as well as an effective method in
deep learning, namely normalization. We conduct extensive experiments on face
verification using four challenging face datasets, i.e., LFW,
YouTube Faces, CACD, and IJB-A. Results show the effectiveness and excellent
generalization ability of the proposed approach as it achieves state-of-the-art
results on the LFW, YouTube Faces, and CACD datasets and competitive results on
the IJB-A dataset.
Comment: Under review
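A minimal sketch of a vMF-style mixture loss, assuming a shared concentration parameter `kappa`: both the deep features and the learned class means are L2-normalized, so the logits reduce to scaled cosine similarities, which is the normalization behavior the abstract describes.

```python
import torch
import torch.nn.functional as F

class VMFMixtureLoss(torch.nn.Module):
    """Softmax over kappa-scaled cosine similarities to unit class means."""

    def __init__(self, feat_dim, num_classes, kappa=16.0):
        super().__init__()
        self.centers = torch.nn.Parameter(torch.randn(num_classes, feat_dim))
        self.kappa = kappa  # assumed shared vMF concentration

    def forward(self, features, labels):
        z = F.normalize(features, dim=1)       # unit-length feature directions
        mu = F.normalize(self.centers, dim=1)  # unit-length class mean directions
        logits = self.kappa * (z @ mu.t())     # kappa * cosine similarity
        return F.cross_entropy(logits, labels)
```

Because the logits depend only on angles, minimizing this loss compacts same-class features around their mean direction while pushing different-class means apart.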
The Contextual Loss for Image Transformation with Non-Aligned Data
Feed-forward CNNs trained for image transformation problems rely on loss
functions that measure the similarity between the generated image and a target
image. Most of the common loss functions assume that these images are spatially
aligned and compare pixels at corresponding locations. However, for many tasks,
aligned training pairs of images will not be available. We present an
alternative loss function that does not require alignment, thus providing an
effective and simple solution for a new space of problems. Our loss is based on
both context and semantics: it compares regions with similar semantic
meaning while considering the context of the entire image. Hence, for example,
when transferring the style of one face to another, it will translate
eyes-to-eyes and mouth-to-mouth. Our code can be found at
https://www.github.com/roimehrez/contextualLoss
Comment: ECCV Oral. Paper web page: http://cgm.technion.ac.il/Computer-Graphics-Multimedia/Software/contextual
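A simplified sketch of a contextual-style loss between two unaligned feature sets: each generated feature is softly matched to its most similar target feature by relative cosine distance, so no spatial alignment is needed. The bandwidth `h` and the matching direction are assumptions here; the released code linked above should be treated as authoritative.

```python
import torch
import torch.nn.functional as F

def contextual_loss(x_feat, y_feat, h=0.5, eps=1e-5):
    """x_feat: [N, D] generated features; y_feat: [M, D] target features."""
    mu = y_feat.mean(dim=0)                      # center both sets on the target
    x = F.normalize(x_feat - mu, dim=1)
    y = F.normalize(y_feat - mu, dim=1)
    d = 1.0 - x @ y.t()                          # cosine distances d_ij, [N, M]
    d_min = d.min(dim=1, keepdim=True).values    # distance to nearest target
    w = torch.exp((1.0 - d / (d_min + eps)) / h) # relative similarities
    cx_ij = w / w.sum(dim=1, keepdim=True)       # normalize per generated feature
    cx = cx_ij.max(dim=1).values.mean()          # best contextual match, averaged
    return -torch.log(cx + eps)
```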
A survey of sparse representation: algorithms and applications
Sparse representation has attracted much attention from researchers in the
fields of signal processing, image processing, computer vision, and pattern
recognition, and has proven valuable in both
theoretical research and practical applications. Many different algorithms have
theoretical research and practical applications. Many different algorithms have
been proposed for sparse representation. The main purpose of this article is to
provide a comprehensive study and an updated review on sparse representation
and to supply guidance for researchers. The taxonomy of sparse representation
methods can be studied from various viewpoints. For example, in terms of
different norm minimizations used in sparsity constraints, the methods can be
roughly categorized into five groups: sparse representation with
$\ell_0$-norm minimization, sparse representation with $\ell_p$-norm
($0 < p < 1$) minimization, sparse representation with $\ell_1$-norm
minimization, sparse representation with $\ell_{2,1}$-norm minimization, and
sparse representation with $\ell_2$-norm minimization. In this paper, a comprehensive overview of
sparse representation is provided. The available sparse representation
algorithms can also be empirically categorized into four groups: greedy
strategy approximation, constrained optimization, proximity algorithm-based
optimization, and homotopy algorithm-based sparse representation. The
rationales of different algorithms in each category are analyzed and a wide
range of sparse representation applications are summarized, which could
sufficiently reveal the potential nature of the sparse representation theory.
Specifically, an experimental comparative study of these sparse
representation algorithms is presented. The Matlab code used in this paper is
available at http://www.yongxu.org/lunwen.html.
Comment: Published in IEEE Access, Vol. 3, pp. 490-530, 2015
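As an illustration of the greedy category above, here is a compact NumPy sketch of Orthogonal Matching Pursuit, a representative greedy strategy approximation algorithm for sparse coding.

```python
import numpy as np

def omp(D, y, k):
    """Approximately solve y ~ D @ x with at most k nonzeros.

    D: dictionary of shape [m, n] with (assumed) unit-norm columns.
    """
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        support.append(j)
        # Re-fit coefficients by least squares on the current support.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x
```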
A Bayesian Nonparametric Approach to Image Super-resolution
Super-resolution methods form high-resolution images from low-resolution
images. In this paper, we develop a new Bayesian nonparametric model for
super-resolution. Our method uses a beta-Bernoulli process to learn a set of
recurring visual patterns, called dictionary elements, from the data. Because
it is nonparametric, the number of elements found is also determined from the
data. We test the results on both benchmark and natural images, comparing with
several other models from the research literature. We perform large-scale human
evaluation experiments to assess the visual quality of the results. In a first
implementation, we use Gibbs sampling to approximate the posterior. However,
this algorithm is not feasible for large-scale data. To circumvent this, we
then develop an online variational Bayes (VB) algorithm. This algorithm finds
high-quality dictionaries in a fraction of the time needed by the Gibbs
sampler.
Comment: 30 pages, 11 figures
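A toy NumPy sketch of the finite beta-Bernoulli approximation behind such models, with assumed hyperparameters: as the truncation level K grows, the data themselves determine how many dictionary elements are actually used, which is the nonparametric behavior the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 256, 1000   # truncation level, number of image patches (assumed)
a, b = 1.0, 1.0    # beta process hyperparameters (assumed)

# Element usage probabilities shrink toward zero as K grows, so only a
# data-determined subset of dictionary elements is ever switched on.
pi = rng.beta(a / K, b * (K - 1) / K, size=K)
Z = rng.random((N, K)) < pi   # z_nk: does patch n use dictionary element k?

active = int((Z.sum(axis=0) > 0).sum())
print(f"{active} of {K} dictionary elements are used by the data")
```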
Super-Resolution with Deep Convolutional Sufficient Statistics
Inverse problems in image and audio, and super-resolution in particular, can
be seen as high-dimensional structured prediction problems, where the goal is
to characterize the conditional distribution of a high-resolution output given
its low-resolution corrupted observation. When the scaling ratio is small,
point estimates achieve impressive performance, but as the ratio grows they
soon suffer from the regression-to-the-mean problem, a result of their
inability to capture the
multi-modality of this conditional distribution. Modeling high-dimensional
image and audio distributions is a hard task, requiring the ability to model
both complex geometrical structures and textured regions. In this paper, we
propose to use a Gibbs distribution as the conditional model, whose sufficient
statistics are given by deep convolutional neural networks. The features
computed by the network are stable to local deformation, and have reduced
variance when the input is a stationary texture. These properties imply that
the resulting sufficient statistics minimize the uncertainty of the target
signals given the degraded observations, while being highly informative. The
filters of the CNN are initialized by multiscale complex wavelets, and then we
propose an algorithm to fine-tune them by estimating the gradient of the
conditional log-likelihood, a procedure that bears some similarity to
Generative Adversarial Networks. We evaluate the proposed approach
experimentally on the image super-resolution task, but the approach is general
and could be used in other challenging ill-posed problems such as audio
bandwidth extension.
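One way to read the resulting inference problem is as energy minimization in image space: find the image whose CNN sufficient statistics match the target statistics. The sketch below, with a placeholder feature network `phi` and precomputed `target_stats`, is an assumed simplification rather than the authors' algorithm.

```python
import torch

def estimate_hr(phi, target_stats, lr_upsampled, steps=200, step_size=0.1):
    """Minimize E(x) = ||phi(x) - target_stats||^2 over the image x."""
    x = lr_upsampled.clone().requires_grad_(True)  # init from upsampled LR
    opt = torch.optim.Adam([x], lr=step_size)
    for _ in range(steps):
        opt.zero_grad()
        energy = (phi(x) - target_stats).pow(2).sum()  # Gibbs energy
        energy.backward()                              # gradient w.r.t. pixels
        opt.step()
    return x.detach()
```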
Attention-Aware Face Hallucination via Deep Reinforcement Learning
Face hallucination is a domain-specific super-resolution problem with the
goal to generate high-resolution (HR) faces from low-resolution (LR) input
images. In contrast to existing methods that often learn a single
patch-to-patch mapping from LR to HR images and disregard the
contextual interdependency between patches, we propose a novel Attention-aware
Face Hallucination (Attention-FH) framework which resorts to deep reinforcement
learning for sequentially discovering attended patches and then performing the
facial part enhancement by fully exploiting the global interdependency of the
image. Specifically, at each time step, a recurrent policy network dynamically
specifies a new attended region conditioned on what was observed in the past.
The state (i.e., the face hallucination result for the whole
image) can thus be exploited and updated by the local enhancement network on
the selected region. The Attention-FH approach jointly learns the recurrent
policy network and local enhancement network through maximizing the long-term
reward that reflects the hallucination performance over the whole image.
Therefore, our proposed Attention-FH is capable of adaptively personalizing an
optimal search path for each face image according to its own characteristics.
Extensive experiments show that our approach significantly surpasses the state
of the art on in-the-wild faces with large pose and illumination
variations.
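A heavily simplified REINFORCE step for this kind of attention-driven patch selection; `policy_net`, `enhance_net`, and `psnr` are hypothetical components, and the single-step PSNR-gain reward is an assumption standing in for the paper's long-term reward over the whole image.

```python
import torch

def reinforce_step(policy_net, enhance_net, psnr, image, target, optimizer):
    """One policy update; the enhancement net is assumed trained separately."""
    logits = policy_net(image)                    # scores over patch locations
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()                        # attended region index
    enhanced = enhance_net(image, action)         # enhance the chosen patch
    # Reward: PSNR gain after this step, detached so the policy gradient
    # treats it as a constant.
    reward = (psnr(enhanced, target) - psnr(image, target)).detach()
    loss = (-dist.log_prob(action) * reward).mean()  # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return enhanced.detach(), reward
```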