Generative Adversarial Network-based Synthesis of Visible Faces from Polarimetric Thermal Faces
The large domain discrepancy between faces captured in the polarimetric (or
conventional) thermal and visible domains makes cross-domain face recognition
quite a challenging problem for both human examiners and computer vision
algorithms. Previous approaches utilize a two-step procedure (visible feature
estimation and visible image reconstruction) to synthesize the visible image
given the corresponding polarimetric thermal image. However, these are regarded
as two disjoint steps and hence may hinder the performance of visible face
reconstruction. We argue that joint optimization would be a better way to
reconstruct more photo-realistic images for both computer vision algorithms and
human-examiners to examine. To this end, this paper proposes a Generative
Adversarial Network-based Visible Face Synthesis (GAN-VFS) method to synthesize
more photo-realistic visible face images from their corresponding polarimetric
images. To ensure that the encoded visible-features contain more semantically
meaningful information in reconstructing the visible face image, a guidance
sub-network is incorporated into the training procedure. To achieve photo-realistic
quality while preserving discriminative characteristics in the reconstructed
outputs, an identity loss combined with a perceptual loss is optimized in
the framework. Multiple experiments evaluated on different experimental
protocols demonstrate that the proposed method achieves state-of-the-art
performance.
Anisotropic Diffusion-based Kernel Matrix Model for Face Liveness Detection
Face recognition and verification are widely used biometric technologies in
security systems. Unfortunately, face biometrics is vulnerable to spoofing
attacks using photographs or videos. In this paper, we present an anisotropic
diffusion-based kernel matrix model (ADKMM) for face liveness detection to
prevent face spoofing attacks. We use the anisotropic diffusion to enhance the
edges and boundary locations of a face image, and the kernel matrix model to
extract face image features which we call the diffusion-kernel (D-K) features.
The D-K features reflect the inner correlation of the face image sequence. We
introduce convolutional neural networks to extract the deep features, and then
employ a generalized multiple kernel learning method to fuse the D-K features
and the deep features to achieve better performance. Our experimental
evaluation on two publicly available datasets shows that the proposed
method outperforms state-of-the-art face liveness detection methods.
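As a rough illustration of the edge-preserving smoothing mentioned in the abstract above, the following is a minimal sketch of one explicit Perona-Malik diffusion step. The abstract does not say which diffusion scheme ADKMM uses, so the Perona-Malik form, the function name perona_malik_step, and the parameters kappa and lam are assumptions made for illustration only.

```python
import math


def perona_malik_step(img, kappa=0.1, lam=0.2):
    """Apply one explicit Perona-Malik update to a 2D list of floats.

    kappa (assumed) sets the intensity difference treated as an edge;
    lam (assumed) is the step size and should stay at or below 0.25
    for the 4-neighbour scheme to remain stable.
    """
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(height):
        for x in range(width):
            flow = 0.0
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width:
                    diff = img[ny][nx] - img[y][x]
                    # Conduction is close to 1 for small differences (smoothing)
                    # and close to 0 for large differences (edge is preserved).
                    flow += math.exp(-((diff / kappa) ** 2)) * diff
            out[y][x] = img[y][x] + lam * flow
    return out
```

With these settings, a sharp 0-to-1 step edge is left essentially untouched, while a small bump of a few hundredths is pulled toward its neighbours.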
Deep Cross Polarimetric Thermal-to-visible Face Recognition
In this paper, we present a deep coupled learning framework to address the
problem of matching polarimetric thermal face photos against a gallery of
visible faces. Polarization state information of thermal faces provides the
missing textural and geometric details in the thermal face imagery which
exist in the visible spectrum. We propose a coupled deep neural network
architecture which leverages relatively large visible and thermal datasets to
overcome the problem of overfitting, and eventually we train it on a
polarimetric thermal face dataset which is the first of its kind. The
proposed architecture is able to make full use of the polarimetric thermal
information to train a deep model, in contrast to the conventional shallow
thermal-to-visible face recognition methods. The proposed coupled deep neural
network also finds global discriminative features in a nonlinear embedding
space to relate the polarimetric thermal faces to their corresponding visible
faces. The results show the superiority of our method compared to
state-of-the-art cross thermal-to-visible face recognition
algorithms.
Where Is My Puppy? Retrieving Lost Dogs by Facial Features
A pet that goes missing is among many people's worst fears: a moment of
distraction is enough for a dog or a cat to wander off from home. Some measures
help match lost animals to their owners; but automated visual recognition is
one that - although convenient, highly available, and low-cost - is
surprisingly overlooked. In this paper, we inaugurate that promising avenue by
pursuing face recognition for dogs. We contrast four ready-to-use human facial
recognizers (EigenFaces, FisherFaces, LBPH, and a Sparse method) to two
original solutions based upon convolutional neural networks: BARK (inspired by
architecture-optimized networks employed for human facial recognition) and WOOF
(based upon off-the-shelf OverFeat features). Human facial recognizers perform
poorly for dogs (up to 60.5% accuracy), showing that dog facial recognition is
not a trivial extension of human facial recognition. The convolutional network
solutions work much better, with BARK attaining up to 81.1% accuracy, and WOOF,
89.4%. The tests were conducted on two datasets: Flickr-dog, with 42 dogs of
two breeds (pugs and huskies); and Snoopybook, with 18 mongrel dogs. Comment: 17 pages, 8 figures, 1 table, Multimedia Tools and Application
From handcrafted to deep local features
This paper presents an overview of the evolution of local features from
handcrafted to deep-learning-based methods, followed by a discussion of several
benchmarks and papers evaluating such local features. Our investigations are
motivated by 3D reconstruction problems, where the precise location of the
features is important. As we describe these methods, we highlight and explain
the challenges of feature extraction and potential ways to overcome them. We
first present handcrafted methods, followed by methods based on classical
machine learning and finally we discuss methods based on deep-learning. This
largely chronologically-ordered presentation will help the reader to fully
understand the topic of image and region description in order to make best use
of it in modern computer vision applications. In particular, understanding
handcrafted methods and their motivation can help to understand modern
approaches and how machine learning is used to improve the results. We also
provide references to most of the relevant literature and code. Comment: Preprin
Wasserstein CNN: Learning Invariant Features for NIR-VIS Face Recognition
Heterogeneous face recognition (HFR) aims to match facial images acquired
from different sensing modalities with mission-critical applications in
forensics, security and commercial sectors. However, HFR is a much more
challenging problem than traditional face recognition because of large
intra-class variations of heterogeneous face images and limited training
samples of cross-modality face image pairs. This paper proposes a novel
approach, namely Wasserstein CNN (convolutional neural networks, or WCNN for
short), to learn invariant features between near-infrared and visual face images
(i.e., NIR-VIS face recognition). The low-level layers of WCNN are trained with
widely available face images in the visual spectrum. The high-level layer is
divided into three parts: the NIR layer, the VIS layer, and the NIR-VIS shared
layer. The first two layers aim to learn modality-specific features, and the
NIR-VIS shared layer is designed to learn a modality-invariant feature subspace.
The Wasserstein distance is introduced into the NIR-VIS shared layer to measure
the dissimilarity between heterogeneous feature distributions. WCNN learning
thus aims to minimize the Wasserstein distance between the NIR distribution and
the VIS distribution to obtain an invariant deep feature representation of
heterogeneous face images. To avoid over-fitting on small-scale
heterogeneous face data, a correlation prior is introduced on the
fully-connected layers of the WCNN network to reduce the parameter space. This
prior is implemented by a low-rank constraint in an end-to-end network. The joint
formulation leads to an alternating minimization for deep feature
representation at the training stage and an efficient computation for
heterogeneous data at the testing stage. Extensive experiments on three
challenging NIR-VIS face recognition databases demonstrate the significant
superiority of Wasserstein CNN over state-of-the-art methods.
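To make the idea of measuring dissimilarity between NIR and VIS feature distributions concrete, here is a minimal sketch of a squared 2-Wasserstein distance under the simplifying assumption that each feature dimension is modelled as an independent Gaussian. The abstract does not state the exact distributional form used in WCNN, so this assumption, along with the helper names mean_std and gaussian_w2_squared, is illustrative only.

```python
import math


def mean_std(features):
    """Per-dimension mean and (population) standard deviation of a batch."""
    count, dim = len(features), len(features[0])
    means = [sum(f[j] for f in features) / count for j in range(dim)]
    stds = [
        math.sqrt(sum((f[j] - means[j]) ** 2 for f in features) / count)
        for j in range(dim)
    ]
    return means, stds


def gaussian_w2_squared(nir_features, vis_features):
    """Squared 2-Wasserstein distance between two diagonal Gaussians.

    For Gaussians with diagonal covariance this reduces to the squared
    difference of the means plus the squared difference of the standard
    deviations, summed over dimensions.
    """
    nir_mean, nir_std = mean_std(nir_features)
    vis_mean, vis_std = mean_std(vis_features)
    mean_term = sum((a - b) ** 2 for a, b in zip(nir_mean, vis_mean))
    std_term = sum((a - b) ** 2 for a, b in zip(nir_std, vis_std))
    return mean_term + std_term
```

Driving this quantity toward zero on a shared layer would pull the two modality distributions together, which is the intuition the abstract describes.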
Adversarial Discriminative Heterogeneous Face Recognition
The gap between sensing patterns of different face modalities remains a
challenging problem in heterogeneous face recognition (HFR). This paper
proposes an adversarial discriminative feature learning framework to close the
sensing gap via adversarial learning on both raw-pixel space and compact
feature space. This framework integrates cross-spectral face hallucination and
discriminative feature learning into an end-to-end adversarial network. In the
pixel space, we make use of generative adversarial networks to perform
cross-spectral face hallucination. An elaborate two-path model is introduced to
alleviate the lack of paired images, which gives consideration to both global
structures and local textures. In the feature space, an adversarial loss and a
high-order variance discrepancy loss are employed to measure the global and
local discrepancy between two heterogeneous distributions respectively. These
two losses enhance domain-invariant feature learning and modality-independent
noise removal. Experimental results on three NIR-VIS databases show that our
proposed approach outperforms state-of-the-art HFR methods, without requiring
complex networks or large-scale training datasets.
Coupled Deep Learning for Heterogeneous Face Recognition
Heterogeneous face matching is a challenging issue in face recognition due to
the large domain difference as well as insufficient pairwise images in different
modalities during training. This paper proposes a coupled deep learning (CDL)
approach for heterogeneous face matching. CDL seeks a shared feature space
in which the heterogeneous face matching problem can be approximately treated
as a homogeneous face matching problem. The objective function of CDL mainly
includes two parts. The first part contains a trace norm and a block-diagonal
prior as relevance constraints, which not only cluster and correlate unpaired
images from multiple modalities, but also regularize the
parameters to alleviate overfitting. An approximate variational formulation is
introduced to deal with the difficulty of optimizing the low-rank constraint
directly. The second part contains a cross-modal ranking among triplets of
domain-specific images to maximize the margin between different identities and
to augment the small number of training samples. Besides, an alternating
minimization method is employed to iteratively update the parameters of CDL.
Experimental results show that CDL achieves better performance on the
challenging CASIA NIR-VIS 2.0 face recognition database, the IIIT-D Sketch
database, the CUHK Face Sketch (CUFS), and the CUHK Face Sketch FERET (CUFSF),
significantly outperforming state-of-the-art heterogeneous face recognition
methods. Comment: AAAI 201
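The cross-modal triplet ranking described in the abstract above can be sketched with a standard hinge-style triplet loss, where the anchor comes from one modality and the positive and negative samples come from another. The abstract gives no formula, so the squared Euclidean distance, the margin value, and the function names below are assumptions for illustration rather than the CDL objective itself.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_modal_triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss.

    anchor:   feature from one modality (for example NIR).
    positive: feature of the same identity from the other modality.
    negative: feature of a different identity from the other modality.
    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (an assumed hyperparameter).
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, margin + gap)
```

Because every anchor can be paired with many positives and negatives across modalities, a small set of images yields a much larger set of training triplets.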
ProNet: Learning to Propose Object-specific Boxes for Cascaded Neural Networks
This paper aims to classify and locate objects accurately and efficiently,
without using bounding box annotations. It is challenging as objects in the
wild could appear at arbitrary locations and in different scales. In this
paper, we propose a novel classification architecture ProNet based on
convolutional neural networks. It uses computationally efficient neural
networks to propose image regions that are likely to contain objects, and
applies more powerful but slower networks on the proposed regions. The basic
building block is a multi-scale fully-convolutional network which assigns
object confidence scores to boxes at different locations and scales. We show
that such networks can be trained effectively using image-level annotations,
and can be connected into cascades or trees for efficient object
classification. ProNet outperforms previous state-of-the-art significantly on
PASCAL VOC 2012 and MS COCO datasets for object classification and point-based
localization. Comment: CVPR 2016 (fixed reference issue
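The cascade idea in the abstract above, a cheap network proposing regions and a slower network examining only those regions, can be sketched in a few lines. Everything here is hypothetical: the Box tuple layout, the cascade_classify function, the keep parameter, and the two scoring callables stand in for the networks and are not taken from the ProNet paper.

```python
from typing import Callable, Sequence, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)


def cascade_classify(
    boxes: Sequence[Box],
    cheap_score: Callable[[Box], float],
    strong_score: Callable[[Box], float],
    keep: int = 2,
) -> Tuple[float, Box]:
    """Two-stage cascade over candidate boxes.

    Stage 1 ranks every box with the cheap scorer and keeps the top `keep`.
    Stage 2 runs the expensive scorer only on those proposals and returns
    the best (score, box) pair.
    """
    proposals = sorted(boxes, key=cheap_score, reverse=True)[:keep]
    scored = [(strong_score(box), box) for box in proposals]
    return max(scored)  # tuples compare by score first
```

The saving comes from the second stage being called `keep` times instead of once per candidate box.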
Directional Statistics-based Deep Metric Learning for Image Classification and Retrieval
Deep distance metric learning (DDML), which is proposed to learn image
similarity metrics in an end-to-end manner based on the convolutional neural
network, has achieved encouraging results in many computer vision
tasks. L2-normalization in the embedding space has been used to improve the
performance of several DDML methods. However, the commonly used Euclidean
distance is no longer an accurate metric for the L2-normalized embedding space,
i.e., a hyper-sphere. Another challenge of current DDML methods is that their
loss functions are usually based on rigid data formats, such as the triplet
tuple. Thus, an extra process is needed to prepare data in specific formats. In
addition, their losses are obtained from a limited number of samples, which
leads to a lack of the global view of the embedding space. In this paper, we
replace the Euclidean distance with the cosine similarity to better utilize the
L2-normalization, which is able to attenuate the curse of dimensionality.
More specifically, a novel loss function based on the von Mises-Fisher
distribution is proposed to learn a compact hyper-spherical embedding space.
Moreover, a new efficient learning algorithm is developed to better capture the
global structure of the embedding space. Experiments for both classification
and retrieval tasks on several standard datasets show that our method achieves
state-of-the-art performance with a simpler training procedure. Furthermore, we
demonstrate that, even with a small number of convolutional layers, our model
can still obtain significantly better classification performance than the
widely used softmax loss. Comment: codes will come soo
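A short sketch may help show why cosine similarity is the natural measure on an L2-normalized embedding space, and how a von Mises-Fisher model turns it into class probabilities. The class-posterior form below assumes equal class priors and a single shared concentration kappa, so that the vMF normalizing constants cancel; that assumption, the default kappa value, and the function names are illustrative and not taken from the paper. Inputs are assumed to be non-zero vectors.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


def vmf_class_posterior(embedding, mean_directions, kappa=10.0):
    """Class probabilities under vMF components with a shared kappa.

    Each class c has density proportional to exp(kappa * mu_c . x) on the
    unit sphere, so the posterior is a softmax over scaled cosine scores.
    """
    x = l2_normalize(embedding)
    logits = [
        kappa * sum(m * v for m, v in zip(l2_normalize(mu), x))
        for mu in mean_directions
    ]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

On unit vectors the squared Euclidean distance equals 2 minus twice the cosine similarity, so the two measures rank neighbours identically there; the cosine form simply makes the dependence on direction explicit.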