Search CORE

21,735 research outputs found

Generative Adversarial Network-based Synthesis of Visible Faces from Polarimetric Thermal Faces

Author: Hu Shuowen
Patel Vishal M.
Riggan Benjamin S.
Zhang He
Publication venue
Publication date: 08/08/2017
Field of study

The large domain discrepancy between faces captured in polarimetric (or conventional) thermal and visible domain makes cross-domain face recognition quite a challenging problem for both human-examiners and computer vision algorithms. Previous approaches utilize a two-step procedure (visible feature estimation and visible image reconstruction) to synthesize the visible image given the corresponding polarimetric thermal image. However, these are regarded as two disjoint steps and hence may hinder the performance of visible face reconstruction. We argue that joint optimization would be a better way to reconstruct more photo-realistic images for both computer vision algorithms and human-examiners to examine. To this end, this paper proposes a Generative Adversarial Network-based Visible Face Synthesis (GAN-VFS) method to synthesize more photo-realistic visible face images from their corresponding polarimetric images. To ensure that the encoded visible-features contain more semantically meaningful information in reconstructing the visible face image, a guidance sub-network is involved into the training procedure. To achieve photo realistic property while preserving discriminative characteristics for the reconstructed outputs, an identity loss combined with the perceptual loss are optimized in the framework. Multiple experiments evaluated on different experimental protocols demonstrate that the proposed method achieves state-of-the-art performance

arXiv.org e-Print Archive

Anisotropic Diffusion-based Kernel Matrix Model for Face Liveness Detection

Author: Jia Yunde
Yu Changyong
Publication venue
Publication date: 10/07/2017
Field of study

Facial recognition and verification is a widely used biometric technology in security system. Unfortunately, face biometrics is vulnerable to spoofing attacks using photographs or videos. In this paper, we present an anisotropic diffusion-based kernel matrix model (ADKMM) for face liveness detection to prevent face spoofing attacks. We use the anisotropic diffusion to enhance the edges and boundary locations of a face image, and the kernel matrix model to extract face image features which we call the diffusion-kernel (D-K) features. The D-K features reflect the inner correlation of the face image sequence. We introduce convolution neural networks to extract the deep features, and then, employ a generalized multiple kernel learning method to fuse the D-K features and the deep features to achieve better performance. Our experimental evaluation on the two publicly available datasets shows that the proposed method outperforms the state-of-art face liveness detection methods

arXiv.org e-Print Archive

Deep Cross Polarimetric Thermal-to-visible Face Recognition

Author: Dabouei Ali
Iranmanesh Seyed Mehdi
Kazemi Hadi
Nasrabadi Nasser M.
Publication venue
Publication date: 04/01/2018
Field of study

In this paper, we present a deep coupled learning frame- work to address the problem of matching polarimetric ther- mal face photos against a gallery of visible faces. Polariza- tion state information of thermal faces provides the miss- ing textural and geometrics details in the thermal face im- agery which exist in visible spectrum. we propose a coupled deep neural network architecture which leverages relatively large visible and thermal datasets to overcome the problem of overfitting and eventually we train it by a polarimetric thermal face dataset which is the first of its kind. The pro- posed architecture is able to make full use of the polari- metric thermal information to train a deep model compared to the conventional shallow thermal-to-visible face recogni- tion methods. Proposed coupled deep neural network also finds global discriminative features in a nonlinear embed- ding space to relate the polarimetric thermal faces to their corresponding visible faces. The results show the superior- ity of our method compared to the state-of-the-art models in cross thermal-to-visible face recognition algorithms

arXiv.org e-Print Archive

Where Is My Puppy? Retrieving Lost Dogs by Facial Features

Author: Moreira Thierry Pinheiro
Perez Mauricio Lisboa
Valle Eduardo
Werneck Rafael de Oliveira
Publication venue
Publication date: 01/08/2016
Field of study

A pet that goes missing is among many people's worst fears: a moment of distraction is enough for a dog or a cat wandering off from home. Some measures help matching lost animals to their owners; but automated visual recognition is one that - although convenient, highly available, and low-cost - is surprisingly overlooked. In this paper, we inaugurate that promising avenue by pursuing face recognition for dogs. We contrast four ready-to-use human facial recognizers (EigenFaces, FisherFaces, LBPH, and a Sparse method) to two original solutions based upon convolutional neural networks: BARK (inspired in architecture-optimized networks employed for human facial recognition) and WOOF (based upon off-the-shelf OverFeat features). Human facial recognizers perform poorly for dogs (up to 60.5% accuracy), showing that dog facial recognition is not a trivial extension of human facial recognition. The convolutional network solutions work much better, with BARK attaining up to 81.1% accuracy, and WOOF, 89.4%. The tests were conducted in two datasets: Flickr-dog, with 42 dogs of two breeds (pugs and huskies); and Snoopybook, with 18 mongrel dogs.Comment: 17 pages, 8 figures, 1 table, Multimedia Tools and Application

arXiv.org e-Print Archive

From handcrafted to deep local features

Author: Csurka Gabriela
Dance Christopher R.
Humenberger Martin
Publication venue
Publication date: 14/06/2019
Field of study

This paper presents an overview of the evolution of local features from handcrafted to deep-learning-based methods, followed by a discussion of several benchmarks and papers evaluating such local features. Our investigations are motivated by 3D reconstruction problems, where the precise location of the features is important. As we describe these methods, we highlight and explain the challenges of feature extraction and potential ways to overcome them. We first present handcrafted methods, followed by methods based on classical machine learning and finally we discuss methods based on deep-learning. This largely chronologically-ordered presentation will help the reader to fully understand the topic of image and region description in order to make best use of it in modern computer vision applications. In particular, understanding handcrafted methods and their motivation can help to understand modern approaches and how machine learning is used to improve the results. We also provide references to most of the relevant literature and code.Comment: Preprin

arXiv.org e-Print Archive

Wasserstein CNN: Learning Invariant Features for NIR-VIS Face Recognition

Author: He Ran
Sun Zhenan
Tan Tieniu
Wu Xiang
Publication venue
Publication date: 08/08/2017
Field of study

Heterogeneous face recognition (HFR) aims to match facial images acquired from different sensing modalities with mission-critical applications in forensics, security and commercial sectors. However, HFR is a much more challenging problem than traditional face recognition because of large intra-class variations of heterogeneous face images and limited training samples of cross-modality face image pairs. This paper proposes a novel approach namely Wasserstein CNN (convolutional neural networks, or WCNN for short) to learn invariant features between near-infrared and visual face images (i.e. NIR-VIS face recognition). The low-level layers of WCNN are trained with widely available face images in visual spectrum. The high-level layer is divided into three parts, i.e., NIR layer, VIS layer and NIR-VIS shared layer. The first two layers aims to learn modality-specific features and NIR-VIS shared layer is designed to learn modality-invariant feature subspace. Wasserstein distance is introduced into NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions. So W-CNN learning aims to achieve the minimization of Wasserstein distance between NIR distribution and VIS distribution for invariant deep feature representation of heterogeneous face images. To avoid the over-fitting problem on small-scale heterogeneous face data, a correlation prior is introduced on the fully-connected layers of WCNN network to reduce parameter space. This prior is implemented by a low-rank constraint in an end-to-end network. The joint formulation leads to an alternating minimization for deep feature representation at training stage and an efficient computation for heterogeneous data at testing stage. Extensive experiments on three challenging NIR-VIS face recognition databases demonstrate the significant superiority of Wasserstein CNN over state-of-the-art methods

arXiv.org e-Print Archive

Adversarial Discriminative Heterogeneous Face Recognition

Author: He Ran
Song Lingxiao
Wu Xiang
Zhang Man
Publication venue
Publication date: 11/09/2017
Field of study

The gap between sensing patterns of different face modalities remains a challenging problem in heterogeneous face recognition (HFR). This paper proposes an adversarial discriminative feature learning framework to close the sensing gap via adversarial learning on both raw-pixel space and compact feature space. This framework integrates cross-spectral face hallucination and discriminative feature learning into an end-to-end adversarial network. In the pixel space, we make use of generative adversarial networks to perform cross-spectral face hallucination. An elaborate two-path model is introduced to alleviate the lack of paired images, which gives consideration to both global structures and local textures. In the feature space, an adversarial loss and a high-order variance discrepancy loss are employed to measure the global and local discrepancy between two heterogeneous distributions respectively. These two losses enhance domain-invariant feature learning and modality independent noise removing. Experimental results on three NIR-VIS databases show that our proposed approach outperforms state-of-the-art HFR methods, without requiring of complex network or large-scale training dataset

arXiv.org e-Print Archive

Coupled Deep Learning for Heterogeneous Face Recognition

Author: He Ran
Song Lingxiao
Tan Tieniu
Wu Xiang
Publication venue
Publication date: 16/11/2017
Field of study

Heterogeneous face matching is a challenge issue in face recognition due to large domain difference as well as insufficient pairwise images in different modalities during training. This paper proposes a coupled deep learning (CDL) approach for the heterogeneous face matching. CDL seeks a shared feature space in which the heterogeneous face matching problem can be approximately treated as a homogeneous face matching problem. The objective function of CDL mainly includes two parts. The first part contains a trace norm and a block-diagonal prior as relevance constraints, which not only make unpaired images from multiple modalities be clustered and correlated, but also regularize the parameters to alleviate overfitting. An approximate variational formulation is introduced to deal with the difficulties of optimizing low-rank constraint directly. The second part contains a cross modal ranking among triplet domain specific images to maximize the margin for different identities and increase data for a small amount of training samples. Besides, an alternating minimization method is employed to iteratively update the parameters of CDL. Experimental results show that CDL achieves better performance on the challenging CASIA NIR-VIS 2.0 face recognition database, the IIIT-D Sketch database, the CUHK Face Sketch (CUFS), and the CUHK Face Sketch FERET (CUFSF), which significantly outperforms state-of-the-art heterogeneous face recognition methods.Comment: AAAI 201

arXiv.org e-Print Archive

ProNet: Learning to Propose Object-specific Boxes for Cascaded Neural Networks

Author: Bourdev Lubomir
Collobert Ronan
Nevatia Ram
Paluri Manohar
Sun Chen
Publication venue
Publication date: 12/04/2016
Field of study

This paper aims to classify and locate objects accurately and efficiently, without using bounding box annotations. It is challenging as objects in the wild could appear at arbitrary locations and in different scales. In this paper, we propose a novel classification architecture ProNet based on convolutional neural networks. It uses computationally efficient neural networks to propose image regions that are likely to contain objects, and applies more powerful but slower networks on the proposed regions. The basic building block is a multi-scale fully-convolutional network which assigns object confidence scores to boxes at different locations and scales. We show that such networks can be trained effectively using image-level annotations, and can be connected into cascades or trees for efficient object classification. ProNet outperforms previous state-of-the-art significantly on PASCAL VOC 2012 and MS COCO datasets for object classification and point-based localization.Comment: CVPR 2016 (fixed reference issue

arXiv.org e-Print Archive

Directional Statistics-based Deep Metric Learning for Image Classification and Retrieval

Author: Chen Shifeng
Yan Hong
Zhe Xuefei
Publication venue
Publication date: 27/03/2018
Field of study

Deep distance metric learning (DDML), which is proposed to learn image similarity metrics in an end-to-end manner based on the convolution neural network, has achieved encouraging results in many computer vision tasks.

L2

-normalization in the embedding space has been used to improve the performance of several DDML methods. However, the commonly used Euclidean distance is no longer an accurate metric for

L2

-normalized embedding space, i.e., a hyper-sphere. Another challenge of current DDML methods is that their loss functions are usually based on rigid data formats, such as the triplet tuple. Thus, an extra process is needed to prepare data in specific formats. In addition, their losses are obtained from a limited number of samples, which leads to a lack of the global view of the embedding space. In this paper, we replace the Euclidean distance with the cosine similarity to better utilize the

L2

-normalization, which is able to attenuate the curse of dimensionality. More specifically, a novel loss function based on the von Mises-Fisher distribution is proposed to learn a compact hyper-spherical embedding space. Moreover, a new efficient learning algorithm is developed to better capture the global structure of the embedding space. Experiments for both classification and retrieval tasks on several standard datasets show that our method achieves state-of-the-art performance with a simpler training procedure. Furthermore, we demonstrate that, even with a small number of convolutional layers, our model can still obtain significantly better classification performance than the widely used softmax loss.Comment: codes will come soo

arXiv.org e-Print Archive