418 research outputs found
Robust RGB-D Face Recognition Using Attribute-Aware Loss
Existing convolutional neural network (CNN) based face recognition algorithms
typically learn a discriminative feature mapping, using a loss function that
enforces separation of features from different classes and/or aggregation of
features within the same class. However, they may suffer from bias in the
training data such as uneven sampling density, because they optimize the
adjacency relationship of the learned features without considering the
proximity of the underlying faces. Moreover, since they only use facial images
for training, the learned feature mapping may not correctly indicate the
relationship of other attributes such as gender and ethnicity, which can be
important for some face recognition applications. In this paper, we propose a
new CNN-based face recognition approach that incorporates such attributes into
the training process. Using an attribute-aware loss function that regularizes
the feature mapping using attribute proximity, our approach learns more
discriminative features that are correlated with the attributes. We train our
face recognition model on a large-scale RGB-D data set with over 100K
identities captured under real application conditions. By comparing our
approach with other methods in a variety of experiments, we demonstrate that
the depth channel and the attribute-aware loss greatly improve the accuracy
and robustness of face recognition.
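The abstract above describes regularizing the feature mapping with attribute proximity. As a rough, hypothetical sketch (the paper's exact loss is not given here), one could penalize disagreement between pairwise feature distances and pairwise attribute distances, so that embeddings reflect how similar the underlying faces' attributes are:

```python
# Hypothetical attribute-aware regularizer: penalize feature-space distances
# that disagree with attribute-space distances for pairs of samples. This is
# a sketch of the general idea, not the paper's exact formulation.
import numpy as np

def attribute_aware_regularizer(features, attributes):
    """Mean squared disagreement between pairwise feature distances
    and pairwise attribute distances."""
    n = features.shape[0]
    reg, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d_feat = np.linalg.norm(features[i] - features[j])
            d_attr = np.linalg.norm(attributes[i] - attributes[j])
            reg += (d_feat - d_attr) ** 2
            pairs += 1
    return reg / max(pairs, 1)
```

In training, a term like this would be added to the usual classification loss with a small weight, encouraging the learned features to also encode attribute proximity.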
Deep Sketch-Photo Face Recognition Assisted by Facial Attributes
In this paper, we present a deep coupled framework to address the problem of
matching a sketch image against a gallery of mugshots. Face sketches contain
the essential information about the spatial topology and geometric details of
faces, while missing some important facial attributes such as ethnicity, hair,
eye, and skin color. We propose a coupled deep neural network architecture
which utilizes facial attributes in order to improve sketch-photo recognition
performance. The proposed Attribute-Assisted Deep Convolutional Neural Network
(AADCNN) method exploits the facial attributes and leverages the loss functions
from the facial attribute identification and face verification tasks in order
to learn rich discriminative features in a common embedding subspace. The
facial attribute identification task increases the inter-personal variations
by pushing apart the embedded features extracted from individuals with
different facial attributes, while the verification task reduces the
intra-personal variations by pulling together all the features that are
related to one person. The learned discriminative features generalize well to
new identities not seen in the training data. Compared to conventional
sketch-photo recognition methods, the proposed architecture is able to make
full use of the sketch and complementary facial attribute information to train
a deep model. Extensive experiments are performed on composite (E-PRIP) and
semi-forensic (IIIT-D semi-forensic) datasets. The results show the
superiority of our method compared to state-of-the-art sketch-photo
recognition models.
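The multi-task idea above, an attribute identification loss that pushes apart features from individuals with different attributes plus a verification loss that pulls same-identity features together, can be sketched as follows. The contrastive form, the margin, and the weight `lam` are assumptions for illustration, not the paper's exact formulation:

```python
# Hypothetical sketch of a coupled multi-task loss in the spirit of AADCNN:
# an attribute classification term plus a verification (contrastive) term.
import numpy as np

def contrastive_loss(f1, f2, same_identity, margin=1.0):
    """Pull same-identity features together, push different-identity ones apart."""
    d = np.linalg.norm(f1 - f2)
    return d ** 2 if same_identity else max(0.0, margin - d) ** 2

def attribute_ce(logits, label):
    """Cross-entropy for the facial-attribute identification head."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.log(p[label])

def coupled_loss(f1, f2, same_identity, attr_logits, attr_label, lam=0.5):
    """Verification loss plus weighted attribute identification loss."""
    return contrastive_loss(f1, f2, same_identity) + lam * attribute_ce(attr_logits, attr_label)
```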
Looking Beyond Appearances: Synthetic Training Data for Deep CNNs in Re-identification
Re-identification is generally carried out by encoding the appearance of a
subject in terms of outfit, suggesting scenarios where people do not change
their attire. In this paper we overcome this restriction, by proposing a
framework based on a deep convolutional neural network, SOMAnet, that
additionally models other discriminative aspects, namely, structural attributes
of the human figure (e.g. height, obesity, gender). Our method is unique in
many respects. First, SOMAnet is based on the Inception architecture, departing
from the usual siamese framework. This spares expensive data preparation
(pairing images across cameras) and allows the understanding of what the
network learned. Second, and most notably, the training data consists of a
synthetic 100K instance dataset, SOMAset, created by photorealistic human body
generation software. Synthetic data represents a good compromise between
realistic imagery, usually not required in re-identification since surveillance
cameras capture low-resolution silhouettes, and complete control of the
samples, which is useful in order to customize the data w.r.t. the surveillance
scenario at hand, e.g. ethnicity. SOMAnet, trained on SOMAset and fine-tuned on
recent re-identification benchmarks, outperforms all competitors, matching
subjects even with different apparel. The combination of synthetic data with
Inception architectures opens up new research avenues in re-identification.
Seeing Voices and Hearing Faces: Cross-modal biometric matching
We introduce a seemingly impossible task: given only an audio clip of someone
speaking, decide which of two face images is the speaker. In this paper we
study this, and a number of related cross-modal tasks, aimed at answering the
question: how much can we infer from the voice about the face and vice versa?
We study this task "in the wild", employing the datasets that are now publicly
available for face recognition from static images (VGGFace) and speaker
identification from audio (VoxCeleb). These provide training and testing
scenarios for both static and dynamic testing of cross-modal matching. We make
the following contributions: (i) we introduce CNN architectures for both binary
and multi-way cross-modal face and audio matching, (ii) we compare dynamic
testing (where video information is available, but the audio is not from the
same video) with static testing (where only a single still image is available),
and (iii) we use human testing as a baseline to calibrate the difficulty of the
task. We show that a CNN can indeed be trained to solve this task in both the
static and dynamic scenarios, and is even well above chance on 10-way
classification of the face given the voice. The CNN matches human performance
on easy examples (e.g. different gender across faces) but exceeds human
performance on more challenging examples (e.g. faces with the same gender, age
and nationality). Comment: To appear in: IEEE Computer Vision and Pattern Recognition (CVPR),
201
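The binary cross-modal matching task can be illustrated with a minimal sketch: given embeddings produced by a voice network and a face network in a shared space (the actual CNN architectures from the paper are not reproduced here; the embeddings are hypothetical inputs), pick the face whose embedding is closer to the voice embedding:

```python
# Minimal sketch of binary cross-modal matching on precomputed embeddings.
# The embedding networks that would produce these vectors are assumed.
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_face(voice_emb, face_a, face_b):
    """Return 0 if face_a matches the voice better, else 1."""
    return 0 if cosine(voice_emb, face_a) >= cosine(voice_emb, face_b) else 1
```

The multi-way variant of the task is the same comparison taken over ten candidate faces instead of two.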
RGB-D datasets using microsoft kinect or similar sensors: a survey
RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information of an object, and the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.
Deep Learning Architectures for Heterogeneous Face Recognition
Face recognition has been one of the most challenging areas of research in biometrics and computer vision. Many face recognition algorithms are designed to address illumination and pose problems for visible face images. In recent years, there has been a significant amount of research in Heterogeneous Face Recognition (HFR). The large modality gap between faces captured in different spectra, as well as the lack of training data, makes HFR quite a challenging problem. In this work, we present different deep learning frameworks to address the problem of matching non-visible face photos against a gallery of visible faces.
Algorithms for thermal-to-visible face recognition can be categorized as cross-spectrum feature-based methods or cross-spectrum image synthesis methods. In cross-spectrum feature-based face recognition, a thermal probe is matched against a gallery of visible faces, corresponding to the real-world scenario, in a feature subspace. The second category synthesizes a visible-like image from a thermal image, which can then be used by any commercial visible-spectrum face recognition system. These methods are also beneficial in the sense that the synthesized visible face image can be directly utilized by existing face recognition systems which operate only on visible face imagery. Therefore, using this approach one can leverage existing commercial-off-the-shelf (COTS) and government-off-the-shelf (GOTS) solutions. In addition, the synthesized images can be used by human examiners for different purposes.
There are some informative traits, such as age, gender, ethnicity, race, and hair color, which are not distinctive enough for recognition on their own, but can still act as complementary information to primary traits, such as face and fingerprint. These traits, known as soft biometrics, can improve recognition algorithms while being much cheaper and faster to acquire. They can be used directly in a unimodal system for some applications. Usually, soft biometric traits have been utilized jointly with hard biometrics (the face photo) for different tasks, in the sense that they are assumed to be available during both the training and testing phases. In our approaches we look at this problem in a different way. We consider the case where soft biometric information does not exist during the testing phase, and our method predicts it directly in a multi-tasking paradigm.
There are situations in which training data come equipped with additional information that can be modeled as an auxiliary view of the data, but that unfortunately is not available during testing. This is the learning using privileged information (LUPI) scenario. We introduce a novel framework based on deep learning techniques that leverages the auxiliary view to improve the performance of the recognition system. We do so by introducing a formulation that is general, in the sense that it can be used with any visual classifier.
Every use of auxiliary information has been validated extensively using publicly available benchmark datasets, and several new state-of-the-art accuracy performance values have been set. Examples of application domains include visual object recognition from RGB images and from depth data, handwritten digit recognition, and gesture recognition from video.
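A minimal sketch of the LUPI idea, under the assumption that the privileged view enters training through an alignment penalty on the learned representation (the thesis's exact formulation may differ): the classifier is trained on the main view only, so nothing privileged is needed at test time.

```python
# Hypothetical LUPI-style training loss: a task term on the main view plus an
# alignment term toward a privileged-view representation that is available
# only during training. The squared-error alignment and weight lam are
# assumptions for illustration.
import numpy as np

def lupi_loss(main_repr, priv_repr, logits, label, lam=0.1):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    task = -np.log(p[label])                       # classification loss, main view only
    align = np.mean((main_repr - priv_repr) ** 2)  # pull toward privileged view
    return float(task + lam * align)
```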
We also design a novel aggregation framework which optimizes the landmark locations directly, using only one image and without requiring any extra prior, which leads to robust alignment given arbitrary face deformations. Three different approaches are employed to generate the manipulated faces, and two of them perform the manipulation via adversarial attacks to fool a face recognizer. This step can be decoupled from our framework and potentially used to enhance other landmark detectors. Aggregation of the manipulated faces in the different branches of the proposed method leads to robust landmark detection.
Finally, we focus on generative adversarial networks, which are a very powerful tool for synthesizing visible-like images from non-visible images. The main goal of a generative model is to approximate the true data distribution, which is not known. In general, the choice of how to model the density function is challenging. Explicit models have the advantage of explicitly calculating the probability densities. There are two well-known approaches, namely the Generative Adversarial Network (GAN) and the Variational AutoEncoder (VAE), which try to model the data distribution implicitly. The VAE tries to maximize a lower bound on the data likelihood, while a GAN plays a minimax game between two players during its optimization. GANs overlook the explicit characteristics of the data density, which leads to undesirable quantitative evaluations and mode collapse. Mode collapse causes the generator to create similar-looking images with poor diversity of samples. In the last chapter of the thesis, we address this issue in the GAN framework.
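The minimax game mentioned above is played over the standard GAN value function V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))], which the discriminator maximizes and the generator minimizes. A small numerical sketch of evaluating this objective from discriminator outputs:

```python
# Numerical sketch of the GAN value function; d_real and d_fake are the
# discriminator's outputs (probabilities) on real and generated samples.
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]; D maximizes, G minimizes."""
    return float(np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps)))
```

At the maximally confused discriminator (D = 0.5 everywhere) the value is 2 log 0.5; a discriminator that scores real samples high and fake samples low achieves a larger value.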
From clothing to identity: manual and automatic soft biometrics
Soft biometrics have increasingly attracted research interest and are often considered major cues for identity, especially in the absence of valid traditional biometrics, as in surveillance. In everyday life, several incidents and forensic scenarios highlight the usefulness and capability of identity information that can be deduced from clothing. Semantic clothing attributes have recently been introduced as a new form of soft biometrics. Although clothing traits can be naturally described and compared by humans for operable and successful use, it is desirable to exploit computer vision to enrich clothing descriptions with more objective and discriminative information. This allows automatic extraction, semantic description, and comparison of visually detectable clothing traits in a manner similar to recognition by eyewitness statements. This study proposes a novel set of soft clothing attributes, described using small groups of high-level semantic labels and automatically extracted using computer-vision techniques. In this way we can explore the capability of human attributes vis-a-vis those inferred automatically by computer vision. Categorical and comparative soft clothing traits are derived and used for identification/re-identification, either to supplement soft body traits or to be used alone. The automatically- and manually-derived soft clothing biometrics are employed in challenging invariant person retrieval. The experimental results highlight promising potential for use in various applications.