Coupled Deep Learning for Heterogeneous Face Recognition
Heterogeneous face matching is a challenging issue in face recognition due to
large domain difference as well as insufficient pairwise images in different
modalities during training. This paper proposes a coupled deep learning (CDL)
approach for heterogeneous face matching. CDL seeks a shared feature space
in which the heterogeneous face matching problem can be approximately treated
as a homogeneous face matching problem. The objective function of CDL mainly
includes two parts. The first part contains a trace norm and a block-diagonal
prior as relevance constraints, which not only cause unpaired images from
multiple modalities to be clustered and correlated, but also regularize the
parameters to alleviate overfitting. An approximate variational formulation is
introduced to deal with the difficulty of optimizing the low-rank constraint
directly. The second part contains a cross-modal ranking among triplets of
domain-specific images to maximize the margin between different identities and
to augment the small number of training samples. In addition, an alternating
minimization method is employed to iteratively update the parameters of CDL.
Experimental results show that CDL significantly outperforms state-of-the-art
heterogeneous face recognition methods on the challenging CASIA NIR-VIS 2.0
face recognition database, the IIIT-D Sketch database, the CUHK Face Sketch
(CUFS) database, and the CUHK Face Sketch FERET (CUFSF) database.
Comment: AAAI 201
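The cross-modal ranking term described in this abstract can be illustrated with a hinge-style triplet loss. This is a minimal sketch, not the paper's objective: the abstract does not give the formula, so the squared Euclidean distance, the margin value, and the names `squared_distance` and `cross_modal_triplet_loss` are assumptions made for illustration. The anchor comes from one modality (for example NIR) and the positive and negative come from the other (for example VIS).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_modal_triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss (assumed form): zero once the negative is at least
    `margin` farther from the anchor than the positive."""
    return max(0.0, margin
               + squared_distance(anchor, positive)
               - squared_distance(anchor, negative))


# Hypothetical 2-D features in the shared space.
nir_anchor = [0.0, 0.0]
vis_same_identity = [0.0, 1.0]    # squared distance to anchor: 1
vis_other_identity = [2.0, 0.0]   # squared distance to anchor: 4

# Correctly ordered triplet: the margin is already satisfied.
print(cross_modal_triplet_loss(nir_anchor, vis_same_identity, vis_other_identity))
# Swapped triplet: the loss is positive and would drive an update.
print(cross_modal_triplet_loss(nir_anchor, vis_other_identity, vis_same_identity))
```

Because each anchor can be combined with many positive/negative pairs from the other modality, triplet sampling also multiplies the number of training constraints, which is one way to read the abstract's remark about increasing data for a small training set.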
Face Recognition: A Novel Multi-Level Taxonomy based Survey
In a world where security issues have been gaining growing importance, face
recognition systems have attracted increasing attention in multiple application
areas, ranging from forensics and surveillance to commerce and entertainment.
To help understand the landscape and abstraction levels relevant for face
recognition systems, face recognition taxonomies allow a deeper dissection and
comparison of the existing solutions. This paper proposes a new, more
encompassing and richer multi-level face recognition taxonomy, facilitating the
organization and categorization of available and emerging face recognition
solutions; this taxonomy may also guide researchers in the development of more
efficient face recognition solutions. The proposed multi-level taxonomy
considers levels related to the face structure, feature support and feature
extraction approach. Following the proposed taxonomy, a comprehensive survey of
representative face recognition solutions is presented. The paper concludes
with a discussion on current algorithmic and application related challenges
which may define future research directions for face recognition.
Comment: This paper is a preprint of a paper submitted to IET Biometrics. If
accepted, the copy of record will be available at the IET Digital Librar
Shared Representation Learning for Heterogeneous Face Recognition
After intensive research, heterogeneous face recognition is still a
challenging problem. The main difficulties stem from the complex
relationship between heterogeneous face image spaces. The heterogeneity is
always tightly coupled with other variations, which makes the relationship
between heterogeneous face images highly nonlinear. Many excellent methods have
been proposed to model the nonlinear relationship, but they are apt to overfit
the training set due to limited samples. Inspired by the unsupervised algorithms
in deep learning, this paper proposes a novel framework for heterogeneous face
recognition. We first extract Gabor features at some localized facial points,
and then use Restricted Boltzmann Machines (RBMs) to learn a shared
representation locally to remove the heterogeneity around each facial point.
Finally, the shared representations of local RBMs are connected together and
processed by PCA. Two problems (Sketch-Photo and NIR-VIS) and three databases
are selected to evaluate the proposed method. For the Sketch-Photo problem, we
obtain perfect results on the CUFS database. For the NIR-VIS problem, we produce
new state-of-the-art performance on the CASIA HFB and NIR-VIS 2.0 databases.
Cross-modal Subspace Learning for Fine-grained Sketch-based Image Retrieval
Sketch-based image retrieval (SBIR) is challenging due to the inherent
domain gap between sketch and photo. Compared with the pixel-perfect depictions
in photos, sketches are highly abstract, iconic renderings of the real world.
Therefore, matching sketches and photos directly using low-level visual cues is
insufficient, since a common low-level subspace that traverses semantically
across the two modalities is non-trivial to establish. Most existing SBIR
studies do not directly tackle this cross-modal problem. This naturally
motivates us to explore the effectiveness of cross-modal retrieval methods in
SBIR, which have been applied successfully to image-text matching. In this
paper, we introduce and compare a series of state-of-the-art cross-modal
subspace learning methods and benchmark them on two recently released
fine-grained SBIR datasets. Through thorough examination of the experimental
results, we demonstrate that subspace learning can effectively model
the sketch-photo domain gap. In addition, we draw a few key insights to drive
future research.
Comment: Accepted by Neurocomputin
Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition
Face recognition has witnessed great progress in recent years, mainly
attributed to the high-capacity model designed and the abundant labeled data
collected. However, it becomes more and more prohibitive to scale up the
current million-level identity annotations. In this work, we show that
unlabeled face data can be as effective as the labeled ones. Here, we consider
a setting closely mimicking the real-world scenario, where the unlabeled data
are collected from unconstrained environments and their identities are
exclusive from the labeled ones. Our main insight is that although the class
information is not available, we can still faithfully approximate these
semantic relationships by constructing a relational graph in a bottom-up
manner. We propose Consensus-Driven Propagation (CDP) to tackle this
challenging problem with two modules, the "committee" and the "mediator", which
select positive face pairs robustly by carefully aggregating multi-view
information. Extensive experiments validate the effectiveness of both modules
to discard outliers and mine hard positives. With CDP, we achieve a compelling
accuracy of 78.18% on the MegaFace identification challenge by using only 9% of
the labels, compared to 61.78% when no unlabeled data are used and 78.52% when
all labels are employed.
Comment: In ECCV 2018. More details at the project page:
http://mmlab.ie.cuhk.edu.hk/projects/CDP
A Light CNN for Deep Face Representation with Noisy Labels
The volume of convolutional neural network (CNN) models proposed for face
recognition has been growing continuously to better fit large amounts of
training data. When training data are obtained from the Internet, the labels are
likely to be ambiguous and inaccurate. This paper presents a Light CNN
framework to learn a compact embedding on the large-scale face data with
massive noisy labels. First, we introduce a variation of maxout activation,
called Max-Feature-Map (MFM), into each convolutional layer of the CNN. Unlike
maxout activation, which uses many feature maps to linearly approximate an
arbitrary convex activation function, MFM does so via a competitive
relationship. MFM can not only separate noisy and informative signals but also
perform feature selection between two feature maps. Second, three
networks are carefully designed to obtain better performance while reducing
the number of parameters and the computational cost. Lastly, a semantic
bootstrapping method is proposed to make the prediction of the networks more
consistent with noisy labels. Experimental results show that the proposed
framework can utilize large-scale noisy data to learn a Light model that is
efficient in computational cost and storage space. The learned single network
with a 256-D representation achieves state-of-the-art results on various face
benchmarks without fine-tuning. The code is released at
https://github.com/AlfredXiangWu/LightCNN.
Comment: arXiv admin note: text overlap with arXiv:1507.04844. The models are
released on https://github.com/AlfredXiangWu/LightCNN, IEEE Transactions on
Information Forensics and Security, 201
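The Max-Feature-Map idea of a competitive choice between two feature maps can be sketched in a few lines. This is an illustrative reading of the abstract, not the released implementation: pairing the first half of the channels with the second half, the flat-list representation of a feature map, and the name `max_feature_map` are all assumptions.

```python
def max_feature_map(channels):
    """Element-wise maximum over paired feature maps.

    `channels` is a list of 2k feature maps, each a flat list of floats of
    equal length. Channel i is paired with channel i + k (assumed pairing),
    so the output has k feature maps.
    """
    if len(channels) % 2 != 0:
        raise ValueError("MFM needs an even number of feature maps")
    half = len(channels) // 2
    first, second = channels[:half], channels[half:]
    return [[max(a, b) for a, b in zip(map_a, map_b)]
            for map_a, map_b in zip(first, second)]


# Four hypothetical feature maps with three activations each.
feature_maps = [
    [0.2, -1.0, 3.0],
    [0.5, 0.1, -2.0],
    [1.0, -3.0, 0.0],
    [0.4, 0.3, -1.0],
]
print(max_feature_map(feature_maps))
```

Each output activation keeps only the stronger of two competing responses, so the operation halves the channel count and, unlike ReLU, involves no fixed threshold.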
A Comprehensive Survey on Cross-modal Retrieval
In recent years, cross-modal retrieval has drawn much attention due to the
rapid growth of multimodal data. It takes one type of data as the query to
retrieve relevant data of another type. For example, a user can use text to
retrieve relevant pictures or videos. Since the query and its retrieved results
can be of different modalities, how to measure the content similarity between
different modalities of data remains a challenge. Various methods have been
proposed to deal with such a problem. In this paper, we first review a number
of representative methods for cross-modal retrieval and classify them into two
main groups: 1) real-valued representation learning, and 2) binary
representation learning. Real-valued representation learning methods aim to
learn real-valued common representations for different modalities of data. To
speed up the cross-modal retrieval, a number of binary representation learning
methods are proposed to map different modalities of data into a common Hamming
space. Then, we introduce several multimodal datasets in the community, and
show the experimental results on two commonly used multimodal datasets. The
comparison reveals the characteristics of different kinds of cross-modal
retrieval methods, which is expected to benefit both practical applications and
future research. Finally, we discuss open problems and future research
directions.
Comment: 20 pages, 11 figures, 9 table
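The speed advantage of binary representation learning comes from comparing codes in Hamming space. The sketch below assumes the binary codes have already been produced by some learned hash functions (not shown); the 8-bit codes, file names, and the function names `hamming_distance` and `retrieve` are hypothetical.

```python
def hamming_distance(code_a, code_b):
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(code_a ^ code_b).count("1")


def retrieve(query_code, database, top_k=2):
    """Return the names of the `top_k` items closest to the query in Hamming space."""
    ranked = sorted(database.items(),
                    key=lambda item: hamming_distance(query_code, item[1]))
    return [name for name, _ in ranked[:top_k]]


# Hypothetical 8-bit codes for images, and a code for a text query that is
# assumed to live in the same Hamming space.
image_codes = {
    "cat.jpg": 0b10110010,
    "dog.jpg": 0b10010111,
    "car.jpg": 0b01001101,
}
text_query_code = 0b10110011

print(retrieve(text_query_code, image_codes))
```

The XOR and bit count replace the floating-point distance computation needed for real-valued representations, which is why binary codes are attractive for large-scale retrieval.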
Wasserstein CNN: Learning Invariant Features for NIR-VIS Face Recognition
Heterogeneous face recognition (HFR) aims to match facial images acquired
from different sensing modalities, with mission-critical applications in
forensics, security and commercial sectors. However, HFR is a much more
challenging problem than traditional face recognition because of large
intra-class variations of heterogeneous face images and limited training
samples of cross-modality face image pairs. This paper proposes a novel
approach, namely Wasserstein CNN (convolutional neural networks, or WCNN for
short), to learn invariant features between near-infrared and visual face images
(i.e. NIR-VIS face recognition). The low-level layers of WCNN are trained with
widely available face images in the visual spectrum. The high-level layer is
divided into three parts, i.e., the NIR layer, the VIS layer and the NIR-VIS
shared layer. The first two layers aim to learn modality-specific features and
the NIR-VIS shared layer is designed to learn a modality-invariant feature
subspace. Wasserstein distance is introduced into the NIR-VIS shared layer to
measure the dissimilarity between heterogeneous feature distributions. WCNN
learning thus aims to minimize the Wasserstein distance between the NIR
distribution and the VIS distribution for invariant deep feature representation
of heterogeneous face images. To avoid the over-fitting problem on small-scale
heterogeneous face data, a correlation prior is introduced on the
fully-connected layers of WCNN to reduce the parameter space. This prior is
implemented by a low-rank constraint in an end-to-end network. The joint
formulation leads to an alternating minimization for deep feature
representation at the training stage and an efficient computation for
heterogeneous data at the testing stage. Extensive experiments on three
challenging NIR-VIS face recognition databases demonstrate the significant
superiority of Wasserstein CNN over state-of-the-art methods.
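To give a feel for what measuring dissimilarity between feature distributions means, the sketch below computes the empirical 1-Wasserstein distance for a single feature dimension. The abstract does not say how the distance is computed inside the network, so this one-dimensional, equal-sample-size form is a simplification chosen for illustration, and the name `wasserstein_1d` and the sample values are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    For equally weighted samples of the same size, the optimal transport
    plan matches sorted values in order, so the distance is the mean
    absolute difference between the sorted samples.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical activations of one shared-layer unit for NIR and VIS images.
nir_feature = [0.0, 0.5, 0.25]
vis_feature = [1.0, 1.5, 1.25]

print(wasserstein_1d(nir_feature, vis_feature))  # distributions are shifted apart
print(wasserstein_1d(nir_feature, nir_feature))  # identical distributions
```

Driving this quantity toward zero during training pushes the NIR and VIS feature distributions to overlap, which is the sense in which the shared layer becomes modality-invariant.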
Cooperative Training of Deep Aggregation Networks for RGB-D Action Recognition
A novel deep neural network training paradigm that exploits the conjoint
information in multiple heterogeneous sources is proposed. Specifically, in an
RGB-D based action recognition task, it cooperatively trains a single
convolutional neural network (named c-ConvNet) on both RGB visual features and
depth features, and deeply aggregates the two kinds of features for action
recognition. Unlike the conventional ConvNet, which learns the deep
separable features for homogeneous modality-based classification with only one
softmax loss function, the c-ConvNet enhances the discriminative power of the
deeply learned features and weakens the undesired modality discrepancy by
jointly optimizing a ranking loss and a softmax loss for both homogeneous and
heterogeneous modalities. The ranking loss consists of intra-modality and
cross-modality triplet losses, and it reduces both the intra-modality and
cross-modality feature variations. Furthermore, the correlations between RGB
and depth data are embedded in the c-ConvNet, can be retrieved by either of
the modalities, and contribute to recognition even when only one of
the modalities is available. The proposed method was extensively evaluated on
two large RGB-D action recognition datasets, ChaLearn LAP IsoGD and NTU RGB+D,
and one small dataset, SYSU 3D HOI, and achieved state-of-the-art
results.
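The joint objective described in this abstract, a softmax loss plus a ranking loss made of intra-modality and cross-modality triplets, can be sketched as follows. The abstract gives no formula, so the hinge form of the triplet loss, the equal weighting of the two triplet terms, the `weight` and `margin` parameters, and all function names are assumptions for illustration only.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin):
    """Hinge-style triplet loss (assumed form)."""
    return max(0.0, margin + sq_dist(anchor, positive) - sq_dist(anchor, negative))


def softmax_cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def joint_loss(rgb, depth, logits, label, margin=0.5, weight=1.0):
    """Softmax loss plus intra- and cross-modality triplet terms.

    `rgb` and `depth` are dicts with 'anchor', 'positive' and 'negative'
    feature vectors; the RGB anchor is reused for the cross-modality triplet.
    """
    intra = triplet_loss(rgb["anchor"], rgb["positive"], rgb["negative"], margin)
    cross = triplet_loss(rgb["anchor"], depth["positive"], depth["negative"], margin)
    return softmax_cross_entropy(logits, label) + weight * (intra + cross)


# Hypothetical 2-D features and two-class logits.
rgb = {"anchor": [0.0, 0.0], "positive": [0.0, 0.5], "negative": [2.0, 0.0]}
depth = {"anchor": [0.0, 0.0], "positive": [0.5, 0.0], "negative": [0.0, 0.5]}
print(joint_loss(rgb, depth, logits=[0.0, 0.0], label=0))
```

In this toy example the intra-modality triplet already satisfies the margin, so only the cross-modality term and the softmax term contribute to the total.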
A Survey on Multi-Task Learning
Multi-Task Learning (MTL) is a learning paradigm in machine learning and its
aim is to leverage useful information contained in multiple related tasks to
help improve the generalization performance of all the tasks. In this paper, we
give a survey of MTL. First, we classify different MTL algorithms into several
categories, including feature learning approach, low-rank approach, task
clustering approach, task relation learning approach, and decomposition
approach, and then discuss the characteristics of each approach. In order to
improve the performance of learning tasks further, MTL can be combined with
other learning paradigms including semi-supervised learning, active learning,
unsupervised learning, reinforcement learning, multi-view learning and
graphical models. When the number of tasks is large or the data dimensionality
is high, batch MTL models have difficulty handling this situation, so online,
parallel and distributed MTL models as well as dimensionality reduction and
feature hashing are reviewed to reveal their computational and storage
advantages. Many real-world applications use MTL to boost their performance and
we review representative works. Finally, we present theoretical analyses and
discuss several future directions for MTL.