GhostVLAD for set-based face recognition
The objective of this paper is to learn a compact representation of image
sets for template-based face recognition. We make the following contributions:
first, we propose a network architecture which aggregates and embeds the face
descriptors produced by deep convolutional neural networks into a compact
fixed-length representation. This compact representation requires minimal
memory storage and enables efficient similarity computation. Second, we propose
a novel GhostVLAD layer that includes ghost clusters, which do not
contribute to the aggregation. We show that a quality weighting on the input
faces emerges automatically such that informative images contribute more than
those with low quality, and that the ghost clusters enhance the network's
ability to deal with poor quality images. Third, we explore how input feature
dimension, number of clusters and different training techniques affect the
recognition performance. Given this analysis, we train a network that far
exceeds the state-of-the-art on the IJB-B face recognition dataset. This is
currently one of the most challenging public benchmarks, and we surpass the
state-of-the-art on both the identification and verification protocols.
Comment: Accepted by ACCV 201
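The aggregation idea in this abstract can be sketched as follows. This is a minimal illustration, assuming a NetVLAD-style soft assignment; it is not the authors' implementation. The function name `ghostvlad_aggregate`, the parameter shapes, and the random placeholder weights are all assumptions made for the example, and the learned dimensionality reduction that would follow in a trained network is omitted.

```python
import numpy as np

def ghostvlad_aggregate(X, W, b, C, num_ghost):
    """Aggregate a set of descriptors into one fixed-length vector.

    X: (N, D) face descriptors for one template (N may vary).
    W: (K + G, D) and b: (K + G,) soft-assignment parameters (hypothetical).
    C: (K + G, D) cluster centres; the last G rows are ghost clusters.
    """
    # Soft-assign every descriptor to all K + G clusters (softmax over clusters).
    logits = X @ W.T + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(logits)
    A /= A.sum(axis=1, keepdims=True)

    # Ghost clusters absorb assignment weight but are dropped here,
    # so they do not contribute to the aggregated output.
    K = W.shape[0] - num_ghost
    A_real, C_real = A[:, :K], C[:K]

    # V[k] = sum_i A[i, k] * (X[i] - C[k]), computed without an explicit loop.
    V = A_real.T @ X - A_real.sum(axis=0)[:, None] * C_real

    # Normalise each cluster's residual, then the flattened vector.
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12
    v = V.ravel()
    return v / (np.linalg.norm(v) + 1e-12)

# Usage with random placeholder parameters (assumed sizes: D=8, K=4, G=2).
rng = np.random.default_rng(0)
D, K, G = 8, 4, 2
W = rng.normal(size=(K + G, D))
b = rng.normal(size=K + G)
C = rng.normal(size=(K + G, D))
small_set = ghostvlad_aggregate(rng.normal(size=(5, D)), W, b, C, G)
large_set = ghostvlad_aggregate(rng.normal(size=(50, D)), W, b, C, G)
```

Both calls return a vector of length K * D regardless of how many images the set contains. A descriptor that is assigned mostly to ghost clusters carries little weight in the real clusters, which is one way to read the quality weighting described in the abstract.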
Face recognition technologies for evidential evaluation of video traces
Human recognition from video traces is an important task in forensic investigations and evidence evaluations. Compared with other biometric traits, the face is one of the most widely used modalities for human recognition because its collection is non-intrusive and requires less cooperation from the subjects. Moreover, face images taken at a long distance can still provide reasonable resolution, while most biometric modalities, such as iris and fingerprint, do not have this merit. In this chapter, we discuss automatic face recognition technologies for evidential evaluations of video traces. We first introduce the general concepts in both forensic and automatic face recognition, then analyse the difficulties in face recognition from videos. We summarise and categorise the approaches for handling different uncontrollable factors in difficult recognition conditions. Finally, we discuss some challenges and trends in face recognition research in both forensics and biometrics. Given its merits tested in many deployed systems and great potential in other emerging applications, considerable research and development efforts are expected to be devoted to face recognition in the near future.
Robust Recovery of Subspace Structures by Low-Rank Representation
In this work we address the subspace recovery problem. Given a set of data
samples (vectors) approximately drawn from a union of multiple subspaces, our
goal is to segment the samples into their respective subspaces and correct the
possible errors as well. To this end, we propose a novel method termed Low-Rank
Representation (LRR), which seeks the lowest-rank representation among all the
candidates that can represent the data samples as linear combinations of the
bases in a given dictionary. It is shown that LRR well solves the subspace
recovery problem: when the data is clean, we prove that LRR exactly captures
the true subspace structures; for the data contaminated by outliers, we prove
that under certain conditions LRR can exactly recover the row space of the
original data and detect the outlier as well; for the data corrupted by
arbitrary errors, LRR can also approximately recover the row space with
theoretical guarantees. Since the subspace membership is provably determined by
the row space, these further imply that LRR can perform robust subspace
segmentation and error correction, in an efficient way.
Comment: IEEE Trans. Pattern Analysis and Machine Intelligenc
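The clean-data case mentioned in this abstract can be illustrated with a short sketch. It assumes that the data matrix itself is used as the dictionary and that the subspaces are independent; under those assumptions the lowest-rank representation has the known closed form Z = V V^T, where V holds the right singular vectors of the data. The function name `lrr_clean` and the toy data are hypothetical, and the outlier-robust variant described in the abstract needs an iterative solver that is not shown here.

```python
import numpy as np

def lrr_clean(X, tol=1e-10):
    """Noise-free low-rank representation with X as its own dictionary.

    X: (D, n) matrix whose columns are samples.
    Returns the (n, n) coefficient matrix Z = V V^T from the skinny SVD of X.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int((s > tol * s[0]).sum())  # numerical rank
    V = Vt[:r].T
    return V @ V.T

# Toy data (assumed for illustration): two 2-D subspaces in R^10, 6 samples each.
rng = np.random.default_rng(0)
B1, B2 = rng.normal(size=(10, 2)), rng.normal(size=(10, 2))
X = np.hstack([B1 @ rng.normal(size=(2, 6)), B2 @ rng.normal(size=(2, 6))])

Z = lrr_clean(X)
cross_block = np.abs(Z[:6, 6:]).max()  # coefficients linking the two subspaces
```

In this setting the cross-subspace coefficients vanish up to floating-point error, so `Z` is block-diagonal and its blocks identify which samples share a subspace.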
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and makes it possible to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas is described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight on how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking and promising avenues for research.
Deep Learning Face Attributes in the Wild
Predicting face attributes in the wild is challenging due to complex face
variations. We propose a novel deep learning framework for attribute prediction
in the wild. It cascades two CNNs, LNet and ANet, which are fine-tuned jointly
with attribute tags, but pre-trained differently. LNet is pre-trained by
massive general object categories for face localization, while ANet is
pre-trained by massive face identities for attribute prediction. This framework
not only outperforms the state-of-the-art by a large margin, but also reveals
valuable facts on learning face representation.
(1) It shows how the performances of face localization (LNet) and attribute
prediction (ANet) can be improved by different pre-training strategies.
(2) It reveals that although the filters of LNet are fine-tuned only with
image-level attribute tags, their response maps over entire images have strong
indication of face locations. This fact enables training LNet for face
localization with only image-level annotations, but without face bounding boxes
or landmarks, which are required by all attribute recognition works.
(3) It also demonstrates that the high-level hidden neurons of ANet
automatically discover semantic concepts after pre-training with massive face
identities, and such concepts are significantly enriched after fine-tuning with
attribute tags. Each attribute can be well explained with a sparse linear
combination of these concepts.
Comment: To appear in International Conference on Computer Vision (ICCV) 201