On Robust Face Recognition via Sparse Encoding: the Good, the Bad, and the Ugly
In the field of face recognition, Sparse Representation (SR) has received
considerable attention during the past few years. Most of the relevant
literature focuses on holistic descriptors in closed-set identification
applications. The underlying assumption in SR-based methods is that each class
in the gallery has sufficient samples and the query lies on the subspace
spanned by the gallery of the same class. Unfortunately, such an assumption is
easily violated in the more challenging face verification scenario, where an
algorithm is required to determine if two faces (where one or both have not
been seen before) belong to the same person. In this paper, we first discuss
why previous attempts with SR might not be applicable to verification problems.
We then propose an alternative approach to face verification via SR.
Specifically, we propose to use explicit SR encoding on local image patches
rather than the entire face. The obtained sparse signals are pooled via
averaging to form multiple region descriptors, which are then concatenated to
form an overall face descriptor. Due to the deliberate loss of spatial
relations within each region (caused by averaging), the resulting descriptor is
robust to misalignment and various image deformations. Within the proposed framework, we
evaluate several SR encoding techniques: l1-minimisation, Sparse Autoencoder
Neural Network (SANN), and an implicit probabilistic technique based on
Gaussian Mixture Models. Thorough experiments on AR, FERET, exYaleB, BANCA and
ChokePoint datasets show that the proposed local SR approach obtains
considerably better and more robust performance than several previous
state-of-the-art holistic SR methods, in both verification and closed-set
identification problems. The experiments also show that l1-minimisation based
encoding has a considerably higher computational cost than the other
techniques, but leads to higher recognition rates.
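The patch-level encode-then-pool pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dictionary is random rather than learnt, and the soft-thresholding step is a crude stand-in for proper l1-minimisation; all sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_code(patch, dictionary, threshold=0.5):
    # Crude stand-in for l1-minimisation: correlate the patch with the
    # dictionary atoms and soft-threshold, yielding a sparse signal.
    c = dictionary.T @ patch
    return np.sign(c) * np.maximum(np.abs(c) - threshold, 0.0)

def face_descriptor(face, patch=8, region=32, atoms=64):
    # Hypothetical random dictionary; in practice it would be learnt.
    D = rng.standard_normal((patch * patch, atoms))
    D /= np.linalg.norm(D, axis=0)
    regions = []
    H, W = face.shape
    for ry in range(0, H, region):
        for rx in range(0, W, region):
            codes = []
            for py in range(ry, ry + region - patch + 1, patch):
                for px in range(rx, rx + region - patch + 1, patch):
                    p = face[py:py + patch, px:px + patch].ravel()
                    codes.append(sparse_code(p, D))
            # Average pooling discards spatial layout within the region,
            # which is what buys robustness to misalignment.
            regions.append(np.mean(codes, axis=0))
    # Concatenate the region descriptors into one face descriptor.
    return np.concatenate(regions)

face = rng.standard_normal((64, 64))
desc = face_descriptor(face)
print(desc.shape)  # (256,)  -> 2x2 regions x 64 atoms
```

For a 64x64 face with 32x32 regions, the descriptor is simply the 4 pooled region vectors stacked end to end.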
Pooling Faces: Template based Face Recognition with Pooled Face Images
We propose a novel approach to template based face recognition. Our dual goal
is to both increase recognition accuracy and reduce the computational and
storage costs of template matching. To do this, we leverage an approach
which was proven effective in many other domains, but, to our knowledge, never
fully explored for face images: average pooling of face photos. We show how
(and why!) the space of a template's images can be partitioned and then pooled
based on image quality and head pose and the effect this has on accuracy and
template size. We perform extensive tests on the IJB-A and Janus CS2 template
based face identification and verification benchmarks. These show that not only
does our approach outperform published state of the art despite requiring far
fewer cross template comparisons, but also, surprisingly, that image pooling
performs on par with deep feature pooling.
Comment: Appeared in the IEEE Computer Society Workshop on Biometrics, IEEE
Conf. on Computer Vision and Pattern Recognition (CVPR), June, 201
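The partition-then-pool idea can be sketched as below. The yaw bins, quality threshold, and function names are illustrative assumptions, not the paper's exact partitioning scheme; the point is only that a template of many images collapses to one averaged image per (pose, quality) bucket.

```python
import numpy as np

def pool_template(images, yaws, qualities, yaw_edges=(-90, -30, 30, 90),
                  quality_thresh=0.5):
    # Partition the template's images by (head-pose bucket, quality bucket).
    bins = {}
    for img, yaw, q in zip(images, yaws, qualities):
        key = (int(np.digitize(yaw, yaw_edges)), q >= quality_thresh)
        bins.setdefault(key, []).append(img)
    # One pooled (averaged) face image per occupied bin: the template
    # shrinks from len(images) entries to len(bins) entries, cutting the
    # number of cross-template comparisons needed at match time.
    return {k: np.mean(v, axis=0) for k, v in bins.items()}

rng = np.random.default_rng(1)
imgs = [rng.random((4, 4)) for _ in range(6)]
pooled = pool_template(imgs, yaws=[-60, -50, 0, 5, 40, 45],
                       qualities=[0.9, 0.2, 0.8, 0.9, 0.7, 0.6])
print(len(pooled))  # 4 pooled images instead of 6 originals
```

Matching then compares the few pooled images rather than every original photo, which is where the computational and storage savings come from.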
VGGFace2: A dataset for recognising faces across pose and age
In this paper, we introduce a new large-scale face dataset named VGGFace2.
The dataset contains 3.31 million images of 9131 subjects, with an average of
362.6 images for each subject. Images are downloaded from Google Image Search
and have large variations in pose, age, illumination, ethnicity and profession
(e.g. actors, athletes, politicians). The dataset was collected with three
goals in mind: (i) to have both a large number of identities and also a large
number of images for each identity; (ii) to cover a large range of pose, age
and ethnicity; and (iii) to minimize the label noise. We describe how the
dataset was collected, in particular the automated and manual filtering stages
to ensure a high accuracy for the images of each identity. To assess face
recognition performance using the new dataset, we train ResNet-50 (with and
without Squeeze-and-Excitation blocks) Convolutional Neural Networks on
VGGFace2, on MS-Celeb-1M, and on their union, and show that training on
VGGFace2 leads to improved recognition performance over pose and age. Finally,
using the models trained on these datasets, we demonstrate state-of-the-art
performance on all the IARPA Janus face recognition benchmarks, e.g. IJB-A,
IJB-B and IJB-C, exceeding the previous state-of-the-art by a large margin.
Datasets and models are publicly available.
Comment: This paper has been accepted by IEEE Conference on Automatic Face and
Gesture Recognition (F&G), 2018. (Oral)
Expanded Parts Model for Semantic Description of Humans in Still Images
We introduce an Expanded Parts Model (EPM) for recognizing human attributes
(e.g. young, short hair, wearing suit) and actions (e.g. running, jumping) in
still images. An EPM is a collection of part templates which are learnt
discriminatively to explain specific scale-space regions in the images (in
human centric coordinates). This is in contrast to current models which consist
of relatively few (i.e. a mixture of) 'average' templates. EPM uses only a
subset of the parts to score an image and scores the image sparsely in space,
i.e. it ignores redundant and random background in an image. To learn our
model, we propose an algorithm which automatically mines parts and learns
corresponding discriminative templates together with their respective locations
from a large number of candidate parts. We validate our method on three recent
challenging datasets of human attributes and actions. We obtain convincing
qualitative and state-of-the-art quantitative results on the three datasets.
Comment: Accepted for publication in IEEE Transactions on Pattern Analysis and
Machine Intelligence (TPAMI)
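EPM's sparse scoring, where only a subset of parts contributes to an image's score, can be sketched as follows. The feature layout, part locations, and subset size k are hypothetical; the mined part templates would be learnt discriminatively, not drawn at random as here.

```python
import numpy as np

def epm_score(region_feats, part_templates, locations, k=3):
    # Evaluate each learnt part template at the region it explains,
    # then keep only the k best-matching parts: the image is scored
    # sparsely in space, ignoring redundant and background regions.
    scores = np.array([w @ region_feats[loc]
                       for w, loc in zip(part_templates, locations)])
    return np.sort(scores)[-k:].mean()

rng = np.random.default_rng(2)
feats = rng.standard_normal((10, 16))     # features for 10 image regions
templates = rng.standard_normal((6, 16))  # 6 learnt part templates
locs = [0, 2, 4, 5, 7, 9]                 # region each part explains
s = epm_score(feats, templates, locs, k=3)
print(np.isfinite(s))  # True
```

Because only the top-k part responses are averaged, a part that matches nothing (e.g. one landing on background clutter) simply drops out of the score.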