Can Synthetic Faces Undo the Damage of Dataset Bias to Face Recognition and Facial Landmark Detection?
It is well known that deep learning approaches to face recognition and facial
landmark detection suffer from biases in modern training datasets. In this
work, we propose to use synthetic face images to reduce the negative effects of
dataset biases on these tasks. Using a 3D morphable face model, we generate
large amounts of synthetic face images with full control over facial shape and
color, pose, illumination, and background. With a series of experiments, we
extensively test the effects of priming deep nets by pre-training them with
synthetic faces. We observe the following positive effects for face recognition
and facial landmark detection tasks: 1) Priming with synthetic face images
improves the performance consistently across all benchmarks because it reduces
the negative effects of biases in the training data. 2) Traditional approaches
for reducing the damage of dataset bias, such as data augmentation and transfer
learning, are less effective than training with synthetic faces. 3) Using
synthetic data, we can reduce the size of real-world datasets by 75% for face
recognition and by 50% for facial landmark detection while maintaining
performance, thus offering a means to focus the data collection process on less but higher-quality data.
Comment: Technical report
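A rough sketch of the priming recipe described in this abstract (PyTorch-style illustration rather than the authors' code; the backbone choice, identity counts, data loaders and hyper-parameters below are placeholders):

    import torch
    import torch.nn as nn
    from torchvision import models

    def train(net, loader, epochs, lr):
        # plain cross-entropy identification training (minimal sketch)
        opt = torch.optim.SGD(net.parameters(), lr=lr, momentum=0.9)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for images, labels in loader:
                opt.zero_grad()
                loss_fn(net(images), labels).backward()
                opt.step()

    # Stage 1: prime on synthetic faces rendered from a 3D morphable model,
    # where shape, color, pose, illumination and background are controlled.
    num_synthetic_ids, num_real_ids = 5000, 1000         # placeholder counts
    net = models.resnet50(weights=None)
    net.fc = nn.Linear(net.fc.in_features, num_synthetic_ids)
    train(net, synthetic_loader, epochs=30, lr=0.1)       # synthetic_loader: assumed DataLoader

    # Stage 2: fine-tune on the (much smaller) real dataset, keeping the primed
    # backbone and re-initialising only the classifier head.
    net.fc = nn.Linear(net.fc.in_features, num_real_ids)
    train(net, real_loader, epochs=10, lr=1e-3)           # real_loader: assumed DataLoader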
Pseudo-positive regularization for deep person re-identification
An intrinsic challenge of person re-identification (re-ID) is the annotation
difficulty. This typically means 1) few training samples per identity, and 2) thus a lack of diversity among the training samples. Consequently, we face a high risk of over-fitting when training the convolutional neural network (CNN),
a state-of-the-art method in person re-ID. To reduce the risk of over-fitting,
this paper proposes a Pseudo Positive Regularization (PPR) method to enrich the
diversity of the training data. Specifically, unlabeled data from an
independent pedestrian database is retrieved using the target training data as
query. A small proportion of these retrieved samples are randomly selected as
the Pseudo Positive samples and added to the target training set for the
supervised CNN training. The addition of Pseudo Positive samples is therefore a
data augmentation method to reduce the risk of over-fitting during CNN
training. We implement our idea in the identification CNN models (i.e.,
CaffeNet, VGGNet-16 and ResNet-50). On CUHK03 and Market-1501 datasets,
experimental results demonstrate that the proposed method consistently improves
the baseline and yields competitive performance to the state-of-the-art person
re-ID methods.
Comment: 12 pages, 6 figures
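The retrieval step can be illustrated with a short sketch (assumptions: features have already been extracted by the current CNN, the pseudo positives inherit the identity label of the query that retrieved them, and the keep ratio is a placeholder):

    import torch
    import torch.nn.functional as F

    def retrieve_pseudo_positives(train_feats, train_labels, pool_feats, keep_ratio=0.05):
        # For every target training image (query), retrieve its nearest neighbour
        # in the independent unlabeled pedestrian pool; keep a small random
        # proportion of the retrieved samples as pseudo positives.
        q = F.normalize(train_feats, dim=1)
        p = F.normalize(pool_feats, dim=1)
        nn_idx = (q @ p.t()).argmax(dim=1)              # top-1 retrieval per query
        keep = torch.rand(len(nn_idx)) < keep_ratio     # random small proportion
        return nn_idx[keep], train_labels[keep]         # pool indices + inherited labels

The returned pool images, labeled with their queries' identities, would then be appended to the target training set before the usual identification-CNN training.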
Transfer Adaptation Learning: A Decade Survey
The world we see is ever-changing; it changes with people, things, and the environment. A domain refers to the state of the world at a certain moment. A research problem is characterized as transfer adaptation learning (TAL) when it requires knowledge correspondence between different
moments/domains. Conventional machine learning aims to find a model with the
minimum expected risk on test data by minimizing the regularized empirical risk
on the training data, which, however, supposes that the training and test data
share a similar joint probability distribution. TAL aims to build models that can perform tasks in a target domain by transferring knowledge from a semantically related but distributionally different source domain. It is an energetic research field of increasing influence and importance, with a rapidly growing body of publications. This paper surveys the advances of TAL methodologies over the past decade, and discusses the technical challenges and essential problems of TAL with deep insights and new perspectives. The broader families of solutions created by researchers are identified, i.e., instance re-weighting adaptation, feature adaptation, classifier adaptation, deep network adaptation and adversarial adaptation, which go beyond the early semi-supervised and unsupervised split. The survey helps researchers rapidly but comprehensively understand and identify the research foundation, research status, theoretical limitations, future challenges and under-studied issues (universality, interpretability, and credibility) that must be addressed to move the field toward universal representation and safe application in open-world scenarios.
Comment: 26 pages, 4 figures
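For readers who prefer the setting stated formally (notation mine, a standard formulation rather than the survey's exact one): conventional learning minimizes the regularized empirical risk on source samples,

\[
\hat{R}(h) = \frac{1}{n}\sum_{i=1}^{n} \ell\big(h(x_i), y_i\big) + \lambda\,\Omega(h),
\qquad (x_i, y_i) \sim P_s(X, Y),
\]

and trusts it as a proxy for the expected risk \(R(h) = \mathbb{E}_{(x,y)\sim P_t}[\ell(h(x), y)]\) on test data, which is justified only when the joint distributions coincide, \(P_s(X,Y) = P_t(X,Y)\). TAL drops this assumption: the model must perform well under \(P_t\) while labeled data come largely or entirely from a semantically related \(P_s \neq P_t\).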
Hierarchical Invariant Feature Learning with Marginalization for Person Re-Identification
This paper addresses the problem of matching pedestrians across multiple
camera views, known as person re-identification. Variations in lighting
conditions, environment and pose changes across camera views make
re-identification a challenging problem. Previous methods address these
challenges by designing specific features or by learning a distance function.
We propose a hierarchical feature learning framework that learns invariant
representations from labeled image pairs. A mapping is learned such that the
extracted features are invariant for images belonging to same individual across
views. To learn robust representations and to achieve better generalization to
unseen data, the system has to be trained with a large amount of data.
Critically, most of the person re-identification datasets are small. Manually
augmenting the dataset by partial corruption of input data introduces
additional computational burden as it requires several training epochs to
converge. We propose a hierarchical network which incorporates a
marginalization technique that can reap the benefits of training on large
datasets without explicit augmentation. We compare our approach with several
baseline algorithms as well as popular linear and non-linear metric learning
algorithms and demonstrate improved performance on challenging publicly
available datasets, VIPeR, CUHK01, CAVIAR4REID and iLIDS. Our approach also
achieves state-of-the-art results on these datasets.
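The marginalization idea can be made concrete with a classic example (this is the closed-form marginalized-denoising trick of Chen et al., 2012, given as an illustration of the general principle rather than this paper's exact hierarchical formulation):

    import numpy as np

    def marginalized_denoising_map(X, p=0.5):
        # X: d x n data matrix. Returns the linear map W that minimises the
        # EXPECTED reconstruction loss E||X - W X_tilde||^2 over random feature
        # dropout with rate p, i.e. the effect of infinitely many corrupted
        # copies of the training data without ever generating them explicitly.
        d, _ = X.shape
        q = np.full(d, 1.0 - p)                 # survival probability per feature
        S = X @ X.T                             # scatter matrix
        Q = S * np.outer(q, q)                  # E[X_tilde X_tilde^T], off-diagonal
        np.fill_diagonal(Q, np.diag(S) * q)     # ... and its diagonal
        P = S * q[np.newaxis, :]                # E[X X_tilde^T]
        return P @ np.linalg.inv(Q + 1e-5 * np.eye(d))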
Gated Siamese Convolutional Neural Network Architecture for Human Re-Identification
Matching pedestrians across multiple camera views, known as human
re-identification, is a challenging research problem that has numerous
applications in visual surveillance. With the resurgence of Convolutional
Neural Networks (CNNs), several end-to-end deep Siamese CNN architectures have
been proposed for human re-identification with the objective of projecting the
images of similar pairs (i.e. same identity) to be closer to each other and
those of dissimilar pairs to be distant from each other. However, current
networks extract a fixed representation for each image regardless of the other images it is paired with, and the comparison with other images is done
only at the final level. In this setting, the network is at risk of failing to
extract finer local patterns that may be essential to distinguish positive
pairs from hard negative pairs. In this paper, we propose a gating function to
selectively emphasize such fine common local patterns by comparing the
mid-level features across pairs of images. This produces flexible
representations for the same image according to the images they are paired
with. We conduct experiments on the CUHK03, Market-1501 and VIPeR datasets and
demonstrate improved performance compared to a baseline Siamese CNN
architecture.
Comment: Accepted to ECCV2016
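A toy version of the gating idea (simplified: the published architecture compares stripe-wise summaries of the mid-level maps, and sigma here is a placeholder):

    import torch
    import torch.nn as nn

    class MatchingGate(nn.Module):
        # Compare the mid-level feature maps of the two images in a pair and
        # boost locations where they agree, so finer common local patterns
        # survive to later layers instead of being compared only at the end.
        def __init__(self, sigma=1.0):
            super().__init__()
            self.sigma = sigma

        def forward(self, f1, f2):
            g = torch.exp(-((f1 - f2) ** 2) / (2 * self.sigma ** 2))  # local similarity
            return f1 * (1 + g), f2 * (1 + g)                         # emphasise common patterns

In use, gate = MatchingGate(); a1, a2 = gate(mid_feats_1, mid_feats_2) would sit between two convolutional stages of the two branches of the Siamese network.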
In Defense of the Triplet Loss for Person Re-Identification
In the past few years, the field of computer vision has gone through a
revolution fueled mainly by the advent of large datasets and the adoption of
deep convolutional neural networks for end-to-end learning. The person
re-identification subfield is no exception to this. Unfortunately, a prevailing
belief in the community seems to be that the triplet loss is inferior to using
surrogate losses (classification, verification) followed by a separate metric
learning step. We show that, for models trained from scratch as well as
pretrained ones, using a variant of the triplet loss to perform end-to-end deep
metric learning outperforms most other published methods by a large margin.
Comment: Lucas Beyer and Alexander Hermans contributed equally. Updates: minor fixes, new SOTA comparisons, added CUHK03 results
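The "batch hard" variant the paper advocates is compact enough to sketch directly (the margin value and the P-identities-by-K-images batch sampling are assumed here; a soft-margin softplus version is a common alternative):

    import torch
    import torch.nn.functional as F

    def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
        # For each anchor in the mini-batch: hardest positive = farthest sample
        # of the same identity, hardest negative = closest sample of a different
        # identity; apply the hinge on their distance difference.
        dist = torch.cdist(embeddings, embeddings, p=2)
        same = labels.unsqueeze(0) == labels.unsqueeze(1)
        hardest_pos = (dist * same.float()).max(dim=1).values
        hardest_neg = dist.masked_fill(same, float('inf')).min(dim=1).values
        return F.relu(hardest_pos - hardest_neg + margin).mean()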
Do We Really Need to Collect Millions of Faces for Effective Face Recognition?
Face recognition capabilities have recently made extraordinary leaps. Though
this progress is at least partially due to ballooning training set sizes --
huge numbers of face images downloaded and labeled for identity -- it is not
clear if the formidable task of collecting so many images is truly necessary.
We propose a far more accessible means of increasing training data sizes for
face recognition systems. Rather than manually harvesting and labeling more
faces, we simply synthesize them. We describe novel methods of enriching an
existing dataset with important facial appearance variations by manipulating
the faces it contains. We further apply this synthesis approach when matching
query images represented using a standard convolutional neural network. The
effect of training and testing with synthesized images is extensively tested on
the LFW and IJB-A (verification and identification) benchmarks and Janus CS2.
The performance obtained by our approach matches state-of-the-art results reported by systems trained on millions of downloaded images.
Style Normalization and Restitution for Generalizable Person Re-identification
Existing fully-supervised person re-identification (ReID) methods usually
suffer from poor generalization capability caused by domain gaps. The key to
solving this problem lies in filtering out identity-irrelevant interference and
learning domain-invariant person representations. In this paper, we aim to
design a generalizable person ReID framework which trains a model on source
domains yet is able to generalize/perform well on target domains. To achieve
this goal, we propose a simple yet effective Style Normalization and
Restitution (SNR) module. Specifically, we filter out style variations (e.g.,
illumination, color contrast) by Instance Normalization (IN). However, such a
process inevitably removes discriminative information. We propose to distill
identity-relevant feature from the removed information and restitute it to the
network to ensure high discrimination. For better disentanglement, we enforce a
dual causal loss constraint in SNR to encourage the separation of
identity-relevant features and identity-irrelevant features. Extensive
experiments demonstrate the strong generalization capability of our framework.
Our models empowered by the SNR modules significantly outperform the
state-of-the-art domain generalization approaches on multiple widely-used
person ReID benchmarks, and also show superiority on unsupervised domain
adaptation.
Comment: Accepted by CVPR2020
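A rough sketch of an SNR-style block as described in the abstract (the channel-attention design and how the dual causal loss consumes the two residual parts are assumptions, not the authors' exact configuration):

    import torch.nn as nn

    class SNRBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.inorm = nn.InstanceNorm2d(channels, affine=True)
            self.gate = nn.Sequential(                 # SE-style channel attention
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.Sigmoid(),
            )

        def forward(self, x):
            normed = self.inorm(x)        # style (illumination, contrast) filtered out
            residual = x - normed         # everything IN discarded
            a = self.gate(residual)       # which discarded channels carry identity
            r_plus = residual * a         # identity-relevant part -> restituted
            r_minus = residual * (1 - a)  # identity-irrelevant part -> dropped
            # r_plus and r_minus would feed the dual causal loss; the block's
            # output restitutes only the identity-relevant information.
            return normed + r_plus, r_plus, r_minus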
Frustratingly Easy Person Re-Identification: Generalizing Person Re-ID in Practice
Contemporary person re-identification (\reid) methods usually require access
to data from the deployment camera network during training in order to perform
well. This is because contemporary \reid{} models trained on one dataset do not
generalise to other camera networks due to the domain-shift between datasets.
This requirement is often the bottleneck for deploying \reid{} systems in
practical security or commercial applications, as it may be impossible to
collect this data in advance or prohibitively costly to annotate it. This paper
alleviates this issue by proposing a simple baseline for domain
generalizable~(DG) person re-identification. That is, to learn a \reid{} model
from a set of source domains that is suitable for application to unseen
datasets out-of-the-box, without any model updating. Specifically, we observe
that the domain discrepancy in \reid{} is due to style and content variance
across datasets, and demonstrate that appropriate Instance and Feature Normalization alleviates much of the resulting domain shift in deep \reid{} models. Instance Normalization~(IN) in early layers filters out variations in style statistics, and
Feature Normalization~(FN) in deep layers is able to further eliminate
disparity in content statistics. Compared to contemporary alternatives, this
approach is extremely simple to implement, while being faster to train and
test, thus making it an extremely valuable baseline for implementing \reid{} in
practice. With a few lines of code, it increases the rank 1 \reid{} accuracy by
{11.8\%, 33.2\%, 12.8\% and 8.5\%} on the VIPeR, PRID, GRID, and i-LIDS
benchmarks respectively. Source codes are available at
\url{https://github.com/BJTUJia/person_reID_DualNorm}.
Comment: 14 pages, 2 figures
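The "few lines of code" claim is plausible given how little the recipe needs; here is a hedged sketch of the early-layer Instance Normalization half (backbone choice, insertion points and the deep-layer feature normalization are assumptions, not the repository's exact code):

    import torch.nn as nn
    from torchvision import models

    def add_early_instance_norm():
        net = models.resnet50(weights=None)
        # IN after the early stages washes out per-camera style statistics;
        # the channel counts match ResNet-50's layer1/layer2 outputs.
        net.layer1 = nn.Sequential(net.layer1, nn.InstanceNorm2d(256, affine=True))
        net.layer2 = nn.Sequential(net.layer2, nn.InstanceNorm2d(512, affine=True))
        # The deep-layer feature normalization could then be approximated by
        # normalising the final embedding before the classifier (assumption).
        return net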
SREFI: Synthesis of Realistic Example Face Images
In this paper, we propose a novel face synthesis approach that can generate
an arbitrarily large number of synthetic images of both real and synthetic
identities. Thus a face image dataset can be expanded in terms of the number of
identities represented and the number of images per identity using this
approach, without the identity-labeling and privacy complications that come
from downloading images from the web. To measure the visual fidelity and
uniqueness of the synthetic face images and identities, we conducted face
matching experiments with both human participants and a CNN pre-trained on a
dataset of 2.6M real face images. To evaluate the stability of these synthetic
faces, we trained a CNN model with an augmented dataset containing close to
200,000 synthetic faces. We used a snapshot of this trained CNN to recognize
extremely challenging frontal (real) face images. Experiments showed that training with the augmented faces boosted the face recognition performance of the CNN - …