In Defense of the Triplet Loss for Person Re-Identification
In the past few years, the field of computer vision has gone through a
revolution fueled mainly by the advent of large datasets and the adoption of
deep convolutional neural networks for end-to-end learning. The person
re-identification subfield is no exception to this. Unfortunately, a prevailing
belief in the community seems to be that the triplet loss is inferior to using
surrogate losses (classification, verification) followed by a separate metric
learning step. We show that, for models trained from scratch as well as
pretrained ones, using a variant of the triplet loss to perform end-to-end deep
metric learning outperforms most other published methods by a large margin.
Comment: Lucas Beyer and Alexander Hermans contributed equally. Updates: Minor
fixes, new SOTA comparisons, add CUHK03 result
MassFace: an efficient implementation using triplet loss for face recognition
In this paper we present an efficient implementation using triplet loss for
face recognition. We conduct practical experiments to analyze the factors
that influence training with triplet loss. All models are trained on the
CASIA-Webface dataset and tested on LFW. We analyze the experimental results and
give some insights to help others balance these factors when they apply triplet
loss to their own problems, especially face recognition tasks. Code has been
released at https://github.com/yule-li/MassFace
In Defense of the Classification Loss for Person Re-Identification
Recent research on person re-identification has focused on two
trends. One is learning part-based local features to form more informative
feature descriptors. The other is designing effective metric learning loss
functions such as the triplet loss family. We argue that learning global
features with classification loss could achieve the same goal, even with some
simple and cost-effective architecture design. In this paper, we first explain
why the person re-id framework with standard classification loss usually has
inferior performance compared to metric learning. Based on that, we further
propose a person re-id framework featured by channel grouping and multi-branch
strategy, which divides global features into multiple channel groups and learns
the discriminative channel group features by multi-branch classification
layers. Extensive experiments show that our framework outperforms prior
state-of-the-art methods in terms of both accuracy and inference speed
Metric Attack and Defense for Person Re-identification
Person re-identification (re-ID) has attracted much attention recently due to
its great importance in video surveillance. In general, distance metrics used
to identify two person images are expected to be robust under various
appearance changes. However, our work observes the extreme vulnerability of
existing distance metrics to adversarial examples, generated by simply adding
human-imperceptible perturbations to person images. Hence, the security danger
is dramatically increased when deploying commercial re-ID systems in video
surveillance.
Although adversarial examples have been extensively applied in
classification analysis, they are rarely studied in metric analysis tasks like person
re-identification. The most likely reason is the natural gap between the
training and testing of re-ID networks, that is, the predictions of a re-ID
network cannot be directly used during testing without an effective metric. In
this work, we bridge the gap by proposing Adversarial Metric Attack, a parallel
methodology to adversarial classification attacks. Comprehensive experiments
clearly reveal the adversarial effects in re-ID systems. Meanwhile, we also
present an early attempt at training a metric-preserving network, thereby
defending the metric against adversarial attacks. Finally, by benchmarking
various adversarial settings, we expect that our work can facilitate the
development of adversarial attack and defense in metric-based applications
VOC-ReID: Vehicle Re-identification based on Vehicle-Orientation-Camera
Vehicle re-identification is a challenging task due to high intra-class
variances and small inter-class variances. In this work, we focus on the
failure cases caused by similar background and shape. They introduce severe bias in
similarity, making it easier to neglect fine-grained information. To reduce the
bias, we propose an approach named VOC-ReID, taking the triplet
vehicle-orientation-camera as a whole and reforming background/shape similarity
as camera/orientation re-identification. First, we train models for vehicle,
orientation and camera re-identification respectively. Then we use orientation
and camera similarity as a penalty to get the final similarity. In addition, we propose a
high-performance baseline boosted by a bag of tricks and weakly supervised data
augmentation. Our algorithm achieves second place in vehicle
re-identification at the NVIDIA AI City Challenge 2020.
Comment: AICity2020 Challenge, CVPR 2020 workshop, code available at GitHub (link
in abstract)
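The penalty step in the abstract above can be sketched as follows. This is a minimal illustration, assuming the final score is the vehicle similarity minus weighted orientation and camera similarities. The function name `fused_similarity` and the weights `lam_orient` and `lam_cam` are hypothetical and are not taken from the paper.

```python
def fused_similarity(sim_vehicle, sim_orient, sim_cam, lam_orient=0.1, lam_cam=0.1):
    """Penalize vehicle similarity by orientation and camera similarity.

    All three inputs are query-by-gallery matrices (lists of rows) of equal
    shape. The weights are hypothetical placeholders, not values from the paper.
    """
    fused = []
    for row_v, row_o, row_c in zip(sim_vehicle, sim_orient, sim_cam):
        fused.append([v - lam_orient * o - lam_cam * c
                      for v, o, c in zip(row_v, row_o, row_c)])
    return fused


# One query against two gallery images with the same raw vehicle similarity.
# The first gallery image shares orientation and camera with the query, so
# part of its similarity is attributed to shape/background and discounted.
scores = fused_similarity([[0.8, 0.8]], [[0.9, 0.1]], [[1.0, 0.0]])
```

With these made-up inputs, the second gallery image ends up ranked above the first, even though both have the same raw vehicle similarity.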
Cross-Resolution Person Re-identification with Deep Antithetical Learning
Images with different resolutions are ubiquitous in public person
re-identification (ReID) datasets and real-world scenes; it is thus crucial for
a person ReID model to handle image resolution variations to improve its
generalization ability. However, most existing person ReID methods pay little
attention to this resolution discrepancy problem. One paradigm to deal with
this problem is to use some complicated methods for mapping all images into an
artificial image space, which however will disrupt the natural image
distribution and requires heavy image preprocessing. In this paper, we analyze
the deficiencies of several widely-used objective functions handling image
resolution discrepancies and propose a new framework called deep antithetical
learning that directly learns from the natural image space rather than creating
an arbitrary one. We first quantify and categorize original training images
according to their resolutions. Then we create an antithetical training set and
make sure that original training images have counterparts with antithetical
resolutions in this new set. Finally, a novel Contrastive Center Loss (CCL) is
proposed to learn from images with different resolutions without
interference from their resolution discrepancies. Extensive experimental analyses
and evaluations indicate that the proposed framework, even using a vanilla deep
ReID network, exhibits remarkable performance improvements. Without bells and
whistles, our approach outperforms previous state-of-the-art methods by a large
margin
ReadNet: Towards Accurate ReID with Limited and Noisy Samples
Person re-identification (ReID) is an essential cross-camera retrieval task
to identify pedestrians. However, the number of photos of each pedestrian usually
differs drastically, and thus the data limitation and imbalance problem greatly
hinders prediction accuracy. Additionally, in real-world applications,
pedestrian images are captured by different surveillance cameras, so noisy
camera-related information, such as lighting, perspective and resolution,
results in inevitable domain gaps for ReID algorithms. These challenges make it
difficult for current deep learning methods based on triplet loss to cope with
such problems. To address these challenges, this paper proposes ReadNet, an
adversarial camera network (ACN) with an angular triplet loss (ATL). In detail,
ATL focuses on learning the angular distance among different identities to
mitigate the effect of data imbalance, and guarantees a linear decision
boundary as well, while ACN takes the camera discriminator as a game opponent
of feature extractor to filter camera related information to bridge the
multi-camera gaps. ReadNet is designed to be flexible so that either ATL or ACN
can be deployed independently or simultaneously. Experimental results on
various benchmark datasets show that ReadNet can deliver better
prediction performance than current state-of-the-art methods
Triplet Distillation for Deep Face Recognition
Convolutional neural networks (CNNs) have achieved great success in face
recognition, which unfortunately comes at the cost of massive computation and
storage consumption. Many compact face recognition networks have thus been proposed
to resolve this problem. Triplet loss is effective in further improving the
performance of those compact models. However, it normally applies a fixed
margin to all samples, which neglects the informative similarity structures
between different identities. In this paper, we propose an enhanced version of
triplet loss, named triplet distillation, which exploits the capability of a
teacher model to transfer the similarity information to a small model by
adaptively varying the margin between positive and negative pairs. Experiments
on LFW, AgeDB, and CPLFW datasets show the merits of our method compared to the
original triplet loss.
Comment: 5 pages, 2 tables, accepted by ICML 2019 ODML-CDNNR Workshop
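The contrast between a fixed margin and a teacher-driven margin can be sketched as follows. This is a minimal illustration, assuming the usual hinge form of the triplet loss over Euclidean distances. The helper `teacher_margin`, its clamping bounds `m_min` and `m_max`, and the toy embeddings are hypothetical; the paper's actual mapping from teacher similarities to margins is not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def teacher_margin(t_anchor, t_positive, t_negative, m_min=0.2, m_max=0.5):
    """Hypothetical adaptive margin: the teacher's distance gap between the
    negative and the positive, clamped to [m_min, m_max]."""
    gap = euclidean(t_anchor, t_negative) - euclidean(t_anchor, t_positive)
    return min(m_max, max(m_min, gap))


# Toy student embeddings (anchor, positive, negative).
s_a, s_p, s_n = [0.0, 0.0], [0.1, 0.0], [0.5, 0.0]
# Toy teacher embeddings for the same three images: the teacher separates
# the negative much more strongly than the student does.
t_a, t_p, t_n = [0.0, 0.0], [0.1, 0.0], [1.0, 0.0]

fixed = triplet_loss(s_a, s_p, s_n, margin=0.2)
adaptive = triplet_loss(s_a, s_p, s_n, margin=teacher_margin(t_a, t_p, t_n))
```

In this toy case the fixed margin is already satisfied, so the student receives no training signal, while the larger teacher-derived margin still produces a positive loss.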
Person Re-identification Using Visual Attention
Despite recent attempts to solve the person re-identification problem, it
remains a challenging task since a person's appearance can vary significantly
when large variations in view angle, human pose, and illumination are involved.
In this paper, we propose a novel approach based on using a gradient-based
attention mechanism in a deep convolutional neural network for solving the person
re-identification problem. Our model learns to focus selectively on the parts of
the input image to which the network's output is most sensitive and
processes them with high resolution while perceiving the surrounding image in
low resolution. Extensive comparative evaluations demonstrate that the proposed
method outperforms state-of-the-art approaches on the challenging CUHK01,
CUHK03, and Market 1501 datasets.
Comment: Published at IEEE International Conference on Image Processing 201
Metric Embedding Autoencoders for Unsupervised Cross-Dataset Transfer Learning
Cross-dataset transfer learning is an important problem in person
re-identification (Re-ID). Unfortunately, few deep transfer Re-ID
models exist for realistic settings of practical Re-ID systems. We propose a
purely deep transfer Re-ID model consisting of a deep convolutional neural
network and an autoencoder. The latent code is divided into metric embedding
and nuisance variables. We then utilize an unsupervised training method that
does not rely on co-training with non-deep models. Our experiments show
improvements over both the baseline and competitors' transfer learning models.
Comment: ICANN 2018 (The 27th International Conference on Artificial Neural
Networks) proceedings