Attribute-aware Identity-hard Triplet Loss for Video-based Person Re-identification
Video-based person re-identification (Re-ID) is an important computer vision
task. The batch-hard triplet loss frequently used in video-based person Re-ID
suffers from the Distance Variance among Different Positives (DVDP) problem. In
this paper, we address this issue by introducing a new metric learning method
called Attribute-aware Identity-hard Triplet Loss (AITL), which reduces the
intra-class variation among positive samples via calculating attribute
distance. To achieve a complete model of video-based person Re-ID, a multi-task
framework with Attribute-driven Spatio-Temporal Attention (ASTA) mechanism is
also proposed. Extensive experiments on the MARS and DukeMTMC-VID datasets show
that both AITL and ASTA are very effective. Enhanced by them, even a simple
lightweight video-based person Re-ID baseline can outperform existing
state-of-the-art approaches. The code has been published at
https://github.com/yuange250/Video-based-person-ReID-with-Attribute-information
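The abstract does not spell out the loss formula. As a hedged illustration of the underlying idea, using attribute distance to select the hardest same-identity sample, a minimal numpy sketch (all names hypothetical, not the authors' code) might look like:

```python
import numpy as np

def attribute_hard_positive(anchor_attr, positive_attrs):
    # Among same-identity samples, pick the one whose attribute vector
    # is farthest from the anchor's: the "identity-hard" positive.
    # Hypothetical helper illustrating the idea, not the paper's code.
    d = np.linalg.norm(positive_attrs - anchor_attr, axis=1)
    return int(np.argmax(d))

anchor = np.array([1.0, 0.0, 1.0])
positives = np.array([[1.0, 0.0, 1.0],   # same attributes as anchor
                      [0.0, 1.0, 0.0]])  # very different attributes
idx = attribute_hard_positive(anchor, positives)  # selects index 1
```

Pulling such attribute-distant positives together is one way to reduce the intra-class variation the DVDP problem describes.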
Improved Hard Example Mining by Discovering Attribute-based Hard Person Identity
In this paper, we propose Hard Person Identity Mining (HPIM) that attempts to
refine the hard example mining to improve the exploration efficacy in person
re-identification. It is motivated by the following observation: the more
attributes two people share, the more difficult it is to separate their identities.
Based on this observation, we develop HPIM via a transferred attribute
describer, a deep multi-attribute classifier trained from the source noisy
person attribute datasets. We encode each image into the attribute
probabilistic description in the target person re-ID dataset. Afterwards in the
attribute code space, we consider each person as a distribution to generate his
view-specific attribute codes in different practical scenarios. Hence we
estimate the person-specific statistical moments from zeroth to higher order,
which are further used to calculate the central moment discrepancies between
persons. Such discrepancies provide a basis for choosing hard identities to
organize proper mini-batches, without being affected by the changing person
representations during metric learning. It serves as a complementary tool to
hard example mining, helping to explore the global rather than merely the local
hard example constraint in mini-batches built from randomly sampled identities.
Extensive experiments on two person re-identification benchmarks validate the
effectiveness of our proposed algorithm.
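The central moment discrepancy mentioned here follows the general CMD idea: comparing the means and the first few central moments of two distributions. A minimal numpy sketch of that metric, not necessarily the paper's exact formulation, could be:

```python
import numpy as np

def central_moment_discrepancy(x, y, k=3):
    # Compare two sets of attribute codes (rows = samples) by the
    # distance between their means plus the distances between their
    # central moments up to order k.
    d = np.linalg.norm(x.mean(0) - y.mean(0))
    for order in range(2, k + 1):
        mx = ((x - x.mean(0)) ** order).mean(0)
        my = ((y - y.mean(0)) ** order).mean(0)
        d += np.linalg.norm(mx - my)
    return d
```

Identical distributions yield zero discrepancy; shifting one set moves only the mean term, while reshaping it moves the higher-order terms.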
Hierarchical Feature Embedding for Attribute Recognition
Attribute recognition is a crucial but challenging task due to viewpoint
changes, illumination variations, appearance diversities, etc. Most previous
work only considers the attribute-level feature embedding, which might
perform poorly in complicated heterogeneous conditions. To address this
problem, we propose a hierarchical feature embedding (HFE) framework, which
learns a fine-grained feature embedding by combining attribute and ID
information. In HFE, we maintain the inter-class and intra-class feature
embedding simultaneously. Not only samples with the same attribute but also
samples with the same ID are gathered more closely, which could restrict the
feature embedding of visually hard samples with regard to attributes and
improve the robustness to variant conditions. We establish this hierarchical
structure by utilizing an HFE loss consisting of attribute-level and ID-level
constraints. We also introduce an absolute boundary regularization and a
dynamic loss weight as supplementary components to help build up the feature
embedding. Experiments show that our method achieves the state-of-the-art
results on two pedestrian attribute datasets and a facial attribute dataset.
Comment: CVPR 2020
Sharp Attention Network via Adaptive Sampling for Person Re-identification
In this paper, we present novel sharp attention networks by adaptively
sampling feature maps from convolutional neural networks (CNNs) for person
re-identification (re-ID) problem. Due to the introduction of sampling-based
attention models, the proposed approach can adaptively generate sharper
attention-aware feature masks. This greatly differs from gating-based
attention mechanisms that rely on soft gating functions to select the relevant
features for person re-ID. In contrast, the proposed sampling-based attention
mechanism allows us to effectively trim irrelevant features by enforcing the
resultant feature masks to focus on the most discriminative features. It can
produce sharper attentions that are more assertive in localizing subtle
features relevant to re-identifying people across cameras. For this purpose, a
differentiable Gumbel-Softmax sampler is employed to approximate the Bernoulli
sampling to train the sharp attention networks. Extensive experimental
evaluations demonstrate the superiority of this new sharp attention model for
person re-ID over the other state-of-the-art methods on three challenging
benchmarks including CUHK03, Market-1501, and DukeMTMC-reID.
Comment: accepted by IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT)
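For a binary keep/drop choice, the Gumbel-Softmax relaxation of Bernoulli sampling reduces to adding Gumbel noise to the logits and applying a temperature-scaled sigmoid. A small numpy sketch of this relaxation, independent of the paper's actual implementation, is:

```python
import numpy as np

def gumbel_sigmoid(logits, tau=0.5, rng=None):
    # Binary Gumbel-Softmax: draw two Gumbel noises (one per class,
    # keep vs. drop), and take a temperature-scaled sigmoid of the
    # noisy logit difference. Low tau pushes the mask toward 0/1
    # while keeping the sample differentiable in the logits.
    rng = np.random.default_rng() if rng is None else rng
    g1 = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    g0 = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    return 1.0 / (1.0 + np.exp(-((logits + g1 - g0) / tau)))
```

Using such near-binary masks, instead of smooth gate values, is what makes the resulting attention maps "sharper" than soft gating.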
CA3Net: Contextual-Attentional Attribute-Appearance Network for Person Re-Identification
Person re-identification aims to identify the same pedestrian across
non-overlapping camera views. Deep learning techniques have been applied for
person re-identification recently, towards learning representation of
pedestrian appearance. This paper presents a novel Contextual-Attentional
Attribute-Appearance Network (CA3Net) for person re-identification. The CA3Net
simultaneously exploits the complementarity between semantic attributes and
visual appearance, the semantic context among attributes, visual attention on
attributes as well as spatial dependencies among body parts, leading to
discriminative and robust pedestrian representation. Specifically, an attribute
network within CA3Net is designed with an Attention-LSTM module. It
concentrates the network on latent image regions related to each attribute as
well as exploits the semantic context among attributes by an LSTM module. An
appearance network is developed to learn appearance features from the full
body, horizontal and vertical body parts of pedestrians with spatial
dependencies among body parts. The CA3Net jointly learns the attribute and
appearance features in a multi-task learning manner, generating comprehensive
representation of pedestrians. Extensive experiments on two challenging
benchmarks, i.e., Market-1501 and DukeMTMC-reID datasets, have demonstrated the
effectiveness of the proposed approach.
Unsupervised Person Re-identification by Deep Learning Tracklet Association
Most existing person re-identification (re-id) methods rely on supervised model
learning on per-camera-pair manually labelled pairwise training data. This
leads to poor scalability in practical re-id deployment due to the lack of
exhaustive identity labelling of image positive and negative pairs for every
camera pair. In this work, we address this problem by proposing an unsupervised
re-id deep learning approach capable of incrementally discovering and
exploiting the underlying re-id discriminative information from automatically
generated person tracklet data from videos in an end-to-end model optimisation.
We formulate a Tracklet Association Unsupervised Deep Learning (TAUDL)
framework characterised by jointly learning per-camera (within-camera) tracklet
association (labelling) and cross-camera tracklet correlation by maximising the
discovery of most likely tracklet relationships across camera views. Extensive
experiments demonstrate the superiority of the proposed TAUDL model over the
state-of-the-art unsupervised and domain adaptation re-id methods using six
person re-id benchmarking datasets.
Comment: ECCV 2018 Oral
In Defense of the Triplet Loss for Person Re-Identification
In the past few years, the field of computer vision has gone through a
revolution fueled mainly by the advent of large datasets and the adoption of
deep convolutional neural networks for end-to-end learning. The person
re-identification subfield is no exception to this. Unfortunately, a prevailing
belief in the community seems to be that the triplet loss is inferior to using
surrogate losses (classification, verification) followed by a separate metric
learning step. We show that, for models trained from scratch as well as
pretrained ones, using a variant of the triplet loss to perform end-to-end deep
metric learning outperforms most other published methods by a large margin.Comment: Lucas Beyer and Alexander Hermans contributed equally. Updates: Minor
fixes, new SOTA comparisons, add CUHK03 result
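The batch-hard variant advocated by this paper selects, for each anchor in a mini-batch, the farthest same-identity sample and the closest other-identity sample. A compact numpy sketch of that loss:

```python
import numpy as np

def batch_hard_triplet_loss(feats, labels, margin=0.3):
    # Batch-hard triplet loss: hardest positive (farthest same-ID
    # sample) minus hardest negative (closest other-ID sample) plus a
    # margin, hinged at zero and averaged over anchors.
    dist = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    hardest_pos = np.where(same, dist, -np.inf).max(axis=1)
    hardest_neg = np.where(same, np.inf, dist).min(axis=1)
    return np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean()
```

When identities form well-separated clusters the loss vanishes; overlapping identities keep contributing gradient through their hardest pairs.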
AlignedReID: Surpassing Human-Level Performance in Person Re-Identification
In this paper, we propose a novel method called AlignedReID that extracts a
global feature which is jointly learned with local features. Global feature
learning benefits greatly from local feature learning, which performs an
alignment/matching by calculating the shortest path between two sets of local
features, without requiring extra supervision. After the joint learning, we
only keep the global feature to compute the similarities between images. Our
method achieves rank-1 accuracy of 94.4% on Market1501 and 97.8% on CUHK03,
outperforming state-of-the-art methods by a large margin. We also evaluate
human-level performance and demonstrate that our method is the first to surpass
human-level performance on Market1501 and CUHK03, two widely used Person ReID
datasets.
Comment: 9 pages, 8 figures
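The shortest-path alignment between two ordered sets of local (stripe) features can be computed by dynamic programming over their pairwise distance matrix, moving only down or right. A simplified numpy sketch, omitting the distance normalization the paper applies before the search:

```python
import numpy as np

def shortest_path_alignment(local_a, local_b):
    # Local distance in the AlignedReID spirit: cost of the cheapest
    # monotone path (down/right moves only) from the top-left to the
    # bottom-right of the stripe-to-stripe distance matrix.
    d = np.linalg.norm(local_a[:, None, :] - local_b[None, :, :], axis=2)
    m, n = d.shape
    dp = np.full((m, n), np.inf)
    dp[0, 0] = d[0, 0]
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            best = min(dp[i - 1, j] if i > 0 else np.inf,
                       dp[i, j - 1] if j > 0 else np.inf)
            dp[i, j] = d[i, j] + best
    return dp[-1, -1]
```

The monotone path tolerates vertical misalignment between the two images: a stripe may match a slightly shifted stripe without any extra supervision.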
MagnifierNet: Towards Semantic Adversary and Fusion for Person Re-identification
Although person re-identification (ReID) has achieved significant improvement
recently by enforcing part alignment, it is still a challenging task when it
comes to distinguishing visually similar identities or identifying the occluded
person. In these scenarios, magnifying details in each part features and
selectively fusing them together may provide a feasible solution. In this work,
we propose MagnifierNet, a triple-branch network which accurately mines details
from whole to parts. Firstly, the holistic salient features are encoded by a
global branch. Secondly, to enhance detailed representation for each semantic
region, the "Semantic Adversarial Branch" is designed to learn from dynamically
generated semantic-occluded samples during training. Meanwhile, we introduce
"Semantic Fusion Branch" to filter out irrelevant noises by selectively fusing
semantic region information sequentially. To further improve feature diversity,
we introduce a novel loss function "Semantic Diversity Loss" to remove
redundant overlaps across learned semantic representations. State-of-the-art
performance has been achieved on three benchmarks by large margins.
Specifically, the mAP score is improved by 6% and 5% on the most challenging
CUHK03-L and CUHK03-D benchmarks, respectively.
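The abstract does not give the Semantic Diversity Loss formula. One plausible hedged sketch of such a penalty, discouraging overlap between per-region representations, is the mean absolute cosine similarity across regions (an assumption for illustration, not the paper's definition):

```python
import numpy as np

def semantic_diversity_loss(region_feats):
    # Penalize redundant overlap between semantic region features:
    # average absolute cosine similarity over all off-diagonal pairs.
    # Zero for mutually orthogonal regions, one for identical ones.
    f = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    sim = np.abs(f @ f.T)
    n = len(f)
    return (sim.sum() - n) / (n * (n - 1))
```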
Person Re-Identification using Deep Learning Networks: A Systematic Review
Person re-identification has received a lot of attention from the research
community in recent times. Due to its vital role in security based
applications, person re-identification lies at the heart of research relevant
to tracking robberies, preventing terrorist attacks and other security critical
events. While the last decade has seen tremendous growth in re-id approaches,
very little review literature exists to comprehend and summarize this progress.
This review deals with the latest state-of-the-art deep learning based
approaches for person re-identification. While the few existing re-id review
works have analysed re-id techniques from a singular aspect, this review
evaluates numerous re-id techniques from multiple deep learning aspects such as
deep architecture types, common Re-Id challenges (variation in pose, lighting,
view, scale, partial or complete occlusion, background clutter), multi-modal
Re-Id, cross-domain Re-Id challenges, metric learning approaches and video
Re-Id contributions. This review also includes several re-id benchmarks
collected over the years, describing their characteristics, specifications and
top re-id results obtained on them. The inclusion of the latest deep re-id
works makes this a significant contribution to the re-id literature. Lastly,
the conclusion and future directions are included.
Comment: 34 pages, 15 figures