Semantics-Aligned Representation Learning for Person Re-identification
Person re-identification (reID) aims to match person images to retrieve the
ones with the same identity. This is a challenging task, as the images to be
matched are generally semantically misaligned due to the diversity of human
poses and capture viewpoints, incompleteness of the visible bodies (due to
occlusion), etc. In this paper, we propose a framework that drives the reID
network to learn semantics-aligned feature representations through carefully
designed supervision. Specifically, we build a Semantics Aligning Network (SAN)
that consists of a base network as the encoder (SA-Enc) for reID and a decoder
(SA-Dec) for reconstructing/regressing the densely semantics-aligned full
texture image. We jointly train the SAN under the supervision of person
reID and aligned texture generation. Moreover, at the decoder,
besides the reconstruction loss, we add triplet reID constraints over the
feature maps as perceptual losses. The decoder is discarded at
inference, so our scheme is computationally efficient. Ablation studies
demonstrate the effectiveness of our design. We achieve state-of-the-art
performance on the benchmark datasets CUHK03, Market1501, MSMT17, and the
partial person reID dataset Partial REID. Code for our proposed method is
available at:
https://github.com/microsoft/Semantics-Aligned-Representation-Learning-for-Person-Re-identification.
Comment: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20);
code has been released.
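The abstract states that SAN is trained jointly with a reID loss on the encoder and, on the decoder, a reconstruction loss plus triplet constraints used as perceptual losses, but it does not give the exact loss forms or weights. The sketch below is a minimal illustration under stated assumptions: an L1 reconstruction loss, a standard hinge triplet loss with a hypothetical margin of 0.3, and hypothetical weights `w_rec` and `w_perc`. Feature maps and textures are treated as flattened lists of floats; none of these names come from the authors' code.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def l2(a: Vec, b: Vec) -> float:
    """Euclidean distance between two flattened feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vec, positive: Vec, negative: Vec, margin: float = 0.3) -> float:
    """Hinge triplet loss: zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)


def l1_reconstruction(pred: Vec, target: Vec) -> float:
    """Mean absolute error between the decoded texture and its target (both flattened)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def san_total_loss(
    reid_loss: float,
    pred_tex: Vec,
    target_tex: Vec,
    dec_triplet: Tuple[Vec, Vec, Vec],
    w_rec: float = 1.0,   # hypothetical weight, not taken from the paper
    w_perc: float = 0.1,  # hypothetical weight, not taken from the paper
) -> float:
    """Hypothetical joint objective: encoder reID loss + decoder reconstruction loss
    + triplet 'perceptual' loss on decoder feature maps (anchor, positive, negative)."""
    anchor, positive, negative = dec_triplet
    return (
        reid_loss
        + w_rec * l1_reconstruction(pred_tex, target_tex)
        + w_perc * triplet_loss(anchor, positive, negative)
    )
```

For example, `san_total_loss(0.5, [1.0, 2.0], [0.0, 0.0], ([0.0, 0.0], [0.0, 2.0], [0.0, 1.0]))` combines a reID loss of 0.5, a reconstruction loss of 1.5 and a triplet loss of 1.3 into about 2.13. Because only the encoder is kept at inference, the two decoder terms shape the learned features without adding test-time cost.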
Holistic Guidance for Occluded Person Re-Identification
In real-world video surveillance applications, person re-identification
(ReID) suffers from the effects of occlusions and detection errors. Despite
recent advances, occlusions continue to corrupt the features extracted by
state-of-the-art CNN backbones, and thereby deteriorate the accuracy of ReID
systems. To address this issue, methods in the literature use an additional
costly process such as pose estimation, where pose maps provide supervision to
exclude occluded regions. In contrast, we introduce a novel Holistic Guidance
(HG) method that relies only on person identity labels, and on the distribution
of pairwise matching distances of datasets to alleviate the problem of
occlusion, without requiring additional supervision. Hence, our proposed
student-teacher framework is trained to address the occlusion problem by
matching the distributions of between- and within-class distances (DCDs) of
occluded samples with that of holistic (non-occluded) samples, thereby using
the latter as a soft-labeled reference to learn well-separated DCDs. This
approach is supported by our empirical study, in which the distributions of
between- and within-class distances overlap more in occluded datasets than in
holistic ones. In particular, features extracted from both datasets are
jointly learned using the student model to produce an attention map that allows
separating visible regions from occluded ones. In addition to this, a joint
generative-discriminative backbone is trained with a denoising autoencoder,
allowing the system to self-recover from occlusions. Extensive experiments on
several challenging public datasets indicate that the proposed approach can
outperform state-of-the-art methods on both occluded and holistic datasets.
Comment: 10 pages
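To make the idea of matching distance distributions concrete, the sketch below splits pairwise feature distances into within-class and between-class sets, bins each into a normalized histogram, and scores how far the occluded (student) histograms are from the holistic (teacher) ones. The abstract does not specify the divergence or the binning, so KL divergence, Euclidean distance, and the parameters `bins`, `lo`, `hi` are assumptions; `dcd_mismatch` and the other function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_distances(feats, labels) -> Tuple[List[float], List[float]]:
    """Split all pairwise distances into within-class and between-class lists."""
    within, between = [], []
    for i, j in combinations(range(len(feats)), 2):
        d = dist(feats[i], feats[j])
        (within if labels[i] == labels[j] else between).append(d)
    return within, between


def histogram(values, bins: int, lo: float, hi: float, eps: float = 1e-8) -> List[float]:
    """Normalized histogram on [lo, hi]; out-of-range values are clamped into the edge
    bins, and eps keeps every bin non-zero so the KL divergence below stays finite."""
    counts = [eps] * bins
    width = (hi - lo) / bins
    for v in values:
        k = max(0, min(int((v - lo) / width), bins - 1))
        counts[k] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def dcd_mismatch(occ_feats, occ_labels, hol_feats, hol_labels,
                 bins: int = 10, lo: float = 0.0, hi: float = 2.0) -> float:
    """Hypothetical objective: divergence of the occluded within/between-class
    distance histograms from the holistic reference histograms."""
    occ_w, occ_b = split_distances(occ_feats, occ_labels)
    hol_w, hol_b = split_distances(hol_feats, hol_labels)

    def h(xs: List[float]) -> List[float]:
        return histogram(xs, bins, lo, hi)

    return kl(h(hol_w), h(occ_w)) + kl(h(hol_b), h(occ_b))
```

Identical occluded and holistic inputs give a mismatch of exactly 0.0, and the score grows as the occluded within- and between-class distances drift away from the holistic ones. Hard binning is not differentiable, so using this as a training loss would require a soft-binned variant; the version here only illustrates the comparison being made.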
DROP: Decouple Re-Identification and Human Parsing with Task-specific Features for Occluded Person Re-identification
The paper introduces the Decouple Re-identificatiOn and human Parsing (DROP)
method for occluded person re-identification (ReID). Unlike mainstream
approaches using global features for simultaneous multi-task learning of ReID
and human parsing, or relying on semantic information for attention guidance,
DROP argues that the inferior performance of the former is due to distinct
granularity requirements for ReID and human parsing features. ReID focuses on
instance part-level differences between pedestrian parts, while human parsing
centers on semantic spatial context, reflecting the internal structure of the
human body. To address this, DROP decouples the features for ReID and human
parsing and proposes detail-preserving upsampling to combine feature maps of
varying resolution. Parsing-specific features for human parsing are decoupled, and
human position information is exclusively added to the human parsing branch. In
the ReID branch, a part-aware compactness loss is introduced to enhance
instance-level part differences. Experimental results highlight the efficacy of
DROP, notably a Rank-1 accuracy of 76.8% on Occluded-Duke,
surpassing two mainstream methods. The codebase is accessible at
https://github.com/shuguang-52/DROP
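The headline result above is a Rank-1 retrieval accuracy. As a rough illustration of how Rank-1 is commonly computed, the sketch below ranks gallery features by Euclidean distance to each query and checks whether the nearest one shares the query's identity. It assumes already-extracted feature vectors and the common Market-1501-style filtering (ignore gallery images with the same identity and camera as the query, skip queries with no remaining true match); it is not DROP's evaluation code, and the exact Occluded-Duke protocol may differ. `rank1_accuracy` is a hypothetical name.

```python
import math
from typing import Sequence


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank1_accuracy(q_feats, q_ids, q_cams, g_feats, g_ids, g_cams) -> float:
    """Fraction of queries whose nearest valid gallery image has the same identity.

    Gallery images with the same identity AND camera as the query are ignored,
    and queries with no remaining true match are skipped.
    """
    hits, valid = 0, 0
    for qf, qid, qcam in zip(q_feats, q_ids, q_cams):
        candidates = [
            (dist(qf, gf), gid)
            for gf, gid, gcam in zip(g_feats, g_ids, g_cams)
            if not (gid == qid and gcam == qcam)
        ]
        if not any(gid == qid for _, gid in candidates):
            continue  # no valid true match for this query
        valid += 1
        _, best_id = min(candidates, key=lambda c: c[0])  # nearest gallery entry
        hits += int(best_id == qid)
    return hits / valid if valid else 0.0
```

Sorting by distance only once per query is enough for Rank-1; full CMC curves and mAP would instead keep the whole ranked list for each query.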