Shape-centered Representation Learning for Visible-Infrared Person Re-identification
Current Visible-Infrared Person Re-Identification (VI-ReID) methods
prioritize extracting discriminative appearance features, overlooking the
natural robustness of body shape to modality changes. We first gauged the
discriminative potential of body shape by simply concatenating shape and
appearance features. However, two issues hinder the use of shape features.
First, shape feature extraction depends on auxiliary models at inference
time, and generated infrared shapes contain errors due to the intrinsic
modality disparity. Second, the correlation between shape and appearance
features remains underexplored. To address these challenges, we propose
the Shape-centered Representation Learning framework (ScRL), which focuses on
learning shape features and shape-related appearance features.
Specifically, we devise Shape Feature Propagation (SFP), which enables
shape features to be extracted directly from original images at minimal
inference cost. To correct inaccuracies in infrared body shapes at the
feature level, we present Infrared Shape Restitution (ISR).
Furthermore, to acquire shape-related appearance features, we design
Appearance Feature Enhancement (AFE), which, guided by shape features,
accentuates identity-related cues while suppressing identity-unrelated ones.
Extensive experiments validate the effectiveness of the proposed ScRL:
it attains Rank-1 (mAP) accuracies of 76.1% (72.6%) on SYSU-MM01,
71.2% (52.9%) on HITSZ-VCM, and 92.4% (86.7%) on RegDB, outperforming
existing state-of-the-art methods.
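As a rough illustration of the shape-guided enhancement idea (a minimal PyTorch sketch under our own assumptions, not the authors' ScRL code; the module name, dimensions, and attention design are hypothetical), shape features can serve as attention queries that re-weight appearance features:

```python
import torch
import torch.nn as nn

class ShapeGuidedEnhancement(nn.Module):
    """Hypothetical sketch: shape tokens attend over appearance tokens,
    emphasizing appearance cues that correlate with body shape."""
    def __init__(self, dim=2048, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, appearance, shape):
        # appearance: (B, N, dim) spatial appearance tokens
        # shape:      (B, M, dim) shape-feature tokens
        enhanced, _ = self.attn(query=shape, key=appearance, value=appearance)
        # Residual connection keeps the original shape cues intact.
        return self.norm(shape + enhanced)
```

The enhanced tokens would then be pooled and fused with the global appearance descriptor before the ReID head.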
Visible-Infrared Person Re-Identification Using Privileged Intermediate Information
Visible-infrared person re-identification (ReID) aims to recognize the same
person of interest across a network of RGB and IR cameras. Some deep learning
(DL) models have directly incorporated both modalities to discriminate persons
in a joint representation space. However, this cross-modal ReID problem remains
challenging due to the large domain shift in data distributions between RGB and
IR modalities. This paper introduces a novel approach for creating an
intermediate virtual domain that acts as a bridge between the two main
domains (i.e., RGB and IR modalities) during training. This intermediate domain is
considered as privileged information (PI) that is unavailable at test time, and
allows formulating this cross-modal matching task as a problem in learning
under privileged information (LUPI). We devised a new method to generate images
between visible and infrared domains that provide additional information to
train a deep ReID model through an intermediate domain adaptation. In
particular, by employing color-free and multi-step triplet loss objectives
during training, our method provides common feature representation spaces that
are robust to large visible-infrared domain shifts. Experimental results on
challenging visible-infrared ReID datasets indicate that our proposed approach
consistently improves matching accuracy, without any computational overhead at
test time. The code is available at:
https://github.com/alehdaghi/Cross-Modal-Re-ID-via-LUPI
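One simple way to realize such a color-free intermediate domain (our assumption for illustration; the paper's actual generator may differ) is a random convex mixing of the RGB channels, producing grayscale-like images whose statistics sit between the visible and infrared domains:

```python
import torch

def random_channel_mix(rgb: torch.Tensor) -> torch.Tensor:
    """Hypothetical intermediate-domain generator: a random convex
    combination of RGB channels, replicated to 3 channels so one
    backbone can process visible, intermediate, and infrared images."""
    # rgb: (B, 3, H, W) batch of visible images
    w = torch.rand(rgb.size(0), 3, 1, 1, device=rgb.device)
    w = w / w.sum(dim=1, keepdim=True)          # convex channel weights
    gray = (rgb * w).sum(dim=1, keepdim=True)   # (B, 1, H, W) color-free image
    return gray.expand(-1, 3, -1, -1)
```

A multi-step triplet objective would then pull visible and infrared features toward this intermediate domain rather than bridging the full gap in one step.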
Learning Modal-Invariant and Temporal-Memory for Video-based Visible-Infrared Person Re-Identification
Thanks to cross-modal retrieval techniques, visible-infrared (RGB-IR)
person re-identification (Re-ID) can be achieved by projecting the two
modalities into a common space, enabling person Re-ID in 24-hour
surveillance systems. However, with respect to the probe-to-gallery
setting, almost all existing RGB-IR cross-modal person Re-ID methods focus
on image-to-image matching, while video-to-video matching, which contains
much richer spatial and temporal information, remains under-explored.
In this paper, we primarily study the video-based cross-modal
person Re-ID method. To achieve this task, a video-based RGB-IR dataset is
constructed, in which 927 valid identities with 463,259 frames and 21,863
tracklets captured by 12 RGB/IR cameras are collected. Based on our constructed
dataset, we show that performance improves as the number of frames in a
tracklet increases, demonstrating the significance of video-to-video
matching in RGB-IR person Re-ID. Additionally, we propose a novel method
that not only projects the two modalities into a modal-invariant subspace,
but also extracts a temporal memory for motion-invariant features. Thanks
to these two strategies, our method achieves much better results on
video-based cross-modal person Re-ID. The code and dataset are released at:
https://github.com/VCMproject233/MITML
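A minimal sketch of what such a temporal-memory head might look like (our assumption, not the released MITML code): a recurrent memory over per-frame features followed by attention pooling into a single tracklet descriptor:

```python
import torch
import torch.nn as nn

class TemporalMemory(nn.Module):
    """Hypothetical tracklet aggregator: a GRU builds a temporal memory
    over frame features; attention pooling yields one tracklet feature."""
    def __init__(self, dim=2048):
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, frames):                     # frames: (B, T, dim)
        mem, _ = self.gru(frames)                  # temporal-memory states
        w = torch.softmax(self.score(mem), dim=1)  # (B, T, 1) frame weights
        return (w * mem).sum(dim=1)                # (B, dim) tracklet feature
```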
Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
Due to the modality gap between visible and infrared images with high visual
ambiguity, learning diverse modality-shared semantic concepts for
visible-infrared person re-identification (VI-ReID) remains a challenging
problem. Body shape is one of the significant modality-shared cues for VI-ReID.
To dig more diverse modality-shared cues, we expect that erasing
body-shape-related semantic concepts in the learned features can force the ReID
model to extract more and other modality-shared features for identification. To
this end, we propose a shape-erased feature learning paradigm that
decorrelates modality-shared features into two orthogonal subspaces.
Jointly learning shape-related features in one subspace and shape-erased
features in its orthogonal complement maximizes the conditional mutual
information between the shape-erased features and identity while
discarding body-shape information, thus explicitly enhancing the diversity
of the learned representation.
Extensive experiments on SYSU-MM01, RegDB, and HITSZ-VCM datasets demonstrate
the effectiveness of our method. Comment: CVPR 2023
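To make the orthogonal-subspace idea concrete, here is a hedged sketch (the dimensions, QR re-orthogonalization, and split point are our assumptions, not the authors' implementation): features are projected by an approximately orthogonal matrix and split into a shape-supervised block and an identity-only block:

```python
import torch
import torch.nn as nn

class ShapeErasedSplit(nn.Module):
    """Hypothetical decomposition: project features with an (approximately)
    orthogonal matrix and split them into shape-related and shape-erased
    blocks living in orthogonal subspaces."""
    def __init__(self, dim=2048, shape_dim=512):
        super().__init__()
        self.proj = nn.Parameter(torch.empty(dim, dim))
        nn.init.orthogonal_(self.proj)
        self.shape_dim = shape_dim

    def forward(self, feat):                 # feat: (B, dim)
        q, _ = torch.linalg.qr(self.proj)    # keep projection near-orthogonal
        z = feat @ q
        return z[:, :self.shape_dim], z[:, self.shape_dim:]
```

Supervising the first block with shape cues and the second with identity labels alone would push the shape-erased block toward shape-independent, modality-shared features.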