Due to the modality gap between visible and infrared images with high visual
ambiguity, learning \textbf{diverse} modality-shared semantic concepts for
visible-infrared person re-identification (VI-ReID) remains a challenging
problem. Body shape is one of the significant modality-shared cues for VI-ReID.
To mine more diverse modality-shared cues, we expect that erasing
body-shape-related semantic concepts from the learned features will force the ReID
model to extract additional, complementary modality-shared features for identification. To
this end, we propose a shape-erased feature learning paradigm that decorrelates
modality-shared features in two orthogonal subspaces. Jointly learning
shape-related features in one subspace and shape-erased features in the
orthogonal complement achieves conditional mutual information maximization
between the shape-erased features and identity while discarding body shape information,
thus enhancing the diversity of the learned representation explicitly.
Extensive experiments on SYSU-MM01, RegDB, and HITSZ-VCM datasets demonstrate
the effectiveness of our method.

Comment: CVPR 202