Visible-infrared person re-identification (VI-ReID) is a challenging yet
essential task that aims to retrieve person images across visible and infrared
camera views. To mitigate the large modality discrepancy between such
heterogeneous images, previous methods attempt to apply generative adversarial
networks (GANs) to generate modality-consistent data.
However, due to severe color variations between the visible and infrared
domains, the generated cross-modality samples are often of insufficient quality
to close the gap between synthesized and real images, which leads to
sub-optimal feature representations. In this work, we address the
cross-modality matching problem with Aligned Grayscale Modality (AGM), a
unified dark-line spectrum that reformulates visible-infrared dual-mode
learning as a gray-gray single-mode learning problem. Specifically, we first
generate the grayscale modality from homogeneous visible images. Then, we train
a style transfer model to translate infrared images into homogeneous grayscale
images. In this way, the modality discrepancy is significantly reduced in the
image space.
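As a rough illustration, the visible-to-grayscale step can be realized with a standard luminance conversion whose single-channel output is replicated three times so that RGB backbones remain usable; the snippet below is a minimal sketch under that assumption, and the function and file names are placeholders rather than the paper's code.

    from PIL import Image
    import numpy as np

    def visible_to_grayscale(path):
        # Standard ITU-R 601-2 luma conversion, then replicate the single
        # channel three times so downstream RGB backbones stay unchanged.
        gray = Image.open(path).convert("RGB").convert("L")
        gray3 = np.stack([np.asarray(gray)] * 3, axis=-1)  # H x W x 3, uint8
        return Image.fromarray(gray3)

    # hypothetical usage:
    # agm = visible_to_grayscale("visible_cam1_0001.jpg")
    # agm.save("agm_cam1_0001.jpg")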
To reduce the remaining appearance discrepancy, we further introduce a
multi-granularity feature extraction network to conduct feature-level
alignment. Rather than relying on global information alone, we propose to
exploit local (head-shoulder) features to assist person Re-ID; the global and
local cues complement each other to form a stronger feature descriptor.
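For intuition, one simple way to fuse global and local cues is to crop an approximate head-shoulder region, extract features from both views, and concatenate the two vectors. The sketch below illustrates this idea only; the backbone choice, the top-third crop ratio, and all names are assumptions, not the paper's exact multi-granularity network.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class GlobalLocalDescriptor(nn.Module):
        # Illustrative two-branch extractor: a global branch over the whole
        # image and a local branch over an assumed head-shoulder crop.
        def __init__(self):
            super().__init__()
            self.global_branch = nn.Sequential(*list(models.resnet50().children())[:-1])
            self.local_branch = nn.Sequential(*list(models.resnet50().children())[:-1])

        def forward(self, imgs):                                  # imgs: B x 3 x H x W
            head_shoulder = imgs[:, :, : imgs.shape[2] // 3, :]   # top third (assumption)
            g = self.global_branch(imgs).flatten(1)               # B x 2048 global feature
            l = self.local_branch(head_shoulder).flatten(1)       # B x 2048 local feature
            return torch.cat([g, l], dim=1)                       # B x 4096 fused descriptor

    # hypothetical usage:
    # feats = GlobalLocalDescriptor()(torch.randn(4, 3, 256, 128))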
Comprehensive experiments on the mainstream evaluation datasets, SYSU-MM01 and
RegDB, demonstrate that our method significantly boosts cross-modality
retrieval performance over state-of-the-art methods.
Comment: 15 pages, 9 figures