Unsupervised visible-infrared person re-identification (USL-VI-ReID) aims to
match pedestrian images of the same identity from different modalities without
annotations. Existing works mainly focus on alleviating the modality gap by
aligning instance-level features of the unlabeled samples. However, the
relationships between cross-modality clusters are not well explored. To this
end, we propose a novel bilateral cluster matching-based learning framework to
reduce the modality gap by matching cross-modality clusters. Specifically, we
design a Many-to-many Bilateral Cross-Modality Cluster Matching (MBCCM)
algorithm through optimizing the maximum matching problem in a bipartite graph.
Then, the matched pairwise clusters utilize shared visible and infrared
pseudo-labels during the model training. Under such a supervisory signal, a
Modality-Specific and Modality-Agnostic (MSMA) contrastive learning framework
is proposed to align features jointly at a cluster-level. Meanwhile, the
cross-modality Consistency Constraint (CC) is proposed to explicitly reduce the
large modality discrepancy. Extensive experiments on the public SYSU-MM01 and
RegDB datasets demonstrate the effectiveness of the proposed method, surpassing
state-of-the-art approaches by a large margin of 8.76% mAP on average