Cross-Modality Paired-Images Generation for RGB-Infrared Person Re-Identification
RGB-Infrared (IR) person re-identification is very challenging due to the
large cross-modality variations between RGB and IR images. The key is to
learn aligned features that bridge the RGB and IR modalities. However,
because correspondence labels between individual pairs of RGB and IR images
are unavailable, most methods resort to set-level alignment, reducing the
distance between the entire RGB and IR sets. Such set-level alignment may
misalign some instances, which limits performance on RGB-IR Re-ID.
Different from existing methods, in this paper,
we propose to generate cross-modality paired-images and perform both global
set-level and fine-grained instance-level alignments. Our proposed method
enjoys several merits. First, our method can perform set-level alignment by
disentangling modality-specific and modality-invariant features. Compared with
conventional methods, ours explicitly removes the modality-specific
features, so the modality variation can be better reduced. Second, given
cross-modality unpaired images of a person, our method can generate
cross-modality paired images by exchanging features between them. With
these pairs, we can directly perform instance-level alignment by minimizing
the distance between every pair of images. Extensive
experimental results on two standard benchmarks demonstrate that the
proposed model performs favourably against state-of-the-art methods. In
particular, on the SYSU-MM01 dataset, our model achieves gains of 9.2% and
7.7% in Rank-1 accuracy and mAP, respectively. Code is available at
https://github.com/wangguanan/JSIA-ReID.
Comment: accepted by AAAI'20
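As a rough illustration of the instance-level alignment described above, the sketch below minimizes the feature distance of each generated RGB/IR pair; the function name and tensor shapes are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch: instance-level alignment between generated cross-modality
# pairs. Names and shapes are illustrative, not from the JSIA-ReID code.
import torch
import torch.nn.functional as F

def pair_alignment_loss(feats_rgb: torch.Tensor, feats_ir: torch.Tensor) -> torch.Tensor:
    """Mean L2 distance between features of each generated RGB/IR pair.

    feats_rgb, feats_ir: (N, D) tensors where row i of both comes from
    the same generated person instance.
    """
    return F.pairwise_distance(feats_rgb, feats_ir).mean()

# Toy usage with random features standing in for encoder outputs.
loss = pair_alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
```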
Visible-Infrared Person Re-Identification Using Privileged Intermediate Information
Visible-infrared person re-identification (ReID) aims to recognize the same
person of interest across a network of RGB and IR cameras. Some deep learning
(DL) models have directly incorporated both modalities to discriminate persons
in a joint representation space. However, this cross-modal ReID problem remains
challenging due to the large domain shift in data distributions between RGB and
IR modalities. This paper introduces a novel approach for creating an
intermediate virtual domain that acts as a bridge between the two main
domains (i.e., RGB and IR modalities) during training. This intermediate
domain is
considered as privileged information (PI) that is unavailable at test time, and
allows formulating this cross-modal matching task as a problem in learning
under privileged information (LUPI). We devised a new method to generate images
between visible and infrared domains that provide additional information to
train a deep ReID model through an intermediate domain adaptation. In
particular, by employing color-free and multi-step triplet loss objectives
during training, our method provides common feature representation spaces that
are robust to large visible-infrared domain shifts. Experimental results on
challenging visible-infrared ReID datasets indicate that our proposed approach
consistently improves matching accuracy, without any computational overhead at
test time. The code is available at:
https://github.com/alehdaghi/Cross-Modal-Re-ID-via-LUPI
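One plausible reading of the "multi-step triplet loss" above is a chain of triplet objectives that pulls visible features toward the intermediate domain and intermediate features toward infrared, so each step spans a smaller gap. The sketch below encodes that reading; the stepwise decomposition and the margin value are assumptions, not the authors' exact formulation.

```python
# Hedged sketch of a multi-step triplet objective using an intermediate
# domain Z to bridge visible (V) and infrared (I) features. The stepwise
# decomposition is an assumption for illustration only.
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=0.3)

def multi_step_triplet(anchor_v, pos_z, neg_z, pos_i, neg_i):
    """Chain V -> Z -> I so each triplet spans a smaller domain gap."""
    step1 = triplet(anchor_v, pos_z, neg_z)  # visible vs. intermediate
    step2 = triplet(pos_z, pos_i, neg_i)     # intermediate vs. infrared
    return step1 + step2

# Toy usage: (N, D) feature batches standing in for the three domains.
f = lambda: torch.randn(8, 256)
loss = multi_step_triplet(f(), f(), f(), f(), f())
```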
VI-Diff: Unpaired Visible-Infrared Translation Diffusion Model for Single Modality Labeled Visible-Infrared Person Re-identification
Visible-Infrared person re-identification (VI-ReID) in real-world scenarios
poses a significant challenge due to the high cost of cross-modality data
annotation. Different sensing cameras, such as RGB/IR cameras for good/poor
lighting conditions, make it costly and error-prone to identify the same person
across modalities. To overcome this, we explore the use of single-modality
labeled data for the VI-ReID task, which is more cost-effective and practical.
By labeling pedestrians in only one modality (e.g., visible images) and
retrieving in another modality (e.g., infrared images), we aim to create a
training set containing both originally labeled and modality-translated data
using unpaired image-to-image translation techniques. In this paper, we propose
VI-Diff, a diffusion model that effectively addresses the task of
Visible-Infrared person image translation. Through comprehensive experiments,
we demonstrate that VI-Diff outperforms existing diffusion and GAN models,
making it a promising solution to the VI-ReID task with single-modality
labeled data and a good starting point for future study. Code will be
available.
Comment: 11 pages, 7 figures
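To make the data setup concrete, the sketch below assembles a two-modality training set from labels in the visible modality only; `translate_to_ir` is a hypothetical stand-in for the learned translator (a crude channel average here, purely to keep the sketch runnable), not the paper's diffusion model.

```python
# Hedged sketch: building a VI-ReID training set from single-modality
# labels plus modality-translated copies. `translate_to_ir` is a
# hypothetical stand-in for a learned translator such as VI-Diff.
import torch

def translate_to_ir(x_rgb: torch.Tensor) -> torch.Tensor:
    # Crude grayscale stand-in, NOT the paper's diffusion translation.
    return x_rgb.mean(dim=0, keepdim=True).repeat(3, 1, 1)

# Labeled visible set: (image, person-id) pairs; images are (3, H, W).
visible_set = [(torch.rand(3, 256, 128), pid) for pid in range(4)]

# Translated images inherit the identity of their visible source, giving
# a cross-modality training set without any infrared annotation.
train_set = visible_set + [(translate_to_ir(img), pid) for img, pid in visible_set]
```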
Visible-Infrared Person Re-Identification via Patch-Mixed Cross-Modality Learning
Visible-infrared person re-identification (VI-ReID) aims to retrieve images
of the same pedestrian from different modalities, where the challenges lie in
the significant modality discrepancy. To alleviate the modality gap, recent
methods generate intermediate images by GANs, grayscaling, or mixup strategies.
However, these methods could introduce extra noise, and the semantic
correspondence between the two modalities is not well learned. In this paper,
we propose a Patch-Mixed Cross-Modality framework (PMCM), where two images of
the same person from two modalities are split into patches and stitched into a
new one for model learning. In this way, the model learns to recognize a person
through patches of different styles, and the modality semantic correspondence
is directly embodied. With the flexible image generation strategy, the
patch-mixed images freely adjust the ratio of different modality patches, which
could further alleviate the modality imbalance problem. In addition, the
relationship between identity centers among modalities is explored to further
reduce the modality variance, and the global-to-part constraint is introduced
to regularize representation learning of part features. On two VI-ReID
datasets, we report new state-of-the-art performance with the proposed
method.
Comment: IJCAI2
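As a rough sketch of the patch-mixing idea, the snippet below stitches a new image from the grid patches of two modality views of the same person; the grid size and mixing ratio are illustrative parameters, not PMCM's actual configuration.

```python
# Hedged sketch of patch-mixed image generation; grid and ratio values
# are illustrative, not the paper's settings.
import torch

def patch_mix(img_a: torch.Tensor, img_b: torch.Tensor,
              grid=(4, 2), ratio=0.5) -> torch.Tensor:
    """Replace a random fraction of img_a's grid patches with img_b's.

    img_a, img_b: (C, H, W) views of the same person in two modalities,
    with H and W divisible by the grid dimensions.
    """
    _, h, w = img_a.shape
    gh, gw = grid
    ph, pw = h // gh, w // gw
    out = img_a.clone()
    mask = torch.rand(gh, gw) < ratio  # cells drawn from the other modality
    for i in range(gh):
        for j in range(gw):
            if mask[i, j]:
                out[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = \
                    img_b[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
    return out

# Toy usage: mixing roughly half the patches of an RGB view with an IR view.
mixed = patch_mix(torch.rand(3, 256, 128), torch.rand(3, 256, 128))
```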