Multimodal Data Augmentation for Visual-Infrared Person ReID with Corrupted Data
The re-identification (ReID) of individuals over a complex network of cameras
is a challenging task, especially under real-world surveillance conditions.
Several deep learning models have been proposed for visible-infrared (V-I)
person ReID to recognize individuals from images captured using RGB and IR
cameras. However, performance may decline considerably if RGB and IR images
captured at test time are corrupted (e.g., noise, blur, and weather
conditions). Although various data augmentation (DA) methods have been explored
to improve the generalization capacity, these are not adapted for V-I person
ReID. In this paper, a specialized DA strategy is proposed to address this
multimodal setting. Given both the V and I modalities, this strategy allows to
diminish the impact of corruption on the accuracy of deep person ReID models.
Corruption may be modality-specific, and an additional modality often provides
complementary information. Our multimodal DA strategy is designed specifically
to encourage modality collaboration and reinforce generalization capability.
For instance, punctual masking of modalities forces the model to select the
informative modality. Local DA is also explored for advanced selection of
features within and among modalities. The impact of training baseline fusion
models for V-I person ReID using the proposed multimodal DA strategy is
assessed on corrupted versions of the SYSU-MM01, RegDB, and ThermalWORLD
datasets in terms of complexity and efficiency. Results indicate that using our
strategy provides V-I ReID models with the ability to exploit both shared and
individual modality knowledge, so they can outperform models trained with no DA
or with unimodal DA. GitHub code: https://github.com/art2611/ML-MDA
Comment: 8 pages of main content, 2 pages of references, 2 pages of
supplementary material, 3 figures, WACV 2023 RWS workshop
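The punctual masking of modalities described above can be pictured as a simple augmentation step. The following is a minimal PyTorch sketch, not the authors' ML-MDA code; the function name random_modality_mask and the p_mask parameter are assumptions for illustration:

```python
import torch

def random_modality_mask(visible, infrared, p_mask=0.2):
    """Blank one modality of a V-I image pair with probability p_mask.

    visible, infrared: tensors of shape (B, C, H, W). Zeroing out one of
    the two images forces a fusion model to rely on the remaining,
    informative modality.
    """
    vis, ir = visible.clone(), infrared.clone()
    for i in range(vis.size(0)):
        if torch.rand(1).item() < p_mask:
            if torch.rand(1).item() < 0.5:
                vis[i].zero_()  # drop the visible image for this sample
            else:
                ir[i].zero_()   # drop the infrared image for this sample
    return vis, ir
```

In practice such masking would be combined with the local, within- and cross-modality augmentation the abstract mentions.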
CMTR: Cross-modality Transformer for Visible-infrared Person Re-identification
Visible-infrared cross-modality person re-identification is a challenging
ReID task, which aims to retrieve and match the same identity's images between
the heterogeneous visible and infrared modalities. Thus, the core of this task
is to bridge the huge gap between these two modalities. The existing
convolutional neural network-based methods mainly suffer from insufficient
perception of each modality's information and cannot learn discriminative
modality-invariant embeddings for identities, which limits their
performance. To solve these problems, we propose a cross-modality
transformer-based method (CMTR) for the visible-infrared person
re-identification task, which can explicitly mine the information of each
modality and generate better discriminative features based on it. Specifically,
to capture each modality's characteristics, we design novel modality
embeddings, which are fused with token embeddings to encode modality
information. Furthermore, to enhance the representation of modality embeddings
and adjust the distribution of matching embeddings, we propose a modality-aware
enhancement loss based on the learned modalities' information, reducing
intra-class distance and enlarging inter-class distance. To our knowledge, this
is the first work to apply a transformer network to the cross-modality
re-identification task. We conduct extensive experiments on the public
SYSU-MM01 and RegDB datasets, and the performance of our proposed CMTR model
significantly surpasses that of existing outstanding CNN-based methods.
Comment: 11 pages, 7 figures, 7 tables
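The fusion of modality embeddings with token embeddings can be illustrated with a generic sketch (this is not the CMTR implementation; the class name, dimensions, and arguments are hypothetical): a learned vector per modality is broadcast over the patch tokens, much like a positional embedding.

```python
import torch
import torch.nn as nn

class ModalityTokenEmbedding(nn.Module):
    """Add a learnable per-modality vector to transformer token embeddings."""

    def __init__(self, num_modalities=2, embed_dim=768):
        super().__init__()
        self.modality_embed = nn.Parameter(torch.zeros(num_modalities, embed_dim))
        nn.init.trunc_normal_(self.modality_embed, std=0.02)

    def forward(self, tokens, modality_ids):
        # tokens: (B, N, D) patch embeddings; modality_ids: (B,) with 0 = visible, 1 = infrared
        return tokens + self.modality_embed[modality_ids].unsqueeze(1)
```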
Unsupervised Visible-Infrared Person ReID by Collaborative Learning with Neighbor-Guided Label Refinement
Unsupervised-learning visible-infrared person re-identification (USL-VI-ReID)
aims to learn modality-invariant features from an unlabeled cross-modality
dataset, which is crucial for practical applications in video surveillance
systems. The key to addressing the USL-VI-ReID task is to solve the
cross-modality data association problem for further heterogeneous joint
learning. To address this issue, we propose a Dual Optimal Transport Label
Assignment (DOTLA) framework to simultaneously assign the generated labels from
one modality to its counterpart modality. The proposed DOTLA mechanism
formulates a mutually reinforcing and efficient solution to cross-modality data
association, which effectively reduces the side effects of insufficient and
noisy label associations. In addition, we further propose a
cross-modality neighbor consistency guided label refinement and regularization
module to eliminate the negative effects brought by inaccurate supervision
signals, under the assumption that the prediction or label distribution of each
example should be similar to its nearest neighbors. Extensive experimental
results on the public SYSU-MM01 and RegDB datasets demonstrate the
effectiveness of the proposed method, which surpasses the existing
state-of-the-art approach by a large margin of 7.76% mAP on average and even
surpasses some supervised VI-ReID methods.
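The abstract does not spell out the DOTLA formulation, but the core idea of assigning pseudo-labels across modalities with optimal transport can be sketched with a plain Sinkhorn iteration (an illustrative assumption, not the paper's exact algorithm; names such as sinkhorn_label_assignment and eps are hypothetical):

```python
import torch
import torch.nn.functional as F

def sinkhorn_label_assignment(feats, centroids, n_iters=20, eps=0.05):
    """Assign cross-modality pseudo-labels via entropic optimal transport.

    feats: (N, D) features from one modality; centroids: (K, D) cluster
    centers from the other modality. Alternating row/column normalization
    balances cluster usage before taking the per-sample argmax.
    """
    cost = 1.0 - F.normalize(feats, dim=1) @ F.normalize(centroids, dim=1).t()
    Q = torch.exp(-cost / eps)              # (N, K) transport kernel
    for _ in range(n_iters):
        Q = Q / Q.sum(dim=1, keepdim=True)  # rows: spread each sample's mass
        Q = Q / Q.sum(dim=0, keepdim=True)  # columns: balance cluster usage
    return Q.argmax(dim=1)                  # pseudo-label for each sample
```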
Visible-Infrared Person Re-Identification Using Privileged Intermediate Information
Visible-infrared person re-identification (ReID) aims to recognize the same
person of interest across a network of RGB and IR cameras. Some deep learning
(DL) models have directly incorporated both modalities to discriminate persons
in a joint representation space. However, this cross-modal ReID problem remains
challenging due to the large domain shift in data distributions between RGB and
IR modalities. This paper introduces a novel approach for creating an
intermediate virtual domain that acts as a bridge between the two main domains
(i.e., RGB and IR modalities) during training. This intermediate domain is
considered as privileged information (PI) that is unavailable at test time, and
allows formulating this cross-modal matching task as a problem in learning
under privileged information (LUPI). We devised a new method to generate images
between visible and infrared domains that provide additional information to
train a deep ReID model through an intermediate domain adaptation. In
particular, by employing color-free and multi-step triplet loss objectives
during training, our method provides common feature representation spaces that
are robust to large visible-infrared domain shifts. Experimental results on
challenging visible-infrared ReID datasets indicate that our proposed approach
consistently improves matching accuracy, without any computational overhead at
test time. The code is available at:
https://github.com/alehdaghi/Cross-Modal-Re-ID-via-LUPI
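To make the notions of a color-free intermediate domain and a cross-domain triplet objective concrete, here is a minimal sketch under the simplifying assumption that the intermediate images are plain grayscale copies of the visible images; the paper's actual generation method and multi-step objective may differ, and the helper names are hypothetical:

```python
import torch.nn.functional as F
from torchvision.transforms.functional import rgb_to_grayscale

def intermediate_domain(rgb_batch):
    # Color-free view of the visible images, replicated to 3 channels so it
    # can pass through the same backbone as the RGB and IR inputs.
    return rgb_to_grayscale(rgb_batch, num_output_channels=3)

def cross_domain_triplet(anchor, positive, negative, margin=0.3):
    # Standard triplet loss on embeddings drawn from different domains,
    # e.g. anchor from IR, positive/negative from the intermediate domain.
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    return F.relu(d_ap - d_an + margin).mean()
```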