Deep Perceptual Mapping for Thermal to Visible Face Recognition
Cross-modal face matching between the thermal and visible spectrum is a much desired capability for night-time surveillance and security applications. Due to a very large modality gap, thermal-to-visible face recognition is one of the most challenging face matching problems. In this paper, we present an approach that bridges this modality gap by a significant margin. Our approach captures the highly non-linear relationship between the two modalities by using a deep neural network. Our model attempts to learn a non-linear mapping from the visible to the thermal spectrum while preserving identity information. We show a substantive performance improvement on a difficult thermal-visible face dataset. The presented approach improves the state of the art by more than 10% in terms of Rank-1 identification and bridges the drop in performance due to the modality gap by more than 40%.
Comment: BMVC 2015 (oral)
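The visible-to-thermal mapping described above can be sketched, in very reduced form, as a small fully connected network applied to visible-domain descriptors; the layer sizes, activation, and random weights below are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class PerceptualMapping:
    """Toy fully connected mapping from visible-domain descriptors
    to the thermal domain (layer sizes are illustrative)."""
    def __init__(self, dims=(128, 64, 128)):
        self.weights = [rng.normal(0, 0.1, (a, b))
                        for a, b in zip(dims[:-1], dims[1:])]
        self.biases = [np.zeros(b) for b in dims[1:]]

    def forward(self, x):
        # Hidden layers with ReLU, linear output layer.
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            x = relu(x @ W + b)
        return x @ self.weights[-1] + self.biases[-1]

# Map a batch of 4 visible descriptors into the thermal space.
visible = rng.normal(size=(4, 128))
mapped = PerceptualMapping().forward(visible)
print(mapped.shape)  # (4, 128)
```

In the actual method the mapping would be trained with an identity-preserving objective; this sketch only shows the forward pass.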
Beyond Intra-modality: A Survey of Heterogeneous Person Re-identification
An efficient and effective person re-identification (ReID) system spares users tedious manual video review and accelerates the process of video analysis. Recently, with the explosive demands of practical applications, a lot of research effort has been dedicated to heterogeneous person re-identification (Hetero-ReID). In this paper, we provide a comprehensive review of state-of-the-art Hetero-ReID methods that address the challenge of inter-modality discrepancies. According to the application scenario, we classify the methods into four categories -- low-resolution, infrared, sketch, and text. We begin with an introduction to ReID and a comparison between Homogeneous ReID (Homo-ReID) and Hetero-ReID tasks. Then, we describe and compare existing datasets for performing evaluations, and survey the models that have been widely employed in Hetero-ReID. We also summarize and compare the representative approaches from two perspectives, i.e., the application scenario and the learning pipeline. We conclude with a discussion of some future research directions. Follow-up updates are available at: https://github.com/lightChaserX/Awesome-Hetero-reID
Comment: Accepted by IJCAI 2020. Project url: https://github.com/lightChaserX/Awesome-Hetero-reID
Visible-Infrared Person Re-Identification Using Privileged Intermediate Information
Visible-infrared person re-identification (ReID) aims to recognize the same person of interest across a network of RGB and IR cameras. Some deep learning (DL) models have directly incorporated both modalities to discriminate persons in a joint representation space. However, this cross-modal ReID problem remains challenging due to the large domain shift in data distributions between the RGB and IR modalities. This paper introduces a novel approach for creating an intermediate virtual domain that acts as a bridge between the two main domains (i.e., the RGB and IR modalities) during training. This intermediate domain is considered privileged information (PI) that is unavailable at test time, and allows formulating this cross-modal matching task as a problem of learning under privileged information (LUPI). We devise a new method to generate images between the visible and infrared domains that provide additional information to train a deep ReID model through intermediate domain adaptation. In particular, by employing color-free and multi-step triplet loss objectives during training, our method provides common feature representation spaces that are robust to large visible-infrared domain shifts. Experimental results on challenging visible-infrared ReID datasets indicate that our proposed approach consistently improves matching accuracy, without any computational overhead at test time. The code is available at: \href{https://github.com/alehdaghi/Cross-Modal-Re-ID-via-LUPI}{https://github.com/alehdaghi/Cross-Modal-Re-ID-via-LUPI}
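The color-free intermediate images and triplet objectives mentioned above can be illustrated with a minimal sketch: a grayscale projection standing in for a generated intermediate-domain image, and a standard triplet margin loss on embedding vectors. Both the luminance weights and the margin value are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def to_intermediate(rgb):
    """Color-free bridge image: a luminance-style grayscale projection
    (a simplified stand-in for a generated intermediate-domain image)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet margin loss on embedding vectors."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

rgb = np.ones((4, 4, 3))      # dummy 4x4 RGB image
gray = to_intermediate(rgb)   # color-free image, shape (4, 4)

# Anchor/positive close together, negative far away: loss is zero.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])
n = np.array([1.0, 0.0])
print(triplet_loss(a, p, n))  # 0.0, since 0.1 - 1.0 + 0.3 < 0
```

The "multi-step" aspect of the paper's objective would chain such losses through the intermediate domain; here only the basic ingredients are shown.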
Adversarial Self-Attack Defense and Spatial-Temporal Relation Mining for Visible-Infrared Video Person Re-Identification
In visible-infrared video person re-identification (re-ID), the keys to cross-modal pedestrian identity matching are extracting features that are unaffected by changes in complex scene factors (such as modality, camera view, pedestrian pose, and background) and mining and utilizing motion information. To this end, this paper proposes a new visible-infrared video person re-ID method from a novel perspective, i.e., adversarial self-attack defense and spatial-temporal relation mining. In this work, changes in view, posture, and background, together with modal discrepancy, are considered the main factors that perturb person identity features. Such interference information contained in the training samples is used as an adversarial perturbation. It performs adversarial attacks on the re-ID model during training to make the model more robust to these unfavorable factors. The attack from the adversarial perturbation is introduced by activating the interference information contained in the input samples without generating adversarial samples, and can thus be called an adversarial self-attack. This design allows adversarial attack and defense to be integrated into one framework. This paper further proposes a spatial-temporal information-guided feature representation network to use the information in video sequences. The network can not only extract the information contained in the video-frame sequences but also use the relations of local information in space to guide the network to extract more robust features. The proposed method exhibits compelling performance on large-scale cross-modality video datasets. The source code of the proposed method will be released at https://github.com/lhf12278/xxx.
Comment: 11 pages, 8 figures
Visible-Infrared Person Re-Identification via Patch-Mixed Cross-Modality Learning
Visible-infrared person re-identification (VI-ReID) aims to retrieve images of the same pedestrian from different modalities, where the challenge lies in the significant modality discrepancy. To alleviate the modality gap, recent methods generate intermediate images by GANs, grayscaling, or mixup strategies. However, these methods can introduce extra noise, and the semantic correspondence between the two modalities is not well learned. In this paper, we propose a Patch-Mixed Cross-Modality framework (PMCM), in which two images of the same person from two modalities are split into patches and stitched into a new one for model learning. In this way, the model learns to recognize a person through patches of different styles, and the modality semantic correspondence is directly embodied. With this flexible image generation strategy, the patch-mixed images freely adjust the ratio of different modality patches, which can further alleviate the modality imbalance problem. In addition, the relationship between identity centers across modalities is explored to further reduce the modality variance, and a global-to-part constraint is introduced to regularize representation learning of part features. On two VI-ReID datasets, we report new state-of-the-art performance with the proposed method.
Comment: IJCAI2
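The patch-mixing idea above can be sketched as follows: split two modality images of the same person into a grid of patches and stitch a new image whose per-modality patch ratio is adjustable. The grid size, ratio, and random cell selection below are illustrative choices, not PMCM's exact procedure.

```python
import numpy as np

def patch_mix(img_a, img_b, grid=4, ratio=0.5, rng=None):
    """Stitch a new image from patches of two modality images of the
    same person. `ratio` controls the fraction of patches drawn from
    img_b, mirroring the adjustable modality ratio described above."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = img_a.shape[:2]
    ph, pw = h // grid, w // grid
    out = img_a.copy()
    # Randomly pick which grid cells come from the second modality.
    cells = [(i, j) for i in range(grid) for j in range(grid)]
    picked = rng.choice(len(cells), size=int(ratio * len(cells)),
                        replace=False)
    for k in picked:
        i, j = cells[k]
        out[i*ph:(i+1)*ph, j*pw:(j+1)*pw] = \
            img_b[i*ph:(i+1)*ph, j*pw:(j+1)*pw]
    return out

vis = np.zeros((64, 64, 3))  # stand-in visible image
ir = np.ones((64, 64, 3))    # stand-in infrared image
mixed = patch_mix(vis, ir, grid=4, ratio=0.5)
print(mixed.mean())  # 0.5: half of the patches come from each modality
```

Varying `ratio` during training is one way such a scheme could rebalance under-represented modalities.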
Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
Due to the modality gap between visible and infrared images with high visual ambiguity, learning \textbf{diverse} modality-shared semantic concepts for visible-infrared person re-identification (VI-ReID) remains a challenging problem. Body shape is one of the significant modality-shared cues for VI-ReID. To dig out more diverse modality-shared cues, we expect that erasing body-shape-related semantic concepts from the learned features can force the ReID model to extract additional modality-shared features for identification. To this end, we propose a shape-erased feature learning paradigm that decorrelates modality-shared features in two orthogonal subspaces. Jointly learning shape-related features in one subspace and shape-erased features in the orthogonal complement achieves conditional mutual information maximization between the shape-erased features and identity while discarding body shape information, thus explicitly enhancing the diversity of the learned representation. Extensive experiments on the SYSU-MM01, RegDB, and HITSZ-VCM datasets demonstrate the effectiveness of our method.
Comment: CVPR 202
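The two orthogonal subspaces described above can be illustrated with a simple linear-algebra sketch: project features onto an assumed shape-related subspace and keep the residual as the shape-erased component. The subspace basis here is random and purely illustrative; in the paper the decomposition is learned.

```python
import numpy as np

rng = np.random.default_rng(0)

def orthogonal_decompose(features, shape_basis):
    """Split features into a shape-related component (projection onto
    the shape subspace) and a shape-erased component (projection onto
    its orthogonal complement)."""
    # Orthonormalize the basis spanning the shape-related subspace.
    Q, _ = np.linalg.qr(shape_basis)
    shape_part = features @ Q @ Q.T
    erased_part = features - shape_part
    return shape_part, erased_part

feats = rng.normal(size=(8, 16))   # 8 person features, dimension 16
basis = rng.normal(size=(16, 4))   # assumed 4-dim shape subspace
shape_f, erased_f = orthogonal_decompose(feats, basis)

# The two components sum back to the original feature and are orthogonal.
print(np.allclose(shape_f + erased_f, feats))  # True
print(abs((shape_f * erased_f).sum()) < 1e-8)  # True
```

Forcing identity information into the complement component is what would push the model toward shape-independent cues.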
Unsupervised Visible-Infrared Person ReID by Collaborative Learning with Neighbor-Guided Label Refinement
Unsupervised learning for visible-infrared person re-identification (USL-VI-ReID) aims at learning modality-invariant features from an unlabeled cross-modality dataset, which is crucial for practical applications in video surveillance systems. The key to addressing the USL-VI-ReID task is to solve the cross-modality data association problem for further heterogeneous joint learning. To address this issue, we propose a Dual Optimal Transport Label Assignment (DOTLA) framework to simultaneously assign the generated labels from one modality to its counterpart modality. The proposed DOTLA mechanism formulates a mutually reinforcing and efficient solution to cross-modality data association, which can effectively reduce the side effects of insufficient and noisy label associations. Besides, we further propose a cross-modality neighbor-consistency-guided label refinement and regularization module to eliminate the negative effects brought by inaccurate supervision signals, under the assumption that the prediction or label distribution of each example should be similar to those of its nearest neighbors. Extensive experimental results on the public SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed method, which surpasses existing state-of-the-art approaches by a large margin of 7.76% mAP on average and even surpasses some supervised VI-ReID methods.
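The optimal-transport label assignment at the heart of DOTLA can be sketched with entropy-regularized (Sinkhorn) transport between samples of one modality and pseudo-labels of the other; the cost matrix, uniform marginals, and regularization strength below are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def sinkhorn(cost, n_iters=50, eps=0.1):
    """Entropy-regularized optimal transport (Sinkhorn iterations):
    returns a soft assignment between rows (e.g. infrared samples)
    and columns (e.g. visible pseudo-labels) with uniform marginals."""
    K = np.exp(-cost / eps)
    n, m = cost.shape
    r = np.full(n, 1.0 / n)  # uniform row marginal
    c = np.full(m, 1.0 / m)  # uniform column marginal
    v = np.ones(m)
    for _ in range(n_iters):
        u = r / (K @ v)
        v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
cost = rng.random((5, 5))       # pairwise feature distances (illustrative)
plan = sinkhorn(cost)
labels = plan.argmax(axis=1)    # hard label assignment per sample
print(np.allclose(plan.sum(axis=0), 0.2))  # True: columns match marginal
```

Running such an assignment in both directions (visible-to-infrared and infrared-to-visible) is the "dual" aspect suggested by the framework's name.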
VI-Diff: Unpaired Visible-Infrared Translation Diffusion Model for Single Modality Labeled Visible-Infrared Person Re-identification
Visible-infrared person re-identification (VI-ReID) in real-world scenarios poses a significant challenge due to the high cost of cross-modality data annotation. Different sensing cameras, such as RGB cameras for good lighting and IR cameras for poor lighting, make it costly and error-prone to identify the same person across modalities. To overcome this, we explore the use of single-modality labeled data for the VI-ReID task, which is more cost-effective and practical. By labeling pedestrians in only one modality (e.g., visible images) and retrieving in the other modality (e.g., infrared images), we aim to create a training set containing both the originally labeled and modality-translated data, using unpaired image-to-image translation techniques. In this paper, we propose VI-Diff, a diffusion model that effectively addresses the task of visible-infrared person image translation. Through comprehensive experiments, we demonstrate that VI-Diff outperforms existing diffusion and GAN models, making it a promising solution for VI-ReID with single-modality labeled data and a good starting point for future study. Code will be available.
Comment: 11 pages, 7 figures