An efficient framework for visible-infrared cross modality person re-identification
Visible-infrared cross-modality person re-identification (VI-ReId) is an essential task for video surveillance in poorly illuminated or dark environments. Despite many recent studies on person re-identification in the visible domain (ReId), there are few studies dealing specifically with VI-ReId. Besides challenges that are common to both ReId and VI-ReId, such as pose/illumination variations, background clutter and occlusion, VI-ReId has additional challenges because color information is not available in infrared images. As a result, the performance of VI-ReId systems is typically lower than that of ReId systems. In this work, we propose a four-stream framework to improve VI-ReId performance. We train a separate deep convolutional neural network in each stream using different representations of input images. We expect that different and complementary features can be learned from each stream. In our framework, grayscale and infrared input images are used to train the ResNet in the first stream. In the second stream, RGB and three-channel infrared images (created by repeating the infrared channel) are used. In the remaining two streams, we use local pattern maps as input images. These maps are generated using the local Zernike moments transformation. Local pattern maps are obtained from grayscale and infrared images in the third stream and from RGB and three-channel infrared images in the last stream. We improve the performance of the proposed framework by employing a re-ranking algorithm for post-processing. Our results indicate that the proposed framework outperforms the current state of the art by a large margin, improving Rank-1/mAP by 29.79%/30.91% on the SYSU-MM01 dataset and by 9.73%/16.36% on the RegDB dataset.
WOS:000551127300017; Scopus Affiliation ID: 60105072; Science Citation Index Expanded; Q2; Article; Not with international collaboration: No; September 2020; YÖK 2020-2
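The input representations for the first two streams can be sketched as follows. This is only an illustration of the preprocessing the abstract describes, not the authors' code: the function names are hypothetical, images are plain nested lists, and the BT.601 luma weights used for the grayscale conversion are an assumption, since the abstract does not name a conversion.

```python
def rgb_to_gray(rgb_image):
    """Convert an H x W image of (r, g, b) tuples to an H x W grayscale image."""
    # BT.601 luma weights: an assumption, the abstract does not specify them.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def repeat_infrared(ir_image):
    """Build a three-channel infrared image by repeating the single channel."""
    return [[(v, v, v) for v in row] for row in ir_image]


# Tiny 2 x 2 toy images (hypothetical values).
rgb = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
ir = [[10, 20], [30, 40]]

gray = rgb_to_gray(rgb)      # visible input for stream 1
ir3 = repeat_infrared(ir)    # infrared input for stream 2

# Stream 1 pairs single-channel inputs; stream 2 pairs three-channel inputs.
stream1_inputs = (gray, ir)
stream2_inputs = (rgb, ir3)
```

Streams three and four would apply the local Zernike moments transformation to these same pairs; that step is omitted here because the abstract gives no details of it.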
Beyond Intra-modality: A Survey of Heterogeneous Person Re-identification
An efficient and effective person re-identification (ReID) system relieves
the users from painful and boring video watching and accelerates the process of
video analysis. Recently, with the explosive demands of practical applications,
a lot of research efforts have been dedicated to heterogeneous person
re-identification (Hetero-ReID). In this paper, we provide a comprehensive
review of state-of-the-art Hetero-ReID methods that address the challenge of
inter-modality discrepancies. According to the application scenario, we
classify the methods into four categories -- low-resolution, infrared, sketch,
and text. We begin with an introduction of ReID, and make a comparison between
Homogeneous ReID (Homo-ReID) and Hetero-ReID tasks. Then, we describe and
compare existing datasets for performing evaluations, and survey the models
that have been widely employed in Hetero-ReID. We also summarize and compare
the representative approaches from two perspectives, i.e., the application
scenario and the learning pipeline. We conclude with a discussion of some future
research directions. Follow-up updates are available at:
https://github.com/lightChaserX/Awesome-Hetero-reID
Comment: Accepted by IJCAI 2020. Project url:
https://github.com/lightChaserX/Awesome-Hetero-reI
Efficient Bilateral Cross-Modality Cluster Matching for Unsupervised Visible-Infrared Person ReID
Unsupervised visible-infrared person re-identification (USL-VI-ReID) aims to
match pedestrian images of the same identity from different modalities without
annotations. Existing works mainly focus on alleviating the modality gap by
aligning instance-level features of the unlabeled samples. However, the
relationships between cross-modality clusters are not well explored. To this
end, we propose a novel bilateral cluster matching-based learning framework to
reduce the modality gap by matching cross-modality clusters. Specifically, we
design a Many-to-many Bilateral Cross-Modality Cluster Matching (MBCCM)
algorithm through optimizing the maximum matching problem in a bipartite graph.
Then, the matched pairwise clusters utilize shared visible and infrared
pseudo-labels during the model training. Under such a supervisory signal, a
Modality-Specific and Modality-Agnostic (MSMA) contrastive learning framework
is proposed to align features jointly at a cluster-level. Meanwhile, the
cross-modality Consistency Constraint (CC) is proposed to explicitly reduce the
large modality discrepancy. Extensive experiments on the public SYSU-MM01 and
RegDB datasets demonstrate the effectiveness of the proposed method, surpassing
state-of-the-art approaches by a large margin of 8.76% mAP on average.
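As a rough illustration of matching clusters across modalities via maximum matching in a bipartite graph, the sketch below uses the classic augmenting-path (Kuhn) algorithm. It is a simplified one-to-one version, not the many-to-many MBCCM algorithm from the abstract. The similarity values, the threshold used to create edges, and all names are assumptions made for the example.

```python
def max_bipartite_matching(adjacency, num_right):
    """Return a maximum matching as {left_node: right_node}.

    adjacency[i] lists the right-side nodes that left node i may match.
    """
    match_right = [-1] * num_right  # match_right[j] = left node matched to j

    def try_augment(left, visited):
        # Depth-first search for an augmenting path starting at `left`.
        for right in adjacency[left]:
            if right in visited:
                continue
            visited.add(right)
            if match_right[right] == -1 or try_augment(match_right[right], visited):
                match_right[right] = left
                return True
        return False

    for left in range(len(adjacency)):
        try_augment(left, set())
    return {left: right for right, left in enumerate(match_right) if left != -1}


# Hypothetical similarities: rows are visible clusters, columns infrared clusters.
similarity = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.10],
    [0.10, 0.30, 0.70],
]
threshold = 0.5  # assumed cutoff for drawing an edge between two clusters

adjacency = [[j for j, s in enumerate(row) if s > threshold] for row in similarity]
matching = max_bipartite_matching(adjacency, num_right=3)
print(matching)
```

In this toy case visible cluster 0 is first matched to infrared cluster 0, then rerouted to infrared cluster 1 so that visible cluster 1 can also be matched. Matched pairs could then share one pseudo-label during training, as the abstract describes.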
Unsupervised Visible-Infrared Person ReID by Collaborative Learning with Neighbor-Guided Label Refinement
Unsupervised learning visible-infrared person re-identification (USL-VI-ReID)
aims at learning modality-invariant features from an unlabeled cross-modality
dataset, which is crucial for practical applications in video surveillance
systems. The key to addressing the USL-VI-ReID task is to solve the
cross-modality data association problem for further heterogeneous joint
learning. To address this issue, we propose a Dual Optimal Transport Label
Assignment (DOTLA) framework to simultaneously assign the generated labels from
one modality to its counterpart modality. The proposed DOTLA mechanism
formulates a mutual reinforcement and efficient solution to cross-modality data
association, which could effectively reduce the side-effects of some
insufficient and noisy label associations. Besides, we further propose a
cross-modality neighbor consistency guided label refinement and regularization
module, to eliminate the negative effects brought by the inaccurate supervised
signals, under the assumption that the prediction or label distribution of each
example should be similar to its nearest neighbors. Extensive experimental
results on the public SYSU-MM01 and RegDB datasets demonstrate the
effectiveness of the proposed method, surpassing the existing state-of-the-art
approach by a large margin of 7.76% mAP on average, and even surpassing some
supervised VI-ReID methods.
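The neighbor-consistency assumption (each example's label distribution should resemble that of its nearest neighbors) can be illustrated with a small smoothing step. This is a minimal sketch under stated assumptions, not the DOTLA refinement module itself: the Euclidean distance, the neighbor count `k`, the mixing weight `alpha`, and the function names are all hypothetical choices.

```python
import math


def nearest_neighbors(features, index, k):
    """Indices of the k samples closest to features[index] (Euclidean)."""
    distances = [(math.dist(features[index], f), j)
                 for j, f in enumerate(features) if j != index]
    return [j for _, j in sorted(distances)[:k]]


def refine_labels(features, soft_labels, k=2, alpha=0.5):
    """Blend each soft label with the mean soft label of its k neighbors."""
    refined = []
    for i, own in enumerate(soft_labels):
        neighbors = nearest_neighbors(features, i, k)
        mean = [sum(soft_labels[j][c] for j in neighbors) / len(neighbors)
                for c in range(len(own))]
        refined.append([(1 - alpha) * o + alpha * m for o, m in zip(own, mean)])
    return refined


# Two well-separated groups; sample 1 carries a noisy label favoring class 1.
features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
            [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
soft_labels = [[1.0, 0.0], [0.4, 0.6], [1.0, 0.0],
               [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]

refined = refine_labels(features, soft_labels, k=2, alpha=0.5)
```

After refinement the noisy sample leans toward class 0, in agreement with its two neighbors, while each row still sums to one.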
Deep Perceptual Mapping for Thermal to Visible Face Recognition
Cross modal face matching between the thermal and visible spectrum is a much
desired capability for night-time surveillance and security applications. Due
to a very large modality gap, thermal-to-visible face recognition is one of the
most challenging face matching problems. In this paper, we present an approach
to bridge this modality gap by a significant margin. Our approach captures the
highly non-linear relationship between the two modalities by using a deep
neural network. Our model attempts to learn a non-linear mapping from visible
to thermal spectrum while preserving the identity information. We show
substantive performance improvement on a difficult thermal-visible face
dataset. The presented approach improves the state-of-the-art by more than 10%
in terms of Rank-1 identification and bridges the drop in performance due to the
modality gap by more than 40%.
Comment: BMVC 2015 (oral)
Learning Cross-modality Information Bottleneck Representation for Heterogeneous Person Re-Identification
Visible-Infrared person re-identification (VI-ReID) is an important and
challenging task in intelligent video surveillance. Existing methods mainly
focus on learning a shared feature space to reduce the modality discrepancy
between visible and infrared modalities, which still leaves two problems
underexplored: information redundancy and modality complementarity. To this
end, properly eliminating the identity-irrelevant information as well as making
up for the modality-specific information is critical and remains a challenging
endeavor. To tackle the above problems, we present a novel mutual information
and modality consensus network, namely CMInfoNet, to extract modality-invariant
identity features with the most representative information and reduce the
redundancies. The key insight of our method is to find an optimal
representation to capture more identity-relevant information and compress the
irrelevant parts by optimizing a mutual information bottleneck trade-off.
Besides, we propose an automatic search strategy to find the most prominent
parts that identify the pedestrians. To eliminate the cross- and intra-modality
variations, we also devise a modality consensus module to align the visible and
infrared modalities for task-specific guidance. Moreover, the global-local
feature representations can also be acquired for key parts discrimination.
Experimental results on four benchmarks, i.e., SYSU-MM01, RegDB,
Occluded-DukeMTMC, Occluded-REID, Partial-REID and Partial_iLIDS datasets, have
demonstrated the effectiveness of CMInfoNet.
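For orientation, an information bottleneck trade-off is commonly written in the standard form below. The abstract does not give CMInfoNet's exact objective, so this is the generic formulation only, and the symbols are assumptions: X is the input image, Y the identity label, Z the learned representation, and β > 0 the trade-off weight.

```latex
\max_{Z} \; I(Z; Y) \;-\; \beta \, I(Z; X)
```

The first term rewards retaining identity-relevant information in Z, and the second penalizes information about the input that Z keeps beyond that, which corresponds to the "capture" and "compress" roles described in the abstract.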