Cross-Modality Paired-Images Generation for RGB-Infrared Person Re-Identification
RGB-Infrared (IR) person re-identification is very challenging due to the
large cross-modality variations between RGB and IR images. A key solution is
to learn aligned features that bridge the RGB and IR modalities. However, due to
the lack of correspondence labels between pairs of RGB and IR images, most
methods try to alleviate the variations with set-level alignment, reducing
the distance between the entire RGB and IR sets. Such set-level
alignment may misalign some instances, which limits
performance on RGB-IR Re-ID. Different from existing methods, in this paper,
we propose to generate cross-modality paired-images and perform both global
set-level and fine-grained instance-level alignments. Our proposed method
enjoys several merits. First, our method can perform set-level alignment by
disentangling modality-specific and modality-invariant features. Compared with
conventional methods, ours can explicitly remove the modality-specific features
and the modality variation can be better reduced. Second, given unpaired
cross-modality images of a person, our method can generate cross-modality
paired images by exchanging features. With these pairs, we can directly perform
instance-level alignment by minimizing the distance between every image pair. Extensive
experimental results on two standard benchmarks demonstrate that the proposed
model performs favourably against state-of-the-art methods. In particular, on the
SYSU-MM01 dataset, our model achieves gains of 9.2% and 7.7% in Rank-1 accuracy
and mAP, respectively. Code is available at https://github.com/wangguanan/JSIA-ReID.
Comment: accepted by AAAI'2
Faster Person Re-Identification
Fast person re-identification (ReID) aims to search person images quickly and
accurately. The main idea of recent fast ReID methods is hashing, which
learns compact binary codes and performs matching via fast Hamming distance
computation and counting sort. However, a very long code (e.g. 2048 bits) is
needed for high accuracy, which compromises search speed. In this work, we introduce a new
solution for fast ReID by formulating a novel Coarse-to-Fine (CtF) hashing code
search strategy, which complementarily uses short and long codes, achieving
both faster speed and better accuracy. It uses shorter codes to coarsely rank
broad matching similarities and longer codes to refine only a few top
candidates for more accurate instance ReID. Specifically, we design an
All-in-One (AiO) framework together with a Distance Threshold Optimization
(DTO) algorithm. In AiO, we simultaneously learn and enhance multiple codes of
different lengths in a single model: the codes are learned in a pyramid
structure, and shorter codes are encouraged to mimic longer codes via
self-distillation. DTO solves a complex threshold search problem by a simple
optimization process, and the balance between accuracy and speed is easily
controlled by a single parameter. It formulates the optimization target as a
score that can be optimised by Gaussian cumulative distribution
functions. Experimental results on two datasets show that our proposed method
(CtF) is not only 8% more accurate but also 5x faster than contemporary hashing
ReID methods. Compared with non-hashing ReID methods, CtF is faster
with comparable accuracy. Code is available at https://github.com/wangguanan/light-reid.
Comment: accepted by ECCV2020
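The coarse-to-fine search strategy can be sketched as follows. This is a simplified illustration under assumptions made here, not the authors' code: `hamming`, `ctf_search`, the string-encoded binary codes, and the shortlist size `k` are all hypothetical choices (the paper selects the shortlist via the DTO-optimized distance threshold rather than a fixed k).

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary code strings."""
    return sum(x != y for x, y in zip(a, b))

def ctf_search(query_short, query_long, gallery, k):
    """Coarse-to-fine search.

    gallery: list of (id, short_code, long_code) tuples.
    Returns gallery ids, best match first.
    """
    # Coarse stage: rank the whole gallery by cheap short-code distance.
    coarse = sorted(gallery, key=lambda g: hamming(query_short, g[1]))
    shortlist = coarse[:k]
    # Fine stage: re-rank only the few top candidates with the long code,
    # so the expensive comparison touches a tiny fraction of the gallery.
    fine = sorted(shortlist, key=lambda g: hamming(query_long, g[2]))
    return [g[0] for g in fine]

gallery = [
    ("A", "0000", "00000000"),
    ("B", "0001", "00001111"),
    ("C", "1111", "11111111"),
]
result = ctf_search("0000", "00000001", gallery, k=2)
print(result)
```

The short codes prune candidate "C" before any long-code comparison happens, which is the source of the speedup: long-code accuracy is only paid for on the shortlist.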
Domain Adaptive Person Search via GAN-based Scene Synthesis for Cross-scene Videos
Person search has recently been a challenging task in the computer vision
domain, which aims to search for specific pedestrians in footage from real
cameras. Nevertheless, most surveillance videos contain only a handful of
images of each pedestrian, which often feature identical backgrounds and
clothing. Hence, it is difficult to learn discriminative features for
person search in real scenes. To tackle this challenge, we draw on Generative
Adversarial Networks (GAN) to synthesize data from surveillance videos. GAN has
thrived in computer vision problems because it produces high-quality images
efficiently. We make only minor changes to the popular Fast R-CNN model, which
can process videos and yield accurate detection results. To relieve the
pressure on the two-stage model, we design an Assisted-Identity Query Module
(AIDQ) to provide positive images for the subsequent stage. In addition, we
propose a novel GAN-based Scene Synthesis model that can synthesize
high-quality cross-id person images for person search tasks. In
order to facilitate the feature learning of the GAN-based Scene Synthesis
model, we adopt an online learning strategy that collaboratively learns the
synthesized images and original images. Extensive experiments on two widely
used person search benchmarks, CUHK-SYSU and PRW, show that our method
achieves strong performance, and an extensive ablation study further verifies
that our GAN-synthesized data can effectively increase the variability of the
datasets while remaining realistic.
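The online learning strategy described above can be sketched as mixing original and synthesized images in each training batch. Everything below is a hedged illustration, not the paper's pipeline: `synthesize` is a stand-in stub for the GAN-based Scene Synthesis model, and the dict-based image records and batch layout are assumptions made here.

```python
import random

def synthesize(image, new_identity):
    """Stub for the GAN-based Scene Synthesis model: re-render an image
    with another person's identity while keeping the original scene."""
    return {"scene": image["scene"], "identity": new_identity, "synthetic": True}

def make_batch(originals, identities, n_synth, rng):
    """Return a training batch of all originals plus n_synth cross-id
    synthesized images, so both are learned collaboratively."""
    batch = list(originals)
    for _ in range(n_synth):
        src = rng.choice(originals)
        # Cross-id: pick an identity different from the source image's.
        new_id = rng.choice([i for i in identities if i != src["identity"]])
        batch.append(synthesize(src, new_id))
    return batch

rng = random.Random(0)  # seeded for reproducibility
originals = [
    {"scene": "street", "identity": "p1", "synthetic": False},
    {"scene": "mall", "identity": "p2", "synthetic": False},
]
batch = make_batch(originals, ["p1", "p2"], n_synth=2, rng=rng)
print(len(batch), sum(img["synthetic"] for img in batch))
```

Feeding each batch through the re-identification loss lets the synthesized cross-id images add appearance variability that the few real frames per pedestrian cannot provide.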