5 research outputs found
Attentive WaveBlock: Complementarity-enhanced Mutual Networks for Unsupervised Domain Adaptation in Person Re-identification and Beyond
Unsupervised domain adaptation (UDA) for person re-identification is
challenging because of the large gap between the source and target domains. A
typical self-training method uses pseudo-labels generated by clustering
algorithms to iteratively optimize the model on the target domain. A drawback,
however, is that noisy pseudo-labels generally hinder learning. To address
this problem, mutual learning between dual networks has been developed to
produce reliable soft labels. However, as the two neural networks gradually
converge, their complementarity is weakened and they are likely to become
biased towards the same kind of noise. This paper proposes a novel
light-weight module, the Attentive WaveBlock (AWB), which can be integrated
into the dual networks of mutual learning to enhance the complementarity and
further suppress noise in the pseudo-labels. Specifically, we first introduce a
parameter-free module, the WaveBlock, which creates a difference between the
features learned by the two networks by waving blocks of their feature maps
differently. Then, an attention mechanism is leveraged to enlarge the created
difference and discover more complementary features. Furthermore, two
combination strategies, i.e., pre-attention and post-attention, are explored. Experiments
demonstrate that the proposed method achieves state-of-the-art performance with
significant improvements on multiple UDA person re-identification tasks. We
also prove the generality of the proposed method by applying it to vehicle
re-identification and image classification tasks. Our codes and models are
available at https://github.com/WangWenhao0716/Attentive-WaveBlock.
Comment: Our codes and models are available at https://github.com/WangWenhao0716/Attentive-WaveBlock
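
As a rough illustration of the block-wise feature perturbation described above, the sketch below amplifies one randomly chosen horizontal stripe of a feature map during training; the stripe ratio, scaling factor, and class name are illustrative assumptions rather than the paper's exact WaveBlock settings.

import torch
import torch.nn as nn

class WaveBlockSketch(nn.Module):
    """Parameter-free block-wise feature perturbation (illustrative sketch).

    Scales one randomly chosen horizontal stripe of the feature map so that
    two peer networks equipped with this module see differently "waved"
    features. The stripe ratio and wave factor below are assumptions made
    for illustration, not the paper's exact settings.
    """

    def __init__(self, stripe_ratio: float = 0.25, wave_factor: float = 2.0):
        super().__init__()
        self.stripe_ratio = stripe_ratio
        self.wave_factor = wave_factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
        if not self.training:
            return x
        h = x.size(2)
        stripe_h = max(1, int(h * self.stripe_ratio))
        start = torch.randint(0, h - stripe_h + 1, (1,)).item()
        out = x.clone()
        out[:, :, start:start + stripe_h, :] *= self.wave_factor
        return out

# Example: each of the two peer networks would insert its own WaveBlockSketch,
# so the randomly chosen stripes differ and feature complementarity is kept.
feats = torch.randn(8, 256, 24, 8)
waved = WaveBlockSketch()(feats)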
Cross-modality Person re-identification with Shared-Specific Feature Transfer
Cross-modality person re-identification (cm-ReID) is a challenging but key
technology for intelligent video analysis. Existing works mainly focus on
learning a common representation by embedding different modalities into the same
feature space. However, learning only the common characteristics entails substantial
information loss, lowering the upper bound of feature distinctiveness. In this
paper, we tackle the above limitation by proposing a novel cross-modality
shared-specific feature transfer algorithm (termed cm-SSFT) to explore the
potential of both the modality-shared information and the modality-specific
characteristics to boost the re-identification performance. We model the
affinities of different modality samples according to the shared features and
then transfer both shared and specific features among and across modalities. We
also propose a complementary feature learning strategy including modality
adaptation, project adversarial learning, and reconstruction enhancement to learn
discriminative and complementary shared and specific features of each modality,
respectively. The entire cm-SSFT algorithm can be trained in an end-to-end
manner. We conducted comprehensive experiments to validate the superiority of
the overall algorithm and the effectiveness of each component. The proposed
algorithm significantly outperforms the state of the art by 22.5% and 19.3% mAP on
the two mainstream benchmark datasets SYSU-MM01 and RegDB, respectively.
Comment: To appear at CVPR 2020
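
A minimal sketch of the affinity-based transfer idea outlined above: affinities are modeled from the shared features and then used to propagate both shared and specific features among samples of both modalities. The function name and the softmax-normalized dot-product affinity are assumptions for illustration, not the paper's exact cm-SSFT formulation.

import torch
import torch.nn.functional as F

def transfer_features(shared: torch.Tensor,
                      specific: torch.Tensor) -> torch.Tensor:
    """Propagate features across samples using affinities from shared features.

    shared:   (N, D) modality-shared embeddings of RGB and IR samples stacked.
    specific: (N, D) modality-specific embeddings of the same samples.
    Returns transferred features of shape (N, 2 * D).
    """
    # Affinity between every pair of samples, computed from the shared
    # features (a softmax-normalized dot product is an illustrative choice).
    affinity = F.softmax(shared @ shared.t() / shared.size(1) ** 0.5, dim=1)
    # Transfer: each sample aggregates shared and specific information from
    # its affine neighbours, within and across modalities.
    shared_t = affinity @ shared
    specific_t = affinity @ specific
    return torch.cat([shared_t, specific_t], dim=1)

# Example with 4 RGB + 4 IR samples and 128-dimensional embeddings.
feats_shared = torch.randn(8, 128)
feats_specific = torch.randn(8, 128)
fused = transfer_features(feats_shared, feats_specific)  # (8, 256)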
Cross-Correlated Attention Networks for Person Re-Identification
When the task of person re-identification is considered, deep neural networks
need to make robust inferences in the presence of occlusion, background clutter,
and pose and viewpoint variations, to name a few challenges. Attention
mechanisms have recently proven successful in handling these challenges to some
degree. However, previous designs fail to capture the inherent
inter-dependencies between the attended features, leading to restricted
interactions between the attention blocks. In this paper, we propose a new
attention module called Cross-Correlated Attention (CCA), which aims to
overcome such limitations by maximizing the information gain between different
attended regions. Moreover, we also propose a novel deep network that makes use
of different attention mechanisms to learn robust and discriminative
representations of person images. The resulting model is called the
Cross-Correlated Attention Network (CCAN). Extensive experiments demonstrate
that the CCAN comfortably outperforms current state-of-the-art algorithms by a
tangible margin.
Comment: Accepted by Image and Vision Computing
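
One possible reading of the cross-correlation idea above, sketched under stated assumptions: regional descriptors attended by two branches are coupled through an explicit cross-correlation matrix and re-weighted by how strongly they relate to the other branch. This is illustrative only; the actual CCA module may differ.

import torch
import torch.nn.functional as F

def cross_correlated_reweight(regions_a: torch.Tensor,
                              regions_b: torch.Tensor):
    """Re-weight attended regions by their correlation with the other branch.

    regions_a, regions_b: (N, R, D) attended regional descriptors from two
    attention branches. Building an explicit cross-correlation matrix and
    using it to couple the branches is an illustrative reading of the
    abstract, not the paper's exact CCA formulation.
    """
    a = F.normalize(regions_a, dim=-1)
    b = F.normalize(regions_b, dim=-1)
    corr = a @ b.transpose(1, 2)                 # (N, R, R) cross-correlation
    w_a = F.softmax(corr.mean(dim=2), dim=1)     # how much each region of a
    w_b = F.softmax(corr.mean(dim=1), dim=1)     # relates to the other branch
    out_a = regions_a * w_a.unsqueeze(-1)
    out_b = regions_b * w_b.unsqueeze(-1)
    return out_a, out_b

# Example: 4 images, 6 attended regions each, 256-dimensional descriptors.
ra, rb = torch.randn(4, 6, 256), torch.randn(4, 6, 256)
ra_cc, rb_cc = cross_correlated_reweight(ra, rb)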
Incomplete Descriptor Mining with Elastic Loss for Person Re-Identification
In this paper, we propose a novel person Re-ID model, Consecutive Batch
DropBlock Network (CBDB-Net), to capture the attentive and robust person
descriptor for the person Re-ID task. The CBDB-Net contains two novel designs:
the Consecutive Batch DropBlock Module (CBDBM) and the Elastic Loss (EL). In
the Consecutive Batch DropBlock Module (CBDBM), we first uniformly partition the
feature maps into patches. We then independently drop each patch in turn, from
top to bottom, which yields multiple incomplete feature maps. During training,
these incomplete features encourage the Re-ID model to capture a robust person
descriptor for the Re-ID task. In the Elastic Loss (EL), we design a novel
weight-control term to help the Re-ID model adaptively balance hard and easy
sample pairs throughout the training process. Through an extensive
set of ablation studies, we verify that the Consecutive Batch DropBlock Module
(CBDBM) and the Elastic Loss (EL) each contribute to the performance boosts of
CBDB-Net. We demonstrate that our CBDB-Net achieves competitive performance on
three standard person Re-ID datasets (Market-1501, DukeMTMC-reID, and CUHK03),
three occluded person Re-ID datasets (Occluded-DukeMTMC, Partial-REID, and
Partial-iLIDS), and a general image retrieval dataset (In-Shop Clothes Retrieval).
Comment: Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
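
The consecutive dropping scheme described above lends itself to a short sketch: partition the feature map into uniform horizontal stripes and, for each stripe in turn, emit a copy of the map with that stripe zeroed out. The number of partitions below is an illustrative assumption, not the paper's setting.

import torch

def consecutive_batch_dropblock(feat: torch.Tensor, num_parts: int = 6):
    """Produce multiple incomplete feature maps, one per dropped stripe.

    feat: (N, C, H, W) feature map. The map is uniformly partitioned into
    `num_parts` horizontal stripes and, for each stripe in turn (top to
    bottom), a copy of the feature map is returned with that stripe zeroed.
    """
    n, c, h, w = feat.shape
    part_h = h // num_parts
    incomplete = []
    for i in range(num_parts):
        start = i * part_h
        end = h if i == num_parts - 1 else start + part_h
        dropped = feat.clone()
        dropped[:, :, start:end, :] = 0.0        # drop the i-th stripe
        incomplete.append(dropped)
    return incomplete                             # list of (N, C, H, W) maps

# Example: one backbone feature map yields six incomplete views for training.
feats = torch.randn(8, 2048, 24, 8)
views = consecutive_batch_dropblock(feats)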
Deep Learning for Person Re-identification: A Survey and Outlook
Person re-identification (Re-ID) aims at retrieving a person of interest
across multiple non-overlapping cameras. With the advancement of deep neural
networks and the increasing demand for intelligent video surveillance, it has gained
significantly increased interest in the computer vision community. By
dissecting the involved components in developing a person Re-ID system, we
categorize it into the closed-world and open-world settings. The widely studied
closed-world setting is usually applied under various research-oriented
assumptions, and has achieved inspiring success using deep learning techniques
on a number of datasets. We first conduct a comprehensive overview with
in-depth analysis for closed-world person Re-ID from three different
perspectives, including deep feature representation learning, deep metric
learning, and ranking optimization. With performance saturating under the
closed-world setting, the research focus for person Re-ID has recently shifted
to the open-world setting, which poses more challenging issues. This setting is
closer to practical applications under specific scenarios. We summarize the
open-world Re-ID in terms of five different aspects. By analyzing the
advantages of existing methods, we design a powerful AGW baseline, achieving
state-of-the-art or at least comparable performance on twelve datasets for four
different Re-ID tasks. Meanwhile, we introduce a new evaluation metric (mINP)
for person Re-ID, indicating the cost of finding all the correct matches,
which provides an additional criterion for evaluating Re-ID systems in real
applications. Finally, some important yet under-investigated open issues are
discussed.
Comment: 20 pages, 8 figures. Accepted by IEEE TPAMI
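
As a hedged sketch of the mINP idea (the cost of finding all correct matches), the snippet below relates, for each query, the number of correct matches to the rank position of the hardest one and averages over queries; the exact formula follows the commonly cited definition and should be checked against the paper.

import numpy as np

def mean_inverse_negative_penalty(all_rankings):
    """Compute mINP over a set of queries (illustrative implementation).

    all_rankings: list of 1-D boolean arrays; each array marks, for one query,
    which positions in the ranked gallery list are correct matches.
    For each query, INP = (#correct matches) / (rank of the hardest match),
    following the commonly cited definition; mINP averages INP over queries.
    """
    inps = []
    for matches in all_rankings:
        matches = np.asarray(matches, dtype=bool)
        num_correct = matches.sum()
        if num_correct == 0:
            continue                                  # skip queries with no match
        hardest_rank = np.where(matches)[0][-1] + 1   # 1-based rank of last match
        inps.append(num_correct / hardest_rank)
    return float(np.mean(inps))

# Example: two queries; True marks a correct match at that rank position.
rankings = [
    np.array([True, False, True, False, False]),   # matches at ranks 1 and 3
    np.array([False, True, False, False, True]),   # matches at ranks 2 and 5
]
print(mean_inverse_negative_penalty(rankings))     # (2/3 + 2/5) / 2 ≈ 0.533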