Beyond Intra-modality: A Survey of Heterogeneous Person Re-identification
An efficient and effective person re-identification (ReID) system relieves
users of tedious manual video review and accelerates the process of video
analysis. Recently, driven by the explosive demand from practical applications,
considerable research effort has been dedicated to heterogeneous person
re-identification (Hetero-ReID). In this paper, we provide a comprehensive
review of state-of-the-art Hetero-ReID methods that address the challenge of
inter-modality discrepancies. According to the application scenario, we
classify the methods into four categories -- low-resolution, infrared, sketch,
and text. We begin with an introduction to ReID and compare the
Homogeneous ReID (Homo-ReID) and Hetero-ReID tasks. Then, we describe and
compare existing datasets for performing evaluations, and survey the models
that have been widely employed in Hetero-ReID. We also summarize and compare
the representative approaches from two perspectives, i.e., the application
scenario and the learning pipeline. We conclude with a discussion of some future
research directions. Follow-up updates are available at:
https://github.com/lightChaserX/Awesome-Hetero-reID
Comment: Accepted by IJCAI 2020.
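In Hetero-ReID a query from one modality (e.g., an infrared image or a sketch) is matched against a gallery from another (e.g., visible images), and results are commonly reported as Rank-1 accuracy and mAP. The sketch below is a minimal, framework-free illustration under simplifying assumptions: features are already extracted, similarity is cosine, queries with no true match are skipped, and dataset-specific protocol details (such as camera filtering or repeated gallery sampling) are omitted. The function names are hypothetical and are not taken from the survey or its repository.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_precision(ranked_ids: List[int], query_id: int) -> float:
    """AP of one ranked gallery list: mean of the precision at each correct match."""
    hits = 0
    precision_sum = 0.0
    for rank, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def evaluate_cross_modal(
    query_feats: List[Sequence[float]],
    query_ids: List[int],
    gallery_feats: List[Sequence[float]],
    gallery_ids: List[int],
) -> Tuple[float, float]:
    """Rank-1 accuracy and mAP for queries of one modality against a gallery of another."""
    rank1_hits = 0
    ap_values: List[float] = []
    for q_feat, q_id in zip(query_feats, query_ids):
        scores = [cosine_similarity(q_feat, g_feat) for g_feat in gallery_feats]
        # Gallery indices sorted from most to least similar to this query.
        order = sorted(range(len(gallery_feats)), key=lambda i: scores[i], reverse=True)
        ranked_ids = [gallery_ids[i] for i in order]
        if q_id not in ranked_ids:
            continue  # assumption: queries with no true match in the gallery are skipped
        if ranked_ids[0] == q_id:
            rank1_hits += 1
        ap_values.append(average_precision(ranked_ids, q_id))
    if not ap_values:
        return 0.0, 0.0
    return rank1_hits / len(ap_values), sum(ap_values) / len(ap_values)


# Toy example: two "infrared" queries against three "visible" gallery images.
rank1, mean_ap = evaluate_cross_modal(
    [[1.0, 0.0], [0.0, 1.0]], [1, 2],
    [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]], [1, 2, 3],
)
print(rank1, mean_ap)  # prints 1.0 1.0
```

Rank-1 only checks the top match, while mAP rewards placing every correct gallery image early, which is why papers usually report both.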
Looking Beyond Appearances: Synthetic Training Data for Deep CNNs in Re-identification
Re-identification is generally carried out by encoding the appearance of a
subject in terms of outfit, which presumes scenarios where people do not change
their attire. In this paper we overcome this restriction by proposing a
framework based on a deep convolutional neural network, SOMAnet, that
additionally models other discriminative aspects, namely, structural attributes
of the human figure (e.g. height, obesity, gender). Our method is unique in
many respects. First, SOMAnet is based on the Inception architecture, departing
from the usual siamese framework. This spares expensive data preparation
(pairing images across cameras) and makes it easier to understand what the
network has learned. Second, and most notably, the training data consists of a
synthetic 100K-instance dataset, SOMAset, created by photorealistic human body
generation software. Synthetic data represents a good compromise between
realistic imagery, usually not required in re-identification since surveillance
cameras capture low-resolution silhouettes, and complete control of the
samples, which is useful for customizing the data w.r.t. the surveillance
scenario at hand, e.g. ethnicity. SOMAnet, trained on SOMAset and fine-tuned on
recent re-identification benchmarks, outperforms all competitors, matching
subjects even with different apparel. The combination of synthetic data with
Inception architectures opens up new research avenues in re-identification.
Comment: 14 pages
Learnable PINs: Cross-Modal Embeddings for Person Identity
We propose and investigate an identity-sensitive joint embedding of face and
voice. Such an embedding enables cross-modal retrieval from voice to face and
from face to voice. We make the following four contributions: first, we show
that the embedding can be learnt from videos of talking faces, without
requiring any identity labels, using a form of cross-modal self-supervision;
second, we develop a curriculum learning schedule for hard negative mining
targeted to this task, that is essential for learning to proceed successfully;
third, we demonstrate and evaluate cross-modal retrieval for identities unseen
and unheard during training over a number of scenarios and establish a
benchmark for this novel task; finally, we show an application of the
joint embedding to automatically retrieving and labelling characters in TV
dramas.
Comment: To appear in ECCV 201
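The curriculum idea in the second contribution can be illustrated with a generic sketch: candidate negatives are ranked by their distance to the anchor, and a difficulty level that ramps up over training moves the selected negative from easy (far) to hard (close). This is an illustration under stated assumptions, not the paper's actual schedule or loss: the linear ramp, the rounding rule, the triplet margin loss, and all function names are hypothetical choices made here.

```python
from typing import List


def curriculum_level(epoch: int, total_epochs: int) -> float:
    """Difficulty in [0, 1], ramping linearly from 0 at the first epoch to 1 at the last (assumed schedule)."""
    if total_epochs <= 1:
        return 1.0
    t = epoch / (total_epochs - 1)
    return min(max(t, 0.0), 1.0)


def select_negative(neg_distances: List[float], level: float) -> int:
    """Index of the negative to use for one anchor.

    level = 0 picks the easiest (farthest) negative, level = 1 the hardest (closest).
    """
    if not neg_distances:
        raise ValueError("need at least one negative")
    # Order negatives from easiest (largest distance) to hardest (smallest distance).
    order = sorted(range(len(neg_distances)), key=lambda i: neg_distances[i], reverse=True)
    level = min(max(level, 0.0), 1.0)
    position = int(level * (len(order) - 1) + 0.5)  # round half up
    return order[position]


def triplet_margin_loss(d_pos: float, d_neg: float, margin: float = 0.2) -> float:
    """Hinge loss: penalise when the negative is not at least `margin` farther than the positive."""
    return max(0.0, d_pos - d_neg + margin)


# Toy example: one anchor-positive distance and four candidate negative distances.
d_pos = 0.4
negatives = [0.9, 0.1, 0.5, 0.3]
for epoch in (0, 5, 10):
    level = curriculum_level(epoch, total_epochs=11)
    idx = select_negative(negatives, level)
    print(epoch, idx, triplet_margin_loss(d_pos, negatives[idx]))
```

In the toy run the chosen negative moves from the one at distance 0.9 (zero loss) at the start to the one at distance 0.1 at the end, so early training sees easy, informative-but-stable triplets and later training sees hard ones.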
Shape-centered Representation Learning for Visible-Infrared Person Re-identification
Current Visible-Infrared Person Re-Identification (VI-ReID) methods
prioritize extracting discriminative appearance features, overlooking the
natural robustness of body shape to modality changes. Initially, we gauged the
discriminative potential of shapes by a straightforward concatenation of shape
and appearance features. However, two unresolved issues persist in the
utilization of shape features. One pertains to the dependence on auxiliary
models for shape feature extraction in the inference phase, along with the
errors in generated infrared shapes due to the intrinsic modality disparity.
The other issue involves the inadequately explored correlation between shape
and appearance features. To tackle the aforementioned challenges, we propose
the Shape-centered Representation Learning framework (ScRL), which focuses on
learning shape features and appearance features associated with shapes.
Specifically, we devise the Shape Feature Propagation (SFP), which enables
direct extraction of shape features from the original images with minimal
computational overhead during inference. To rectify inaccuracies in infrared
body shapes at the feature level, we present the Infrared Shape Restitution (ISR).
Furthermore, to acquire appearance features related to shape, we design the
Appearance Feature Enhancement (AFE), which accentuates identity-related
features while suppressing identity-unrelated features guided by shape
features. Extensive experiments validate the effectiveness of the proposed
ScRL, which achieves Rank-1 (mAP) accuracies of 76.1%, 71.2%, and 92.4% (72.6%,
52.9%, and 86.7%) on the SYSU-MM01, HITSZ-VCM, and RegDB datasets,
respectively, outperforming existing state-of-the-art methods …
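The two fusion strategies mentioned in this abstract, plain concatenation of shape and appearance features and shape-guided enhancement of appearance features, can be contrasted in a few lines. The gate below is a hypothetical stand-in for AFE, not its actual design: it is generic sigmoid gating in which a linear map of the shape feature produces per-channel weights in (0, 1) that scale the appearance feature. The abstract does not specify the SFP, ISR, or AFE internals, so the function names, feature shapes, and parameters here are assumptions.

```python
import math
from typing import List, Sequence


def concat_fusion(appearance: Sequence[float], shape: Sequence[float]) -> List[float]:
    """Baseline: stack appearance and shape features into one longer vector."""
    return list(appearance) + list(shape)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def shape_guided_gate(
    appearance: Sequence[float],
    shape: Sequence[float],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Scale each appearance channel by a gate in (0, 1) computed from the shape feature.

    weight has one row per appearance channel and one column per shape channel (hypothetical layout).
    """
    if len(weight) != len(appearance) or len(bias) != len(appearance):
        raise ValueError("weight/bias must have one entry per appearance channel")
    enhanced = []
    for a_i, w_row, b_i in zip(appearance, weight, bias):
        if len(w_row) != len(shape):
            raise ValueError("each weight row must match the shape feature length")
        logit = sum(w * s for w, s in zip(w_row, shape)) + b_i
        enhanced.append(a_i * sigmoid(logit))
    return enhanced


appearance = [2.0, 4.0]
shape = [1.0]
print(concat_fusion(appearance, shape))  # [2.0, 4.0, 1.0]
print(shape_guided_gate(appearance, shape, [[0.0], [0.0]], [0.0, 0.0]))  # [1.0, 2.0]
```

With zero weights and biases every gate equals 0.5, so the output is half the appearance feature; learned weights would instead push channels the shape cue marks as identity-relevant toward 1 and the rest toward 0, whereas concatenation leaves that weighting entirely to later layers.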