Learnable PINs: Cross-Modal Embeddings for Person Identity
We propose and investigate an identity sensitive joint embedding of face and
voice. Such an embedding enables cross-modal retrieval from voice to face and
from face to voice. We make the following four contributions: first, we show
that the embedding can be learnt from videos of talking faces, without
requiring any identity labels, using a form of cross-modal self-supervision;
second, we develop a curriculum learning schedule for hard negative mining
targeted to this task, that is essential for learning to proceed successfully;
third, we demonstrate and evaluate cross-modal retrieval for identities unseen
and unheard during training over a number of scenarios and establish a
benchmark for this novel task; finally, we show an application of using the
joint embedding for automatically retrieving and labelling characters in TV
dramas.

Comment: To appear in ECCV 2018
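The curriculum of progressively harder negative mining described in the abstract can be sketched with a toy contrastive objective. This is an illustrative NumPy sketch, not the paper's actual formulation: the threshold `tau`, the hinge `margin`, and the max-over-negatives choice are all assumptions. Matching rows of the two embedding matrices are treated as positive face-voice pairs, and only negatives whose similarity exceeds `tau` count as "hard", so lowering `tau` over training admits harder negatives.

```python
import numpy as np

def curriculum_contrastive_loss(face, voice, tau, margin=0.5):
    """Toy cross-modal hinge loss with threshold-based hard negative mining.

    face, voice : (n, d) arrays; row i of each is the same identity.
    tau         : curriculum threshold; only negatives with cosine
                  similarity > tau are mined.
    """
    # L2-normalise both modalities so dot products are cosine similarities.
    f = face / np.linalg.norm(face, axis=1, keepdims=True)
    v = voice / np.linalg.norm(voice, axis=1, keepdims=True)
    sim = f @ v.T                       # (n, n) face-to-voice similarities
    pos = np.diag(sim).copy()           # matching identities on the diagonal
    np.fill_diagonal(sim, -np.inf)      # exclude positives from the negatives
    sim[sim <= tau] = -np.inf           # curriculum: keep only hard negatives
    hardest = sim.max(axis=1)           # hardest surviving negative per anchor
    has_neg = np.isfinite(hardest)
    # Hinge loss per anchor; anchors with no hard negative contribute zero.
    loss = np.where(has_neg, np.maximum(0.0, margin + hardest - pos), 0.0)
    return loss.mean()
```

Starting training with a high `tau` (few, easy-to-separate negatives) and annealing it downward mimics the easy-to-hard schedule the abstract argues is essential for learning to proceed.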
AttMOT: Improving Multiple-Object Tracking by Introducing Auxiliary Pedestrian Attributes
Multi-object tracking (MOT) is a fundamental problem in computer vision with
numerous applications, such as intelligent surveillance and automated driving.
Despite the significant progress made in MOT, pedestrian attributes, such as
gender, hairstyle, body shape, and clothing features, which contain rich and
high-level information, have been less explored. To address this gap, we
propose a simple, effective, and generic method that predicts pedestrian
attributes and uses them to support the general Re-ID embedding. We first
introduce AttMOT, a large, richly annotated synthetic dataset for pedestrian
tracking, containing over 80k frames and 6 million pedestrian IDs spanning
different times of day, weather conditions, and scenarios. To the best of our
knowledge, AttMOT is the first
MOT dataset with semantic attributes. Subsequently, we explore different
approaches to fuse Re-ID embedding and pedestrian attributes, including
attention mechanisms, which we hope will stimulate the development of
attribute-assisted MOT. The proposed attribute-assisted method, AAM, is trained
with the AttMOT dataset and demonstrates its effectiveness and generality on
several representative pedestrian multi-object tracking benchmarks, including
MOT17 and MOT20. When applied to state-of-the-art trackers, AAM achieves consistent
improvements in MOTA, HOTA, AssA, IDs, and IDF1 scores. For instance, on MOT17,
the proposed method yields a +1.1 MOTA, +1.7 HOTA, and +1.8 IDF1 improvement
when used with FairMOT. To encourage further research on attribute-assisted
MOT, we will release the AttMOT dataset.
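One of the fusion approaches the abstract mentions, attention between the Re-ID embedding and the attribute features, can be sketched as a single attention head. This is a minimal illustrative sketch, not AAM's actual architecture: the projection matrices `W_q`, `W_k`, `W_v`, the residual fusion, and all shapes are assumptions. The Re-ID vector acts as the query over the set of per-attribute features, and the attended attribute summary is added back to the embedding.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attribute_attention_fusion(reid, attrs, W_q, W_k, W_v):
    """Fuse a Re-ID embedding with attribute features via one attention head.

    reid  : (d,) appearance embedding from the tracker's Re-ID branch.
    attrs : (n_attr, a) one feature vector per predicted attribute
            (e.g. gender, hairstyle, clothing).
    W_q   : (d, d); W_k, W_v : (a, d) learned projections.
    """
    q = reid @ W_q                          # query from the Re-ID embedding
    k = attrs @ W_k                         # attribute keys
    v = attrs @ W_v                         # attribute values
    scores = k @ q / np.sqrt(q.shape[0])    # scaled dot-product attention
    weights = softmax(scores)               # how much each attribute matters
    fused = reid + weights @ v              # residual attribute-aware fusion
    return fused, weights
```

The attention weights give a per-detection view of which attributes drove the association, which is one way such a module could make attribute-assisted matching interpretable.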