Video data and algorithms have been driving advances in multi-object tracking
(MOT). While existing MOT datasets focus on occlusion and appearance
similarity, complex motion patterns are widespread yet overlooked. To address
this issue, we introduce a new dataset called BEE23 to highlight complex
motions. Identity association algorithms have long been the focus of MOT
research. Existing trackers can be categorized into two association paradigms:
the single-feature paradigm (based on either motion or appearance features) and
the serial paradigm (one feature is primary while the other is secondary).
However, these paradigms are incapable of fully utilizing different features.
In this paper, we propose a parallel paradigm and present the Two rOund
Parallel matchIng meChanism (TOPIC) to implement it. TOPIC leverages both
motion and appearance features and can adaptively select the preferable one as
the assignment metric based on motion level. Moreover, we provide an
Attention-based Appearance Reconstruct Module (AARM) to reconstruct appearance
feature embeddings, thus enhancing the representation of appearance features.
Comprehensive experiments show that our approach achieves state-of-the-art
performance on four public datasets and BEE23. Notably, our proposed parallel
paradigm surpasses the performance of existing association paradigms by a large
margin, e.g., reducing false negatives by 12% to 51% compared to the
single-feature association paradigm. The dataset and association paradigm
introduced in this work offer a fresh perspective for advancing the MOT field.
The source code and dataset are available at
https://github.com/holmescao/TOPICTrack
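
As a rough illustration of the parallel paradigm described above, the
following Python sketch matches tracks to detections under a motion cost and
an appearance cost independently, keeps the pairs on which both metrics agree,
and resolves the remaining tracks by their motion level. It is not the TOPIC
implementation: the function names, the greedy matcher (a simple stand-in for
an optimal assignment solver), the thresholds, and the rule that a high motion
level favors the motion metric are all assumptions made for illustration. The
repository above contains the actual method.

```python
def greedy_match(cost, max_cost):
    """Greedy one-to-one matching: repeatedly take the cheapest unused pair.

    Hypothetical helper; a simple stand-in for an optimal assignment solver.
    cost[t][d] is the cost of assigning detection d to track t.
    """
    pairs = sorted(
        (c, t, d)
        for t, row in enumerate(cost)
        for d, c in enumerate(row)
        if c <= max_cost
    )
    used_tracks, used_dets, match = set(), set(), {}
    for _, t, d in pairs:
        if t not in used_tracks and d not in used_dets:
            match[t] = d
            used_tracks.add(t)
            used_dets.add(d)
    return match


def parallel_match(motion_cost, appearance_cost, motion_level,
                   level_threshold=0.5, max_cost=0.7):
    """Match under both metrics in parallel, then pick one metric per track.

    ASSUMPTION: tracks whose motion level is at or above level_threshold
    trust the motion-based match and the others trust the appearance-based
    match. The direction of this rule and both thresholds are placeholders.
    """
    by_motion = greedy_match(motion_cost, max_cost)
    by_appearance = greedy_match(appearance_cost, max_cost)

    final, used_dets = {}, set()
    # Step 1: accept every pair on which both metrics agree.
    for t, d in by_motion.items():
        if by_appearance.get(t) == d:
            final[t] = d
            used_dets.add(d)
    # Step 2: for the remaining tracks, use the metric chosen by motion level.
    for t in range(len(motion_cost)):
        if t in final:
            continue
        chosen = by_motion if motion_level[t] >= level_threshold else by_appearance
        d = chosen.get(t)
        if d is not None and d not in used_dets:
            final[t] = d
            used_dets.add(d)
    return final


# Toy example: 3 tracks x 3 detections, made-up costs in [0, 1].
motion_cost = [[0.1, 0.9, 0.9],
               [0.9, 0.3, 0.2],
               [0.9, 0.25, 0.4]]
appearance_cost = [[0.2, 0.9, 0.9],
                   [0.9, 0.1, 0.5],
                   [0.9, 0.6, 0.15]]

# Tracks 1 and 2 have high motion level, so the motion-based match wins.
print(parallel_match(motion_cost, appearance_cost, [0.1, 0.8, 0.9]))
# -> {0: 0, 1: 2, 2: 1}

# All tracks have low motion level, so the appearance-based match wins.
print(parallel_match(motion_cost, appearance_cost, [0.1, 0.1, 0.1]))
# -> {0: 0, 1: 1, 2: 2}
```

In this toy setup the two metrics agree only on track 0, so the outcome for
tracks 1 and 2 depends entirely on which metric their motion level selects. A
track whose chosen detection is already taken is left unmatched here; a real
tracker would handle such cases with further matching or track management.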