26,039 research outputs found
Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers
Online Multi-Object Tracking (MOT) from videos is a challenging computer
vision task which has been extensively studied for decades. Most of the
existing MOT algorithms are based on the Tracking-by-Detection (TBD) paradigm
combined with popular machine learning approaches which largely reduce the
human effort to tune algorithm parameters. However, the commonly used
supervised learning approaches require the labeled data (e.g., bounding boxes),
which is expensive for videos. Also, the TBD framework is usually suboptimal
since it is not end-to-end, i.e., it considers the task as detection and
tracking, but not jointly. To achieve both label-free and end-to-end learning
of MOT, we propose a Tracking-by-Animation framework, where a differentiable
neural model first tracks objects from input frames and then animates these
objects into reconstructed frames. Learning is then driven by the
reconstruction error through backpropagation. We further propose a
Reprioritized Attentive Tracking to improve the robustness of data association.
Experiments conducted on both synthetic and real video datasets show the
potential of the proposed model. Our project page is publicly available at:
https://github.com/zhen-he/tracking-by-animationComment: CVPR 201
Search Tracker: Human-derived object tracking in-the-wild through large-scale search and retrieval
Humans use context and scene knowledge to easily localize moving objects in
conditions of complex illumination changes, scene clutter and occlusions. In
this paper, we present a method to leverage human knowledge in the form of
annotated video libraries in a novel search and retrieval based setting to
track objects in unseen video sequences. For every video sequence, a document
that represents motion information is generated. Documents of the unseen video
are queried against the library at multiple scales to find videos with similar
motion characteristics. This provides us with coarse localization of objects in
the unseen video. We further adapt these retrieved object locations to the new
video using an efficient warping scheme. The proposed method is validated on
in-the-wild video surveillance datasets where we outperform state-of-the-art
appearance-based trackers. We also introduce a new challenging dataset with
complex object appearance changes.Comment: Under review with the IEEE Transactions on Circuits and Systems for
Video Technolog
- …