Learning to Divide and Conquer for Online Multi-Target Tracking
Online Multiple Target Tracking (MTT) is often addressed within the
tracking-by-detection paradigm. Detections are first extracted
independently in each frame, and object trajectories are then built by
maximizing specifically designed coherence functions. Nevertheless, ambiguities
arise in the presence of occlusions or detection errors. In this paper we claim
that the ambiguities in tracking could be solved by a selective use of the
features, by working with more reliable features if possible and exploiting a
deeper representation of the target only if necessary. To this end, we propose
an online divide and conquer tracker for static camera scenes, which partitions
the assignment problem into local subproblems and solves them by selectively
choosing and combining the best features. The complete framework is cast as a
structural learning task that unifies these phases and learns tracker
parameters from examples. Experiments on two different datasets highlight a
significant improvement in tracking performance (MOTA +10%) over the state of
the art.
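As a rough illustration of the divide-and-conquer idea only, the sketch below splits a track-to-detection cost matrix into independent local subproblems (connected components of a gating graph) and solves each one exactly by exhaustive search, which stays cheap because each piece is small. It is a sketch under stated assumptions, not the authors' tracker: the paper learns which features to use per subproblem through structural learning, whereas here the costs are fixed numbers, and the names `gated_components`, `solve_component`, `divide_and_conquer_assign` and the `gate` threshold (also used as the penalty for leaving a track unmatched) are hypothetical.

```python
def gated_components(cost, gate):
    """Split tracks and detections into independent local subproblems.

    cost[i][j] is the (assumed non-negative) cost of assigning track i to
    detection j. Track i and detection j are linked when cost[i][j] <= gate;
    each connected component of this bipartite graph is one subproblem.
    """
    n_tracks = len(cost)
    n_dets = len(cost[0]) if cost else 0
    parent = {("t", i): ("t", i) for i in range(n_tracks)}
    parent.update({("d", j): ("d", j) for j in range(n_dets)})

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n_tracks):
        for j in range(n_dets):
            if cost[i][j] <= gate:
                parent[find(("t", i))] = find(("d", j))

    groups = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)
    components = []
    for nodes in groups.values():
        tracks = sorted(k for kind, k in nodes if kind == "t")
        dets = sorted(k for kind, k in nodes if kind == "d")
        if tracks and dets:  # isolated tracks or detections need no assignment
            components.append((tracks, dets))
    return components


def solve_component(tracks, dets, cost, gate):
    """Exact search over one small subproblem.

    Each track is matched to an unused, gated detection or left unmatched
    at a penalty equal to `gate`; the cheapest total assignment wins.
    """
    best_total, best_pairs = float("inf"), []

    def search(k, used, total, pairs):
        nonlocal best_total, best_pairs
        if total >= best_total:  # costs are non-negative, so this branch cannot win
            return
        if k == len(tracks):
            best_total, best_pairs = total, list(pairs)
            return
        i = tracks[k]
        search(k + 1, used, total + gate, pairs)  # leave track i unmatched
        for j in dets:
            if j not in used and cost[i][j] <= gate:
                used.add(j)
                pairs.append((i, j))
                search(k + 1, used, total + cost[i][j], pairs)
                pairs.pop()
                used.remove(j)

    search(0, set(), 0.0, [])
    return best_pairs


def divide_and_conquer_assign(cost, gate):
    """Solve every local subproblem independently and merge the matches."""
    matches = []
    for tracks, dets in gated_components(cost, gate):
        matches.extend(solve_component(tracks, dets, cost, gate))
    return sorted(matches)


cost = [
    [0.1, 0.9, 9.0, 9.0],
    [0.2, 0.3, 9.0, 9.0],
    [9.0, 9.0, 9.0, 0.4],
]
print(len(gated_components(cost, gate=1.0)))      # 2
print(divide_and_conquer_assign(cost, gate=1.0))  # [(0, 0), (1, 1), (2, 3)]
```

In this toy matrix, tracks 0 and 1 compete for detections 0 and 1, track 2 only fits detection 3, and detection 2 fits no track, so the global problem falls apart into two independent pieces. Each piece could, in principle, use its own features or cost function, which is where the paper's selective feature use would come in.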
Siamese Instance Search for Tracking
In this paper we present a tracker that is radically different from
state-of-the-art trackers: we apply no model updating, no occlusion detection,
no combination of trackers, no geometric matching, and still deliver
state-of-the-art tracking performance, as demonstrated on the popular online
tracking benchmark (OTB) and six very challenging YouTube videos. The presented
tracker simply matches the initial patch of the target in the first frame with
candidates in a new frame and returns the most similar patch by a learned
matching function. The strength of the matching function comes from being
extensively trained generically, i.e., without any data of the target, using a
Siamese deep neural network, which we design for tracking. Once learned, the
matching function is used as is, without any adaptation, to track previously
unseen targets. It turns out that the learned matching function is so powerful
that a simple tracker built upon it, coined Siamese INstance search Tracker,
SINT, which only uses the original observation of the target from the first
frame, suffices to reach state-of-the-art performance. Further, we show the
proposed tracker even allows target re-identification after the target has been
absent for a complete video shot.
Comment: This paper is accepted to the IEEE Conference on Computer Vision and
Pattern Recognition, 201
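The control flow described above can be sketched as follows: embed the first-frame target patch once, never update it, and in every new frame return the candidate with the highest matching score. This is a minimal sketch under loud assumptions: the learned Siamese network is replaced by a hypothetical `embed` function that only flattens and L2-normalizes the patch, the score is a plain cosine similarity, and candidate patches are assumed to be given and of equal size. `embed`, `similarity` and `track` are illustrative names, not the SINT code.

```python
import math


def embed(patch):
    """Hypothetical stand-in for the learned Siamese branch.

    SINT uses a trained deep network here; this sketch just flattens the
    patch and L2-normalizes it so that a dot product equals cosine similarity.
    """
    flat = [float(x) for row in patch for x in row]
    norm = math.sqrt(sum(x * x for x in flat))
    return [x / norm for x in flat] if norm > 0 else flat


def similarity(a, b):
    # Dot product of two normalized embeddings, i.e. cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def track(first_frame_patch, candidates_per_frame):
    """Return the index of the best-matching candidate in each new frame."""
    query = embed(first_frame_patch)  # computed once from frame 1, never updated
    picks = []
    for candidates in candidates_per_frame:
        scores = [similarity(query, embed(c)) for c in candidates]
        picks.append(max(range(len(scores)), key=scores.__getitem__))
    return picks


target = [[1, 0], [0, 1]]
frames = [
    [[[0, 1], [1, 0]], [[2, 0], [0, 2]]],  # frame 2: candidate 1 is a brighter copy
    [[[1, 1], [1, 1]], [[0, 0], [1, 0]]],  # frame 3: candidate 0 overlaps the target
]
print(track(target, frames))  # [1, 0]
```

Because the query embedding is fixed, nothing in this loop adapts to the target after the first frame; all of the tracker's quality would have to come from how good the matching function is, which is the abstract's central claim.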