Learning a Robust Society of Tracking Parts using Co-occurrence Constraints
Object tracking is an essential problem in computer vision that has been
researched for several decades. One of the main challenges in tracking is to
adapt to object appearance changes over time while avoiding drift toward
background clutter. We address this challenge by proposing a deep neural
network composed of different parts, which functions as a society of tracking
parts. They work in conjunction according to a certain policy and learn from
each other in a robust manner, using co-occurrence constraints that ensure
robust inference and learning. From a structural point of view, our network is
composed of two main pathways. One pathway is more conservative. It carefully
monitors a large set of simple tracker parts learned as linear filters over
deep feature activation maps. It assigns the parts different roles. It promotes
the reliable ones and removes the inconsistent ones. We learn these filters
simultaneously in an efficient way, with a single closed-form formulation, for
which we propose novel theoretical properties. The second pathway is more
progressive. It is learned completely online and thus it is able to better
model object appearance changes. In order to adapt in a robust manner, it is
learned only on highly confident frames, which are decided using co-occurrences
with the first pathway. Thus, our system has the full benefit of two main
approaches in tracking. The larger set of simpler filter parts offers
robustness, while the full deep network learned online provides adaptability to
change. As shown in the experimental section, our approach achieves
state-of-the-art performance on the challenging VOT17 benchmark, outperforming the
published methods both on the general EAO metric and in the number of fails, by
a significant margin.
Comment: 17+3 pages, 5 figures, European Conference on Computer Vision (ECCV),
Visual Object Tracking workshop
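
The abstract above says the adaptive pathway is updated only on highly confident frames, chosen through co-occurrences with the conservative pathway, and that reliable parts are promoted while inconsistent ones are removed. The following is a minimal sketch of that idea under simplifying assumptions, not the authors' formulation: each part is assumed to cast a single (x, y) vote for the object center, and the function names, the pixel radius, the agreement threshold, and the update step are all hypothetical.

```python
import math


def agreement_fraction(part_votes, center, radius):
    """Fraction of part votes that fall within `radius` pixels of `center`."""
    if not part_votes:
        return 0.0
    hits = sum(
        1 for (x, y) in part_votes
        if math.hypot(x - center[0], y - center[1]) <= radius
    )
    return hits / len(part_votes)


def is_confident_frame(part_votes, net_prediction, radius=8.0, min_fraction=0.7):
    """Treat a frame as confident when enough parts co-occur with the
    prediction of the adaptive network (assumed threshold rule)."""
    return agreement_fraction(part_votes, net_prediction, radius) >= min_fraction


def update_reliability(reliability, part_votes, center, radius=8.0, step=0.1):
    """Move each part's score toward 1 if it agrees with `center`,
    and toward 0 otherwise (assumed exponential-average update)."""
    updated = []
    for score, (x, y) in zip(reliability, part_votes):
        agrees = math.hypot(x - center[0], y - center[1]) <= radius
        target = 1.0 if agrees else 0.0
        updated.append(score + step * (target - score))
    return updated
```

In this sketch a part whose score stays low over many frames would be a candidate for removal, while the adaptive network would be trained only on frames for which `is_confident_frame` returns `True`.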
SFTrack++: A Fast Learnable Spectral Segmentation Approach for Space-Time Consistent Tracking
We propose an object tracking method, SFTrack++, that smoothly learns to
preserve the tracked object consistency over space and time dimensions by
taking a spectral clustering approach over the graph of pixels from the video,
using a fast 3D filtering formulation for finding the principal eigenvector of
this graph's adjacency matrix. To better capture complex aspects of the tracked
object, we enrich our formulation to multi-channel inputs, which permit
different points of view for the same input. In our experiments, the channel
inputs are the outputs of multiple tracking methods. After combining them,
instead of relying only on hidden-layer representations to predict a good
tracking bounding box, we explicitly learn an intermediate, more refined one,
namely the segmentation map of the tracked object. This prevents the rough,
common bounding-box approach from introducing noise and distractors into the learning
process. We test our method, SFTrack++, on five tracking benchmarks: OTB, UAV,
NFS, GOT-10k, and TrackingNet, using five top trackers as input. Our
experimental results validate the pre-registered hypothesis. We obtain
consistent and robust results, competitive on the three traditional benchmarks
(OTB, UAV, NFS) and significantly ahead of the others (by over on
accuracy) on GOT-10k and TrackingNet, which are newer, larger, and more varied
datasets.
Comment: Accepted at Neural Information Processing Systems (NeurIPS) 2020 -
Pre-registration Workshop and at The International Conference on Computer
Vision (ICCV) 2021 - Structured Representations for Video Understanding
Workshop
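
The abstract above describes finding the principal eigenvector of a pixel-graph adjacency matrix through filtering rather than explicit matrix operations. The sketch below illustrates that general idea in one dimension only, as power iteration in which each matrix-vector product is computed as a local filtering pass. It is a simplified assumption and not the SFTrack++ formulation, which operates on 3D space-time volumes with multi-channel inputs: the adjacency is assumed to factor as `M[i][j] = unary[i] * kernel[|i - j|] * unary[j]`, and the names `filter_step`, `principal_eigenvector`, `unary`, and `kernel` are hypothetical.

```python
def filter_step(x, unary, kernel):
    """One product x <- M x, computed as a local filter instead of
    building M, where M[i][j] = unary[i] * kernel[|i - j|] * unary[j]."""
    n = len(x)
    reach = len(kernel) - 1  # neighbors farther than this have zero weight
    weighted = [u * v for u, v in zip(unary, x)]
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(max(0, i - reach), min(n, i + reach + 1)):
            acc += kernel[abs(i - j)] * weighted[j]
        out.append(unary[i] * acc)
    return out


def principal_eigenvector(unary, kernel, iterations=50):
    """Power iteration: repeatedly filter and renormalize to unit length."""
    x = [1.0] * len(unary)
    for _ in range(iterations):
        x = filter_step(x, unary, kernel)
        norm = sum(v * v for v in x) ** 0.5
        if norm == 0.0:
            break
        x = [v / norm for v in x]
    return x


# Usage: per-pixel object scores (for example, combined tracker outputs)
# act as unary terms; the resulting vector is a soft segmentation.
soft_mask = principal_eigenvector(
    unary=[0.1, 0.1, 0.9, 1.0, 0.9, 0.1, 0.1],
    kernel=[1.0, 0.5, 0.25],
)
```

Because every entry of the assumed adjacency is non-negative, the iterate stays non-negative and can be read as a per-pixel membership score for the main cluster.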