7,589 research outputs found
Deformable Object Tracking with Gated Fusion
The tracking-by-detection framework receives growing attentions through the
integration with the Convolutional Neural Networks (CNNs). Existing
tracking-by-detection based methods, however, fail to track objects with severe
appearance variations. This is because the traditional convolutional operation
is performed on fixed grids, and thus may not be able to find the correct
response while the object is changing pose or under varying environmental
conditions. In this paper, we propose a deformable convolution layer to enrich
the target appearance representations in the tracking-by-detection framework.
We aim to capture the target appearance variations via deformable convolution,
which adaptively enhances its original features. In addition, we also propose a
gated fusion scheme to control how the variations captured by the deformable
convolution affect the original appearance. The enriched feature representation
through deformable convolution facilitates the discrimination of the CNN
classifier on the target object and background. Extensive experiments on the
standard benchmarks show that the proposed tracker performs favorably against
state-of-the-art methods
Evaluation of trackers for Pan-Tilt-Zoom Scenarios
Tracking with a Pan-Tilt-Zoom (PTZ) camera has been a research topic in
computer vision for many years. Compared to tracking with a still camera, the
images captured with a PTZ camera are highly dynamic in nature because the
camera can perform large motion resulting in quickly changing capture
conditions. Furthermore, tracking with a PTZ camera involves camera control to
position the camera on the target. For successful tracking and camera control,
the tracker must be fast enough, or has to be able to predict accurately the
next position of the target. Therefore, standard benchmarks do not allow to
assess properly the quality of a tracker for the PTZ scenario. In this work, we
use a virtual PTZ framework to evaluate different tracking algorithms and
compare their performances. We also extend the framework to add target position
prediction for the next frame, accounting for camera motion and processing
delays. By doing this, we can assess if predicting can make long-term tracking
more robust as it may help slower algorithms for keeping the target in the
field of view of the camera. Results confirm that both speed and robustness are
required for tracking under the PTZ scenario.Comment: 6 pages, 2 figures, International Conference on Pattern Recognition
and Artificial Intelligence 201
Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers
Online Multi-Object Tracking (MOT) from videos is a challenging computer
vision task which has been extensively studied for decades. Most of the
existing MOT algorithms are based on the Tracking-by-Detection (TBD) paradigm
combined with popular machine learning approaches which largely reduce the
human effort to tune algorithm parameters. However, the commonly used
supervised learning approaches require the labeled data (e.g., bounding boxes),
which is expensive for videos. Also, the TBD framework is usually suboptimal
since it is not end-to-end, i.e., it considers the task as detection and
tracking, but not jointly. To achieve both label-free and end-to-end learning
of MOT, we propose a Tracking-by-Animation framework, where a differentiable
neural model first tracks objects from input frames and then animates these
objects into reconstructed frames. Learning is then driven by the
reconstruction error through backpropagation. We further propose a
Reprioritized Attentive Tracking to improve the robustness of data association.
Experiments conducted on both synthetic and real video datasets show the
potential of the proposed model. Our project page is publicly available at:
https://github.com/zhen-he/tracking-by-animationComment: CVPR 201
- …