Long-term Tracking in the Wild: A Benchmark
We introduce the OxUvA dataset and benchmark for evaluating single-object
tracking algorithms. Benchmarks have enabled great strides in the field of
object tracking by defining standardized evaluations on large sets of diverse
videos. However, these works have focused exclusively on sequences that are
just tens of seconds in length and in which the target is always visible.
Consequently, most researchers have designed methods tailored to this
"short-term" scenario, which is poorly representative of practitioners' needs.
Aiming to address this disparity, we compile a long-term, large-scale tracking
dataset of sequences with average length greater than two minutes and with
frequent target object disappearance. The OxUvA dataset is much larger than the
object tracking datasets of recent years: it comprises 366 sequences spanning
14 hours of video. We assess the performance of several algorithms, considering
both the ability to locate the target and to determine whether it is present or
absent. Our goal is to offer the community a large and diverse benchmark to
enable the design and evaluation of tracking methods ready to be used "in the
wild". The project website is http://oxuva.netComment: To appear at ECCV 201
Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers
This paper improves state-of-the-art visual object trackers that use online
adaptation. Our core contribution is an offline meta-learning-based method to
adjust the initial deep networks used in online adaptation-based tracking. The
meta-learning is driven by the goal of obtaining deep networks that can quickly be
adapted to robustly model a particular target in future frames. Ideally, the
resulting models focus on features that are useful for future frames, and avoid
overfitting to background clutter, small parts of the target, or noise. By
enforcing a small number of update iterations during meta-learning, the
resulting networks train significantly faster. We demonstrate this approach on
top of two high-performance tracking approaches: the tracking-by-detection-based
MDNet and the correlation-based CREST. Experimental results on standard
benchmarks, OTB2015 and VOT2016, show that our meta-learned versions of both
trackers improve speed, accuracy, and robustness.
Comment: Code: https://github.com/silverbottlep/meta_tracker
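The core idea, learning an initialization that adapts in only a few update iterations, can be illustrated with a toy first-order meta-learning loop. This is a hedged sketch on a synthetic regression "task", not the paper's actual optimizer, which meta-learns the initial weights (and per-parameter learning rates) of real tracking networks such as MDNet and CREST.

```python
import numpy as np

# Toy first-order meta-learning loop (Reptile-style); a sketch of the idea
# only: learn an initialization `theta` that adapts to a new target in a
# small, fixed number of gradient steps.

rng = np.random.default_rng(0)

def sample_task(dim=8, n=32):
    """Each 'task' stands in for one video: a target-specific regression."""
    w_true = rng.normal(size=dim)
    X = rng.normal(size=(n, dim))
    y = X @ w_true + 0.01 * rng.normal(size=n)
    return X, y

def adapt(theta, X, y, lr=0.1, steps=3):
    """Inner loop: only a few SGD steps on a least-squares 'tracking' loss."""
    w = theta.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

theta = np.zeros(8)  # the meta-learned initialization
for _ in range(2000):
    X, y = sample_task()
    w_adapted = adapt(theta, X, y)
    # Outer loop: nudge the initialization toward the adapted weights.
    theta += 0.05 * (w_adapted - theta)
```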
Non-Causal Tracking by Deblatting
Tracking by Deblatting stands for solving an inverse problem of deblurring
and image matting for tracking motion-blurred objects. We propose non-causal
Tracking by Deblatting which estimates continuous, complete and accurate object
trajectories. Energy minimization by dynamic programming is used to detect
abrupt changes of motion, called bounces. High-order polynomials are fitted to
segments, which are parts of the trajectory separated by bounces. The output is
a continuous trajectory function which assigns a location to every real-valued
time stamp from zero to the number of frames. Additionally, we show that
precise physical quantities can be computed from the trajectory function, such
as object radius, gravity, or sub-frame velocity. Velocity estimates are
compared against high-speed camera and radar measurements. Results show the
high performance of the proposed method in terms of Trajectory-IoU, recall, and
velocity estimation.
Comment: Published at GCPR 2019, oral presentation, Best Paper Honorable Mention Award
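The trajectory representation described above, piecewise high-order polynomials between bounces, is straightforward to sketch. Bounce detection (the dynamic-programming energy minimization) is assumed to have already produced the segment boundaries, and the function names here are illustrative.

```python
import numpy as np

# Sketch of the trajectory representation: piecewise polynomials fitted
# between bounces, giving a continuous function of real-valued time whose
# analytic derivative yields sub-frame velocity. Segment boundaries
# (`bounces`) are assumed to come from the dynamic-programming step.

def fit_trajectory(times, positions, bounces, degree=3):
    """times: (N,) frame indices; positions: (N, 2) object centres;
    bounces: sorted boundary times. Returns (callable t -> (x, y), pieces)."""
    edges = [times[0], *bounces, times[-1]]
    pieces = []
    for t0, t1 in zip(edges[:-1], edges[1:]):
        mask = (times >= t0) & (times <= t1)
        px = np.polynomial.Polynomial.fit(times[mask], positions[mask, 0], degree)
        py = np.polynomial.Polynomial.fit(times[mask], positions[mask, 1], degree)
        pieces.append((t0, t1, px, py))

    def trajectory(t):
        for t0, t1, px, py in pieces:
            if t0 <= t <= t1:
                return np.array([px(t), py(t)])
        raise ValueError("t outside the tracked interval")

    return trajectory, pieces

def velocity(pieces, t):
    """Sub-frame velocity from the derivative of the fitted piece at time t."""
    for t0, t1, px, py in pieces:
        if t0 <= t <= t1:
            return np.array([px.deriv()(t), py.deriv()(t)])
    raise ValueError("t outside the tracked interval")
```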
Learning Rotation Adaptive Correlation Filters in Robust Visual Object Tracking
Visual object tracking is one of the major challenges in the field of
computer vision. Correlation Filter (CF) trackers are one of the most widely
used categories in tracking. Though numerous tracking algorithms based on CFs
are available today, most of them fail to efficiently detect the object in an
unconstrained environment with dynamically changing object appearance. In order
to tackle such challenges, the existing strategies often rely on a particular
set of algorithms. Here, we propose a robust framework that provides a way
to incorporate illumination and rotation invariance into the standard
Discriminative Correlation Filter (DCF) formulation. We also supervise the
detection stage of DCF trackers by eliminating false positives in the
convolution response map. Further, we demonstrate the impact of displacement
consistency on CF trackers. The generality and efficiency of the proposed
framework are illustrated by integrating our contributions into two
state-of-the-art CF trackers: SRDCF and ECO. In comprehensive experiments on
the VOT2016 dataset, our top trackers show substantial improvements of 14.7%
and 6.41% in robustness, and of 11.4% and 1.71% in Average Expected Overlap
(AEO), over the baseline SRDCF and ECO, respectively.
Comment: Published in ACCV 2018
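For readers unfamiliar with the DCF formulation the framework builds on, the following is a minimal single-channel correlation-filter sketch in the MOSSE style: a closed-form ridge-regression filter learned in the Fourier domain. The paper's contributions sit on top of trackers built from this kind of solution; nothing below is specific to SRDCF or ECO.

```python
import numpy as np

# Minimal single-channel correlation filter in the MOSSE style, shown only to
# make the DCF formulation concrete; this is a generic sketch, not SRDCF/ECO.

def gaussian_label(shape, sigma=2.0):
    """Desired response: a Gaussian peak centred on the target."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))

def train_filter(patch, labels, lam=1e-2):
    """Closed-form ridge regression in the Fourier domain."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(labels)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(H, patch):
    """Correlate a new patch with the filter; the argmax gives the target shift."""
    response = np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return response, (dy, dx)
```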
Long-Term Visual Object Tracking Benchmark
We propose a new long video dataset (called Track Long and Prosper - TLP) and
benchmark for single object tracking. The dataset consists of 50 HD videos from
real-world scenarios, encompassing a duration of over 400 minutes (676K
frames), making it more than 20-fold larger in average duration per sequence
and more than 8-fold larger in total covered duration than existing generic
datasets for visual tracking. The proposed dataset paves the way to suitably
assess long-term tracking performance and to train better deep learning
architectures (avoiding/reducing augmentation, which may not reflect real-world
behaviour). We benchmark 17 state-of-the-art trackers on the dataset and rank
them according to tracking accuracy and run-time speed. We further
present thorough qualitative and quantitative evaluation highlighting the
importance of the long-term aspect of tracking. Our most interesting
observations are (a) existing short-sequence benchmarks fail to bring out the
inherent differences between tracking algorithms, which widen when tracking on
long sequences, and (b) the accuracy of trackers drops abruptly on challenging
long sequences, suggesting the need for research efforts in the direction of
long-term tracking.
Comment: ACCV 2018 (Oral)
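One common way to produce such an accuracy ranking is the area under the success curve: the fraction of frames whose overlap with ground truth exceeds a threshold, swept over thresholds. The sketch below is illustrative and assumes per-frame IoU values are already available; TLP's exact protocol may differ in detail.

```python
import numpy as np

# Illustrative accuracy ranking via the area under the success curve.
# Names and the toy overlap values are placeholders.

def success_auc(overlaps, thresholds=np.linspace(0.0, 1.0, 21)):
    """overlaps: per-frame IoU values for one tracker over all sequences."""
    overlaps = np.asarray(overlaps)
    success = [(overlaps > t).mean() for t in thresholds]
    return float(np.trapz(success, thresholds))

# Rank trackers, highest AUC first (toy overlap values for illustration):
results = {"tracker_a": [0.7, 0.6, 0.0, 0.5], "tracker_b": [0.4, 0.8, 0.3, 0.2]}
ranking = sorted(results, key=lambda k: success_auc(results[k]), reverse=True)
```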
Determining Interacting Objects in Human-Centric Activities via Qualitative Spatio-Temporal Reasoning
Understanding the activities taking place in a video is a challenging problem in Artificial Intelligence. Complex video sequences contain many activities and involve a multitude of interacting objects. Determining which objects are relevant to a particular activity is the first step in understanding the activity; indeed, many objects in the scene are irrelevant to the main activity taking place. In this work, we consider human-centric activities and look to identify which objects in the scene are involved in the activity. We take an activity-agnostic approach and rank every moving object in the scene by how likely it is to be involved in the activity. We use a comprehensive spatio-temporal representation that captures the joint movement between humans and each object. We then use supervised machine learning techniques to recognize relevant objects based on these features. Our approach is tested on the challenging Mind's Eye dataset.
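A minimal version of this pipeline can be sketched as follows: summarize the joint human-object movement as a fixed-length feature vector, then score each moving object with a supervised classifier. The features and the logistic-regression classifier below are placeholder assumptions; the paper uses a richer qualitative spatio-temporal representation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder sketch: summarize joint human-object movement as a small feature
# vector, then score relevance with a supervised classifier.

def joint_movement_features(human_track, object_track):
    """human_track, object_track: (T, 2) arrays of centre positions."""
    rel = object_track - human_track          # relative position per frame
    dist = np.linalg.norm(rel, axis=1)
    approach = np.diff(dist)                  # negative while they converge
    return np.array([dist.mean(), dist.min(),
                     approach.mean(), np.abs(approach).max()])

# Given labelled (human, object) track pairs and involvement labels y:
#   X = np.stack([joint_movement_features(h, o) for h, o in pairs])
#   clf = LogisticRegression().fit(X, y)
#   relevance = clf.predict_proba(X_new)[:, 1]   # rank objects by this score
```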
Online, Real-Time Tracking Using a Category-to-Individual Detector
A method for online, real-time tracking of objects is presented. Tracking is treated as a repeated detection problem where potential target objects are identified with a pre-trained category detector and object identity across frames is established by individual-specific detectors. The individual detectors are (re-)trained online from a single
positive example whenever there is a coincident category detection. This ensures that the tracker is robust to drift. Real-time operation is possible since an individual-object detector is obtained through elementary manipulations of the thresholds of the category detector, and therefore only minimal additional computation is required. Our tracking algorithm is benchmarked against nine state-of-the-art trackers on two large, publicly available and challenging video datasets. We find that our algorithm is 10% more accurate and nearly as fast as the fastest of the competing algorithms, and as accurate as but 20 times faster than the most accurate of the competing algorithms.
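The threshold idea can be sketched schematically: given a cascade-style category detector with per-stage score thresholds, an individual-specific detector is obtained by tightening those thresholds around the stage scores of the single positive example. Everything below (function names, the slack parameter, the cascade abstraction) is an assumption for illustration, not the paper's actual construction.

```python
import numpy as np

# Schematic only: derive an individual-specific detector from a cascade-style
# category detector by tightening per-stage thresholds around the stage
# scores of a single positive example.

def individualize(category_thresholds, exemplar_scores, slack=0.2):
    """Raise each stage threshold toward the exemplar's score, keeping slack."""
    t = np.asarray(category_thresholds, dtype=float)
    s = np.asarray(exemplar_scores, dtype=float)
    return np.maximum(t, s - slack)

def passes(stage_scores, thresholds):
    """A window is a detection only if every cascade stage clears its threshold."""
    return bool(np.all(np.asarray(stage_scores) >= thresholds))
```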
Occlusion and Motion Reasoning for Long-Term Tracking
Object tracking is a recurring problem in computer vision. Tracking-by-detection approaches, in particular Struck (Hare et al., 2011), have been shown to be competitive in recent evaluations. However, such approaches fail in the presence of long-term occlusions as well as severe viewpoint changes of the object. In this paper we propose a principled way to combine occlusion and motion reasoning with a tracking-by-detection approach. Occlusion and motion reasoning is based on state-of-the-art long-term trajectories, which are labeled as object or background tracks with an energy-based formulation. The overlap between labeled tracks and detected regions makes it possible to identify occlusions. The motion changes of the object between consecutive frames can be estimated robustly from the geometric relation between object trajectories. If this geometric change is significant, an additional detector is trained. Experimental results show that our tracker obtains state-of-the-art results and handles occlusion and viewpoint changes better than competing tracking methods.
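The occlusion test implied above can be sketched simply: if the point trajectories labeled as "object" mostly fall outside the detection box in the current frame, the target is likely occluded. The trajectory labeling itself (the energy-based formulation) is assumed given, and the support threshold below is an illustrative assumption.

```python
import numpy as np

# Simple occlusion test in the spirit of the approach: if few of the
# trajectories labelled "object" fall inside the current detection box,
# flag an occlusion.

def inside(points, box):
    """points: (N, 2) trajectory positions in this frame; box: (x0, y0, x1, y1)."""
    x, y = points[:, 0], points[:, 1]
    return (x >= box[0]) & (x <= box[2]) & (y >= box[1]) & (y <= box[3])

def occluded(object_points, detection_box, min_support=0.3):
    """Occluded when the fraction of object tracks supporting the box is low."""
    return inside(object_points, detection_box).mean() < min_support
```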