Visual object tracking is an essential capability of intelligent robots. Most
existing approaches have ignored the online latency that can cause severe
performance degradation during real-world processing. Especially for unmanned
aerial vehicles, where robust tracking is more challenging and onboard
computation is limited, the latency issue can be fatal. In this work, we present
a simple framework for end-to-end latency-aware tracking, i.e., end-to-end
predictive visual tracking (PVT++). PVT++ is capable of turning most
leading-edge trackers into predictive trackers by appending an online
predictor. Unlike existing solutions that use model-based approaches, our
framework is learnable, so it can take as input motion information, visual
cues, or a combination of both.
Moreover, since PVT++ is end-to-end optimizable, it can further boost the
latency-aware tracking performance by joint training. Additionally, this work
presents an extended latency-aware evaluation benchmark for assessing an
any-speed tracker in the online setting. Empirical results on a robotic platform
from an aerial perspective show that PVT++ can achieve a performance gain of up
to 60% on various trackers and exhibit better robustness than prior model-based
solutions, largely mitigating the degradation brought by latency. Code and
models will be made public.

Comment: 18 pages, 10 figures