Learning attentions: residual attentional Siamese Network for high performance online visual tracking
Offline training for object tracking has recently shown great potential in balancing tracking accuracy and speed. However, it is still difficult to adapt an offline-trained model to a target tracked online. This work presents a Residual Attentional Siamese Network (RASNet) for high-performance object tracking. The RASNet model reformulates the correlation filter within a Siamese tracking framework and introduces different kinds of attention mechanisms to adapt the model without updating it online. In particular, by exploiting the offline-trained general attention, the target-adapted residual attention, and the channel-favored feature attention, RASNet not only mitigates the over-fitting problem in deep network training, but also enhances its discriminative capacity and adaptability thanks to the separation of representation learning and discriminator learning. The proposed deep architecture is trained end to end and takes full advantage of the rich spatio-temporal information to achieve robust visual tracking. Experimental results on two recent benchmarks, OTB-2015 and VOT2017, show that the RASNet tracker achieves state-of-the-art tracking accuracy while running at more than 80 frames per second.
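The idea of weighting a Siamese cross-correlation by attention can be illustrated with a small sketch. This is not the RASNet implementation: the function names, the plain nested-list feature maps, and the choice to combine the general and residual attention by simple addition are assumptions made for illustration only.

```python
def combine_attention(general, residual):
    """Sum a general attention map and a target-specific residual map.

    Both arguments are [h][w] nested lists. Adding them is an assumption
    of this sketch, not a statement about the published model.
    """
    return [[g + r for g, r in zip(g_row, r_row)]
            for g_row, r_row in zip(general, residual)]


def weighted_cross_correlation(template, search, spatial_attn, channel_attn):
    """Slide an attention-weighted template over a search feature map.

    template:     [C][h][w] features of the target exemplar
    search:       [C][H][W] features of the search region
    spatial_attn: [h][w] weights over template positions
    channel_attn: [C] weights over feature channels
    Returns an [H-h+1][W-w+1] response map; its maximum marks the
    most likely target location.
    """
    C = len(template)
    h, w = len(template[0]), len(template[0][0])
    H, W = len(search[0]), len(search[0][0])
    response = [[0.0] * (W - w + 1) for _ in range(H - h + 1)]
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            total = 0.0
            for c in range(C):
                for u in range(h):
                    for v in range(w):
                        total += (channel_attn[c] * spatial_attn[u][v]
                                  * template[c][u][v] * search[c][i + u][j + v])
            response[i][j] = total
    return response
```

With all attention weights set to 1 this reduces to an ordinary cross-correlation, so the attention terms only reweight which template positions and channels contribute to the response; the template features themselves are never updated.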
Do not lose the details: reinforced representation learning for high performance visual tracking
This work presents a novel end-to-end trainable CNN model for high-performance visual object tracking. It learns both low-level fine-grained representations and a high-level semantic embedding space in a mutually reinforced way, and a multi-task learning strategy is proposed to perform the correlation analysis on representations from both levels. In particular, a fully convolutional encoder-decoder network is designed to reconstruct the original visual features from the semantic projections to preserve all the geometric information. Moreover, the correlation filter layer working on the fine-grained representations leverages a global context constraint for accurate object appearance modeling. The correlation filter in this layer is updated online efficiently without network fine-tuning. Therefore, the proposed tracker benefits from two complementary effects: the adaptability of the fine-grained correlation analysis and the generalization capability of the semantic embedding. Extensive experimental evaluations on four popular benchmarks demonstrate its state-of-the-art performance.
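To illustrate why a correlation filter can be updated online without fine-tuning a network, here is a minimal sketch of a generic one-dimensional correlation filter solved in closed form in the Fourier domain. It is not the layer described in the abstract: it omits the global context constraint, and the function names, the regularization weight `lam`, and the linear-interpolation update with `rate` are all assumptions of this example.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), enough for a small demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def train_filter(x, y, lam=1e-3):
    """Closed-form ridge-regression filter, solved per frequency.

    Minimizes |X*H - Y|^2 + lam*|H|^2 at each frequency, which gives
    H = conj(X) * Y / (|X|^2 + lam). No gradient steps are needed.
    """
    X, Y = dft(x), dft(y)
    return [Xk.conjugate() * Yk / (abs(Xk) ** 2 + lam)
            for Xk, Yk in zip(X, Y)]


def respond(H, z):
    """Apply the filter to a new signal and return the real response."""
    Z = dft(z)
    return [v.real for v in dft([Zk * Hk for Zk, Hk in zip(Z, H)], inverse=True)]


def update_filter(H_old, H_new, rate=0.1):
    """Blend the previous filter with one trained on the newest frame."""
    return [(1 - rate) * a + rate * b for a, b in zip(H_old, H_new)]
```

In this sketch, training on a signal `x` with a desired response peaked at index 0 and then calling `respond` on a circularly shifted copy of `x` moves the response peak by the same shift, which is how the target displacement is read off. Each online update costs only a few transforms and element-wise operations.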