911 research outputs found
Hard Negative Samples Emphasis Tracker without Anchors
Trackers based on Siamese networks have shown tremendous success because of
their balance between accuracy and speed. Nevertheless, as tracking scenarios
become increasingly sophisticated, most existing Siamese-based approaches
ignore the problem of distinguishing the tracking target from hard negative
samples during the tracking phase. The features learned by these networks lack
discrimination, which significantly weakens the robustness of Siamese-based
trackers and leads to suboptimal performance. To address this issue, we propose
a simple yet efficient hard negative sample emphasis method, which constrains
the Siamese network to learn features that are aware of hard negative samples
and enhances the discrimination of embedding features. Through a distance
constraint, we shorten the distance between the exemplar vector and positive
vectors while enlarging the distance between the exemplar vector and hard
negative vectors. Furthermore, we explore a novel anchor-free tracking
framework in a per-pixel prediction fashion, which significantly reduces the
number of hyper-parameters and simplifies the tracking process by taking full
advantage of the representation power of convolutional neural networks.
Extensive experiments on six standard benchmark datasets demonstrate that the
proposed method performs favorably against state-of-the-art approaches.
Comment: accepted by ACM Multimedia Conference, 202
Deformable Siamese Attention Networks for Visual Object Tracking
Siamese-based trackers have achieved excellent performance on visual object
tracking. However, the target template is not updated online, and the features
of the target template and the search image are computed independently in a
Siamese architecture. In this paper, we propose Deformable Siamese Attention
Networks, referred to as SiamAttn, by introducing a new Siamese attention
mechanism that computes deformable self-attention and cross-attention. The
self-attention learns strong context information via spatial attention, and
selectively emphasizes interdependent channel-wise features with channel
attention. The cross-attention is capable of aggregating rich contextual
inter-dependencies between the target template and the search image, providing
an implicit manner to adaptively update the target template. In addition, we
design a region refinement module that computes depth-wise cross-correlations
between the attentional features for more accurate tracking. We conduct
experiments on six benchmarks, where our method achieves new state-of-the-art
results, outperforming the strong baseline, SiamRPN++ [24], by 0.464->0.537
and 0.415->0.470 EAO on VOT 2016 and 2018. Our code is available at:
https://github.com/msight-tech/research-siamattn.
Comment: CVPR 2020, with code available at: https://github.com/msight-tech/research-siamatt
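The depth-wise cross-correlation mentioned in the refinement module is a standard Siamese-tracking operation: the template feature map acts as a per-channel kernel slid over the search feature map. The following is a minimal NumPy sketch with illustrative shapes, not the paper's implementation (which would apply this to the attentional features inside the network).

```python
import numpy as np

def depthwise_xcorr(search, template):
    """Depth-wise cross-correlation: each channel of the template is
    correlated only with the matching channel of the search features.
    search:   (C, Hs, Ws) search-image feature map
    template: (C, Ht, Wt) target-template feature map, Ht <= Hs, Wt <= Ws
    returns:  (C, Hs-Ht+1, Ws-Wt+1) per-channel response map
    """
    C, Hs, Ws = search.shape
    _, Ht, Wt = template.shape
    out = np.zeros((C, Hs - Ht + 1, Ws - Wt + 1))
    for c in range(C):  # channels are correlated independently (depth-wise)
        for i in range(Hs - Ht + 1):
            for j in range(Wt and Ws - Wt + 1):
                out[c, i, j] = np.sum(search[c, i:i + Ht, j:j + Wt] * template[c])
    return out
```

In practice this loop is expressed as a grouped convolution (e.g. a conv with `groups=C`), which keeps the per-channel responses separate for the subsequent prediction heads.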