Hierarchical Spatial-aware Siamese Network for Thermal Infrared Object Tracking
Most thermal infrared (TIR) tracking methods are discriminative, treating the
tracking problem as a classification task. However, the objective of the
classifier (label prediction) is not coupled to the objective of the tracker
(location estimation). The classification task focuses on the between-class
differences of arbitrary objects, while the tracking task mainly deals with
the within-class differences of the same object. In this paper, we cast the TIR
tracking problem as a similarity verification task, which is well coupled to
the objective of the tracker. We propose a TIR tracker via a Hierarchical
Spatial-aware Siamese Convolutional Neural Network (CNN), named HSSNet. To
obtain both spatial and semantic features of the TIR object, we design a
Siamese CNN that coalesces the multiple hierarchical convolutional layers.
Then, we propose a spatial-aware network to enhance the discriminative ability
of the coalesced hierarchical features. Subsequently, we train this network end
to end on a large visible video detection dataset to learn the similarity
between paired objects before we transfer the network into the TIR domain.
Next, this pre-trained Siamese network is used to evaluate the similarity
between the target template and target candidates. Finally, we locate the
candidate that is most similar to the tracked target. Extensive experimental
results on the benchmarks VOT-TIR 2015 and VOT-TIR 2016 show that our proposed
method achieves favourable performance compared to the state-of-the-art
methods.
Comment: 20 pages, 7 figures
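The similarity-verification formulation can be sketched in a few lines: embed the template and each candidate with a shared function and keep the candidate whose embedding is most similar. The linear `embed` below is a toy stand-in for HSSNet's Siamese CNN branch, not the paper's architecture:

```python
import numpy as np

def embed(patch, W):
    # Toy stand-in for a shared Siamese branch: a fixed linear map
    # followed by L2 normalisation (illustrative only).
    v = W @ patch.ravel()
    return v / np.linalg.norm(v)

def track_step(template, candidates, W):
    """Return the index of the candidate most similar to the template."""
    t = embed(template, W)
    scores = [float(t @ embed(c, W)) for c in candidates]
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))           # shared embedding weights
template = rng.standard_normal((8, 8))      # target template patch
candidates = [rng.standard_normal((8, 8)) for _ in range(4)]
candidates[2] = template + 0.05 * rng.standard_normal((8, 8))  # near-copy

best, scores = track_step(template, candidates, W)
print(best)  # the near-copy of the template wins: 2
```

Because the two branches share weights, the score is a genuine similarity between paired objects, which is exactly the quantity the tracking objective needs.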
An In-Depth Analysis of Visual Tracking with Siamese Neural Networks
This survey presents a deep analysis of the learning and inference
capabilities in nine popular trackers. It is neither intended to study the
whole literature nor is it an attempt to review all kinds of neural networks
proposed for visual tracking. We focus instead on Siamese neural networks which
are a promising starting point for studying the challenging problem of
tracking. These networks efficiently integrate feature learning and temporal
matching and have so far shown state-of-the-art performance. In
particular, the branches of Siamese networks, the layers connecting these
branches, specific aspects of training, and the embedding of these networks into
the tracker are highlighted. Quantitative results from existing papers are
compared, leading to the conclusion that the current evaluation methodology
shows problems with the reproducibility and comparability of results. The paper
proposes a novel Lisp-like formalism for a better comparison of trackers. This
assumes a certain functional design and functional decomposition of trackers.
The paper aims to lay a foundation for tracker design by formulating the
problem in terms of machine learning theory and by interpreting a
tracker as a decision function. The work concludes with promising lines of
research and suggests future work.
Comment: submitted to IEEE TPAMI
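The tracker-as-decision-function view can be illustrated with a toy functional decomposition; the stage names and the 1-D setting below are purely illustrative, not the survey's formalism:

```python
from typing import Callable

# A tracker as a decision function: compose a feature map, a matching
# stage, and a decision stage into one function of (template, frame).
def make_tracker(phi: Callable, match: Callable, decide: Callable):
    def tracker(template, frame):
        return decide(match(phi(template), phi(frame)))
    return tracker

# Toy instantiation over 1-D "frames".
phi = lambda xs: [v * v for v in xs]                        # feature map
match = lambda t, f: [sum(a * b for a, b in zip(t, f[i:]))  # sliding dot product
                      for i in range(len(f) - len(t) + 1)]
decide = lambda scores: max(range(len(scores)), key=scores.__getitem__)

track = make_tracker(phi, match, decide)
pos = track([2, 3], [0, 0, 2, 3, 0, 0])
print(pos)  # 2: the template is found at offset 2
```

Swapping any one stage (a deeper feature map, a learned matcher, a different decision rule) yields a new tracker, which is what makes such a decomposition useful for comparison.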
Siamese Attentional Keypoint Network for High Performance Visual Tracking
In this paper, we investigate the impacts of three main aspects of visual
tracking, i.e., the backbone network, the attentional mechanism, and the
detection component, and propose a Siamese Attentional Keypoint Network, dubbed
SATIN, for efficient tracking and accurate localization. Firstly, a new Siamese
lightweight hourglass network is specially designed for visual tracking. It
takes advantage of the benefits of the repeated bottom-up and top-down
inference to capture more global and local contextual information at multiple
scales. Secondly, a novel cross-attentional module is utilized to leverage both
channel-wise and spatial intermediate attentional information, which can
enhance both discriminative and localization capabilities of feature maps.
Thirdly, a keypoint detection approach is introduced to locate any target object
by detecting the top-left corner point, the centroid point, and the
bottom-right corner point of its bounding box. Therefore, our SATIN tracker not
only has a strong capability to learn more effective object representations,
but is also computationally and memory efficient during both the
training and testing stages. To the best of our knowledge, we are the first to
propose this approach. Without bells and whistles, experimental results
demonstrate that our approach achieves state-of-the-art performance on several
recent benchmark datasets, at a speed far exceeding 27 frames per second.
Comment: Accepted by Knowledge-Based Systems
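The three-keypoint scheme can be sketched as a minimal decoder that reads each bounding-box coordinate off the peak of a heatmap; SATIN's actual grouping and scoring are more involved than this:

```python
import numpy as np

def decode_box(tl_heat, ct_heat, br_heat):
    """Read a bounding box off three keypoint heatmaps by taking the
    location of each map's peak (a simplified decoder)."""
    def peak(h):
        y, x = np.unravel_index(np.argmax(h), h.shape)
        return int(x), int(y)
    (x1, y1), (cx, cy), (x2, y2) = peak(tl_heat), peak(ct_heat), peak(br_heat)
    return (x1, y1, x2, y2), (cx, cy)

# Synthetic heatmaps: a Gaussian bump at each keypoint location.
H = W = 16
def gauss(cx, cy, sigma=1.5):
    ys, xs = np.mgrid[0:H, 0:W]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

box, centroid = decode_box(gauss(3, 4), gauss(7, 8), gauss(11, 12))
print(box, centroid)  # (3, 4, 11, 12) (7, 8)
```

The centroid peak gives the decoder a third, redundant constraint that helps reject corner pairs belonging to different objects.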
End-to-end representation learning for Correlation Filter based tracking
The Correlation Filter is an algorithm that trains a linear template to
discriminate between images and their translations. It is well suited to object
tracking because its formulation in the Fourier domain provides a fast
solution, enabling the detector to be re-trained once per frame. Previous works
that use the Correlation Filter, however, have adopted features that were
either manually designed or trained for a different task. This work is the
first to overcome this limitation by interpreting the Correlation Filter
learner, which has a closed-form solution, as a differentiable layer in a deep
neural network. This enables learning deep features that are tightly coupled to
the Correlation Filter. Experiments illustrate that our method has the
important practical benefit of allowing lightweight architectures to achieve
state-of-the-art performance at high framerates.
Comment: To appear at CVPR 2017
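The closed-form Fourier-domain solution the abstract refers to can be sketched for the single-channel case (a standard ridge-regression correlation filter; this work wraps it in a differentiable layer with learned features):

```python
import numpy as np

def train_cf(x, y, lam=1e-2):
    """Closed-form correlation filter in the Fourier domain:
    w_hat = conj(x_hat) * y_hat / (conj(x_hat) * x_hat + lam)."""
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def respond(w_hat, z):
    """Apply the filter to a search patch z, again via the FFT."""
    return np.real(np.fft.ifft2(w_hat * np.fft.fft2(z)))

rng = np.random.default_rng(1)
x = rng.standard_normal((32, 32))       # training patch
y = np.zeros((32, 32))
y[0, 0] = 1.0                           # desired response: peak at origin
w_hat = train_cf(x, y)

# Detect the same patch circularly shifted by (5, 7):
r = respond(w_hat, np.roll(x, (5, 7), axis=(0, 1)))
peak = np.unravel_index(np.argmax(r), r.shape)
print(peak)  # (5, 7): the response peak follows the shift
```

Training and detection are both elementwise operations on FFTs, which is why the filter can be re-trained once per frame at high framerates.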
Prediction-Tracking-Segmentation
We introduce a prediction driven method for visual tracking and segmentation
in videos. Instead of solely relying on matching with appearance cues for
tracking, we build a predictive model which guides finding more accurate
tracking regions efficiently. With the proposed prediction mechanism, we
improve the model robustness against distractions and occlusions during
tracking. We demonstrate significant improvements over state-of-the-art methods
not only on visual tracking tasks (VOT 2016 and VOT 2018) but also on video
segmentation datasets (DAVIS 2016 and DAVIS 2017).
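A minimal sketch of prediction-driven search, assuming a constant-velocity predictor in place of the paper's learned model:

```python
import numpy as np

def predict_center(history):
    """Constant-velocity prediction (a simple stand-in for a learned
    predictive model): extrapolate from the last two observed centres."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)

def search_window(center, size, frame_shape):
    """Clamp a size x size search region around the predicted centre."""
    cx, cy = center
    h, w = frame_shape
    x1 = int(np.clip(cx - size // 2, 0, w - size))
    y1 = int(np.clip(cy - size // 2, 0, h - size))
    return (x1, y1, x1 + size, y1 + size)

history = [(40, 30), (48, 34)]            # target centres in past frames
pred = predict_center(history)            # extrapolated next centre
win = search_window(pred, 32, (120, 160)) # restrict appearance matching here
print(pred, win)  # (56, 38) (40, 22, 72, 54)
```

Matching appearance only inside the predicted window is what buys robustness to distractors elsewhere in the frame: they are never scored.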
SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks
Siamese network based trackers formulate tracking as convolutional feature
cross-correlation between the target template and the search region. However,
Siamese trackers still have an accuracy gap compared with state-of-the-art
algorithms and cannot take advantage of features from deep networks, such
as ResNet-50 or deeper. In this work we prove that the core reason is the
lack of strict translation invariance. By comprehensive theoretical analysis
and experimental validations, we break this restriction through a simple yet
effective spatial aware sampling strategy and successfully train a
ResNet-driven Siamese tracker with significant performance gain. Moreover, we
propose a new model architecture to perform depth-wise and layer-wise
aggregations, which not only further improves the accuracy but also reduces the
model size. We conduct extensive ablation studies to demonstrate the
effectiveness of the proposed tracker, which obtains currently the best results
on four large tracking benchmarks, including OTB2015, VOT2018, UAV123, and
LaSOT. Our model will be released to facilitate further studies of this problem.
Comment: 9 pages
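The depth-wise cross-correlation used in the aggregation can be sketched in plain numpy (a naive loop version, for clarity rather than speed):

```python
import numpy as np

def depthwise_xcorr(search, kernel):
    """Depth-wise cross-correlation: each channel of the template acts
    as a filter and correlates only with the matching channel of the
    search features, so channel count is preserved."""
    C, Hs, Ws = search.shape
    _, Hk, Wk = kernel.shape
    Ho, Wo = Hs - Hk + 1, Ws - Wk + 1
    out = np.empty((C, Ho, Wo))
    for c in range(C):
        for i in range(Ho):
            for j in range(Wo):
                out[c, i, j] = np.sum(search[c, i:i + Hk, j:j + Wk] * kernel[c])
    return out

rng = np.random.default_rng(2)
search = rng.standard_normal((4, 8, 8))
kernel = search[:, 2:5, 3:6].copy()       # the template occurs at (2, 3)
resp = depthwise_xcorr(search, kernel)
peak = np.unravel_index(np.argmax(resp.sum(axis=0)), resp.shape[1:])
print(peak)  # (2, 3): the summed response peaks where the template matches
```

Compared with the up-channel correlation of earlier Siamese heads, the depth-wise form keeps one response map per channel, which is what makes the subsequent layer-wise aggregation both accurate and parameter-light.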
A Twofold Siamese Network for Real-Time Object Tracking
Observing that semantic features learned in an image classification task and
appearance features learned in a similarity matching task complement each
other, we build a twofold Siamese network, named SA-Siam, for real-time object
tracking. SA-Siam is composed of a semantic branch and an appearance branch.
Each branch is a similarity-learning Siamese network. An important design
choice in SA-Siam is to separately train the two branches to keep the
heterogeneity of the two types of features. In addition, we propose a channel
attention mechanism for the semantic branch. Channel-wise weights are computed
according to the channel activations around the target position. While the
inherited architecture from SiamFC \cite{SiamFC} allows our tracker to operate
beyond real-time, the twofold design and the attention mechanism significantly
improve the tracking performance. The proposed SA-Siam outperforms all other
real-time trackers by a large margin on the OTB-2013/50/100 benchmarks.
Comment: Accepted by CVPR'18
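A minimal sketch of channel attention from pooled activations: SA-Siam learns the mapping from activations around the target with a small network, whereas here a plain sigmoid of the per-channel max stands in for it:

```python
import numpy as np

def channel_attention(feat, beta=1.0):
    """Weight each channel by its max activation, squashed through a
    sigmoid to a positive weight (illustrative pooling + squashing;
    the learned mapping in the paper is more expressive)."""
    pooled = feat.reshape(feat.shape[0], -1).max(axis=1)   # per-channel max
    weights = 1.0 / (1.0 + np.exp(-beta * pooled))         # sigmoid
    return feat * weights[:, None, None], weights

rng = np.random.default_rng(3)
feat = rng.standard_normal((8, 6, 6))     # feature map around the target
feat[5] += 3.0                            # channel 5 fires strongly on it
reweighted, w = channel_attention(feat)
print(int(np.argmax(w)))  # channel 5 receives the largest weight
```

Channels that respond strongly near the target are amplified and uninformative ones suppressed, without any per-sequence fine-tuning of the branch itself.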
Visual Object Tracking based on Adaptive Siamese and Motion Estimation Network
Recently, convolutional neural network (CNN) has attracted much attention in
different areas of computer vision, due to its powerful abstract feature
representation. Visual object tracking is one of the interesting and important
areas in computer vision and has achieved remarkable improvements in recent years.
In this work, we aim to improve both the motion and observation models in
visual object tracking by leveraging representation power of CNNs. To this end,
a motion estimation network (named MEN) is utilized to seek the most likely
locations of the target and provide a further cue in addition to the previous
target position. Motion estimation is thus enhanced by generating a
small number of candidates near the two plausible positions. The generated
candidates are then fed into a trained Siamese network to detect the most
probable candidate. Each candidate is compared to an adaptable buffer, which is
updated under a predefined condition. To take into account the target
appearance changes, a weighting CNN (called WCNN) adaptively assigns weights to
the final similarity scores of the Siamese network using sequence-specific
information. Evaluation results on well-known benchmark datasets (OTB100, OTB50
and OTB2013) prove that the proposed tracker outperforms the state-of-the-art
competitors.
Comment: 28 pages, 1 algorithm, 7 figures, 2 tables. Submitted to Elsevier
Image and Vision Computing
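The two-clue candidate generation can be sketched as follows, assuming Gaussian sampling around both positions (the sampler and its parameters are illustrative, not taken from the paper):

```python
import numpy as np

def generate_candidates(prev_pos, men_pos, n_per=5, sigma=2.0, seed=0):
    """Draw a small set of candidate centres around the previous target
    position and the motion-estimation network's predicted position."""
    rng = np.random.default_rng(seed)
    cands = []
    for cx, cy in (prev_pos, men_pos):
        offsets = rng.normal(0.0, sigma, size=(n_per, 2))
        cands.extend((cx + dx, cy + dy) for dx, dy in offsets)
    return cands

cands = generate_candidates((50, 40), (58, 44))
print(len(cands))  # 10 candidates, 5 around each plausible position
```

Each candidate would then be scored by the Siamese network against the adaptable buffer, with the weighting network modulating the final similarity scores.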
DCFNet: Discriminant Correlation Filters Network for Visual Tracking
Discriminant Correlation Filters (DCF) based methods have now become a
dominant approach to online object tracking. The features used in these
methods, however, are either hand-crafted features like HOG or
convolutional features trained independently on other tasks such as image
classification. In this work, we present an end-to-end lightweight network
architecture, namely DCFNet, to learn the convolutional features and perform
the correlation tracking process simultaneously. Specifically, we treat DCF as
a special correlation filter layer added in a Siamese network, and carefully
derive the backpropagation through it by defining the network output as the
probability heatmap of object location. Since the derivation is still carried
out in Fourier frequency domain, the efficiency property of DCF is preserved.
This enables our tracker to run at more than 60 FPS during test time, while
achieving a significant accuracy gain compared with KCF using HOG features. Extensive
evaluations on OTB-2013, OTB-2015, and VOT2015 benchmarks demonstrate that the
proposed DCFNet tracker is competitive with several state-of-the-art trackers,
while being more compact and much faster.
Comment: 5 pages, 4 figures
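One simple way to turn a correlation response into the probability heatmap that the network output is defined as here is a spatial softmax (a sketch of the idea, not the paper's exact normalisation):

```python
import numpy as np

def response_to_heatmap(response):
    """Normalise a correlation response map into a probability heatmap
    over locations via a spatial softmax (numerically stabilised by
    subtracting the max before exponentiating)."""
    e = np.exp(response - response.max())
    return e / e.sum()

resp = np.zeros((5, 5))
resp[2, 3] = 4.0                  # raw response peak
p = response_to_heatmap(resp)
print(np.unravel_index(np.argmax(p), p.shape))  # (2, 3)
```

Defining the output as a probability over locations gives a differentiable training target, which is what allows the backpropagation through the DCF layer to be derived, still entirely in the Fourier domain.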
CREST: Convolutional Residual Learning for Visual Tracking
Discriminative correlation filters (DCFs) have been shown to perform
favourably in visual tracking. They only need a small set of training samples
from the initial frame to generate an appearance model. However, existing DCFs
learn the filters separately from feature extraction, and update these filters
using a moving average operation with an empirical weight. These DCF trackers
thus hardly benefit from end-to-end training. In this paper, we propose the
CREST algorithm to reformulate DCFs as a one-layer convolutional neural
network. Our method integrates feature extraction, response map generation as
well as model update into the neural networks for an end-to-end training. To
reduce model degradation during online update, we apply residual learning to
take appearance changes into account. Extensive experiments on the benchmark
datasets demonstrate that our CREST tracker performs favorably against
state-of-the-art trackers.
Comment: ICCV 2017. Project page:
http://www.cs.cityu.edu.hk/~yibisong/iccv17/index.htm
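The one-layer convolutional reformulation plus residual can be sketched with a plain valid-mode correlation (naive numpy, illustrative only):

```python
import numpy as np

def corr2d(x, k):
    """Valid-mode 2D correlation: the 'one-layer convolutional network'
    that stands in for a closed-form DCF."""
    Hk, Wk = k.shape
    Ho, Wo = x.shape[0] - Hk + 1, x.shape[1] - Wk + 1
    return np.array([[np.sum(x[i:i + Hk, j:j + Wk] * k) for j in range(Wo)]
                     for i in range(Ho)])

def crest_response(x, base_k, res_k):
    # Residual learning: the final response is the base response plus a
    # (typically small) residual correction absorbing appearance change.
    return corr2d(x, base_k) + corr2d(x, res_k)

rng = np.random.default_rng(4)
x = rng.standard_normal((10, 10))         # feature patch
base_k = rng.standard_normal((3, 3))      # base filter
res_k = 0.1 * rng.standard_normal((3, 3)) # residual filter, updated online
r = crest_response(x, base_k, res_k)
print(r.shape)  # (8, 8)
```

Because correlation is linear, the residual branch is equivalent to a small additive correction to the base filter; updating only that correction online is what limits model degradation.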