Quadruplet Network with One-Shot Learning for Fast Visual Object Tracking
In the same vein as discriminative one-shot learning, Siamese networks can
recognize an object from a single exemplar with the same class label.
However, they do not exploit the underlying structure of the data or the
relationships among multiple samples, because they rely only on pairs of
instances for training. In this paper, we propose a new quadruplet deep network
that examines the potential connections among the training instances, aiming to
achieve a more powerful representation. We design four shared networks that
receive multi-tuples of instances as inputs and are connected by a novel loss
function that combines a pair loss and a triplet loss. Using the similarity
metric, we select the most similar and the most dissimilar instances in each
multi-tuple as the positive and negative inputs of the triplet loss. We show
that this scheme improves training performance. Furthermore, we introduce a
new weight layer that automatically selects suitable combination weights, which
avoids conflicts between the triplet and pair losses that would otherwise degrade
performance. We evaluate our quadruplet framework on model-free
tracking-by-detection of objects from a single initial exemplar on several
Visual Object Tracking benchmarks. Our extensive experimental analysis
demonstrates that our tracker achieves superior performance at a real-time
processing speed of 78 frames per second (fps).
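The loss combination described in this abstract can be illustrated with a small plain-Python sketch. It assumes a contrastive-style pair loss, a hinge triplet loss on squared distances, and a softmax over two hypothetical logits standing in for the learned weight layer. These are illustrative choices, not the authors' exact formulation. The triplet positive and negative are chosen per multi-tuple as the most similar and most dissimilar candidates, as the abstract states.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pair_loss(anchor, other, same_label, margin=1.0):
    """Assumed contrastive-style pair loss: pull same-label pairs together and
    push different-label pairs at least `margin` apart."""
    d = sq_dist(anchor, other)
    if same_label:
        return d
    return max(0.0, margin - math.sqrt(d)) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss on squared distances."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def combination_weights(logit_pair, logit_triplet):
    """Hypothetical stand-in for the learned weight layer: a softmax over two
    scalar logits, so the two loss weights stay positive and sum to 1."""
    m = max(logit_pair, logit_triplet)
    e_pair = math.exp(logit_pair - m)
    e_trip = math.exp(logit_triplet - m)
    total = e_pair + e_trip
    return e_pair / total, e_trip / total


def quadruplet_loss(anchor, candidates, labels, logits=(0.0, 0.0), margin=1.0):
    """Combined pair + triplet loss for one multi-tuple.

    candidates: embeddings of the other instances in the multi-tuple
    labels: True where a candidate shares the anchor's class label
    The triplet positive is the candidate closest to the anchor and the
    negative is the farthest one, measured by squared distance.
    """
    dists = [sq_dist(anchor, c) for c in candidates]
    positive = candidates[dists.index(min(dists))]
    negative = candidates[dists.index(max(dists))]
    l_pair = sum(pair_loss(anchor, c, y, margin)
                 for c, y in zip(candidates, labels)) / len(candidates)
    l_triplet = triplet_loss(anchor, positive, negative, margin)
    w_pair, w_triplet = combination_weights(*logits)
    return w_pair * l_pair + w_triplet * l_triplet
```

For example, `quadruplet_loss([0.0, 0.0], [[0.1, 0.0], [2.0, 0.0], [0.0, 3.0]], [True, False, False])` evaluates to about 0.00167. The triplet term is zero here because the selected negative is already far beyond the margin. In a real network the logits would be trainable parameters rather than fixed inputs.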
DensSiam: End-to-End Densely-Siamese Network with Self-Attention Model for Object Tracking
Convolutional Siamese neural networks have recently been used to track
objects using deep features. Siamese architectures can achieve real-time speed;
however, it is still difficult to find a Siamese architecture that maintains
generalization capability, high accuracy, and speed while decreasing the number
of shared parameters, especially when the network is very deep. Furthermore, a
conventional Siamese architecture usually processes one local neighborhood at a
time, which makes the appearance model local and not robust to appearance
changes.
To overcome these two problems, this paper proposes DensSiam, a novel
convolutional Siamese architecture, which uses the concept of dense layers and
connects each dense layer to all layers in a feed-forward fashion with a
similarity-learning function. DensSiam also includes a Self-Attention mechanism
to force the network to pay more attention to the non-local features during
offline training. Extensive experiments are performed on OTB2013 and OTB2015
as the validation set and on VOT2015, VOT2016, and VOT2017 as the testing set.
The obtained results show that DensSiam achieves superior results on these
benchmarks compared to other current state-of-the-art methods.
Comment: 11 pages, 3 figures, Accepted by ISVC1
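The two ingredients named in this abstract, dense connectivity and non-local self-attention, are sketched below in plain Python. This is a toy illustration that treats features as flat lists. Layers are arbitrary callables, and the attention uses raw features as queries, keys, and values with a fixed `gamma`. Both are simplifying assumptions rather than DensSiam's actual layers.

```python
import math


def dense_block(x, layers):
    """Dense connectivity: every layer receives the concatenation of the block
    input and all earlier layer outputs; the block returns the full concatenation."""
    features = list(x)
    for layer in layers:
        features = features + layer(features)
    return features


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(positions, gamma=1.0):
    """Non-local self-attention over per-position feature vectors.

    Each output is its input plus `gamma` times an attention-weighted sum over
    *all* positions, so every location can use non-local context. Queries, keys
    and values are the raw features (no learned projections), a simplification.
    """
    out = []
    for query in positions:
        scores = [sum(q * k for q, k in zip(query, key)) for key in positions]
        weights = softmax(scores)
        context = [sum(w * v[c] for w, v in zip(weights, positions))
                   for c in range(len(query))]
        out.append([x + gamma * y for x, y in zip(query, context)])
    return out
```

The residual form (input plus `gamma` times context) keeps the local features intact while adding context gathered from every other position. That is the property the abstract relies on to make the appearance model less local.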
SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks
Siamese network based trackers formulate tracking as convolutional feature
cross-correlation between a target template and a search region. However,
Siamese trackers still have an accuracy gap compared with state-of-the-art
algorithms, and they cannot take advantage of features from deep networks such
as ResNet-50 or deeper. In this work, we prove that the core reason is the
lack of strict translation invariance. Through comprehensive theoretical analysis
and experimental validation, we break this restriction with a simple yet
effective spatial-aware sampling strategy and successfully train a
ResNet-driven Siamese tracker with a significant performance gain. Moreover, we
propose a new model architecture that performs depth-wise and layer-wise
aggregation, which not only further improves accuracy but also reduces the
model size. We conduct extensive ablation studies to demonstrate the
effectiveness of the proposed tracker, which currently obtains the best results
on four large tracking benchmarks: OTB2015, VOT2018, UAV123, and
LaSOT. Our model will be released to facilitate further studies on this
problem.
Comment: 9 pages
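The cross-correlation at the heart of this formulation, in its depth-wise variant, can be sketched in plain Python on nested lists, together with a random target shift in the spirit of spatial-aware sampling. The uniform per-axis shift in `spatial_aware_shift` is an assumed form of the sampling strategy. The function names are hypothetical.

```python
import random


def xcorr2d(search, kernel):
    """Valid-mode 2D cross-correlation of one channel (nested lists)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(search) - kh + 1, len(search[0]) - kw + 1
    return [[sum(search[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def depthwise_xcorr(search_feat, template_feat):
    """Depth-wise cross-correlation: channel c of the template is correlated only
    with channel c of the search features, giving one response map per channel."""
    return [xcorr2d(s, t) for s, t in zip(search_feat, template_feat)]


def spatial_aware_shift(max_shift, rng=random):
    """Assumed form of spatial-aware sampling: offset the target from the
    search-region centre by a uniform shift in [-max_shift, max_shift] per axis,
    so training targets are not always centred and a centre bias is discouraged."""
    return rng.uniform(-max_shift, max_shift), rng.uniform(-max_shift, max_shift)
```

For instance, correlating `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` with the kernel `[[1, 0], [0, 1]]` gives `[[6, 8], [12, 14]]`. Keeping channels separate, rather than summing them as in a plain cross-correlation, is what makes the operation depth-wise.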
A Twofold Siamese Network for Real-Time Object Tracking
Observing that Semantic features learned in an image classification task and
Appearance features learned in a similarity matching task complement each
other, we build a twofold Siamese network, named SA-Siam, for real-time object
tracking. SA-Siam is composed of a semantic branch and an appearance branch.
Each branch is a similarity-learning Siamese network. An important design
choice in SA-Siam is to separately train the two branches to keep the
heterogeneity of the two types of features. In addition, we propose a channel
attention mechanism for the semantic branch. Channel-wise weights are computed
according to the channel activations around the target position. While the
architecture inherited from SiamFC allows our tracker to operate
beyond real-time, the twofold design and the attention mechanism significantly
improve the tracking performance. The proposed SA-Siam outperforms all other
real-time trackers by a large margin on the OTB-2013/50/100 benchmarks.
Comment: Accepted by CVPR'1
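A rough sketch of the channel attention and the fusion of the two branches follows, in plain Python. It assumes that each channel's weight is a sigmoid of its peak activation in a small window around the target. It also assumes that the semantic and appearance response maps are fused by a weighted average with a hypothetical weight `lam`. The paper's exact pooling grid and weighting network are not reproduced here.

```python
import math


def channel_attention(feature_maps, center, radius=1):
    """Per-channel weights from activations near the target.

    feature_maps: list of 2D maps (one per channel)
    center: (row, col) of the target in feature-map coordinates
    Each weight is a sigmoid of the channel's max activation inside a
    (2*radius+1) x (2*radius+1) window around the target (assumed form).
    """
    r0, c0 = center
    weights = []
    for fmap in feature_maps:
        rows = range(max(0, r0 - radius), min(len(fmap), r0 + radius + 1))
        cols = range(max(0, c0 - radius), min(len(fmap[0]), c0 + radius + 1))
        peak = max(fmap[r][c] for r in rows for c in cols)
        weights.append(1.0 / (1.0 + math.exp(-peak)))
    return weights


def reweight(feature_maps, weights):
    """Scale every channel by its attention weight."""
    return [[[w * x for x in row] for row in fmap]
            for fmap, w in zip(feature_maps, weights)]


def fuse_responses(semantic, appearance, lam=0.5):
    """Weighted average of the two branches' response maps (lam is assumed)."""
    return [[lam * s + (1.0 - lam) * a for s, a in zip(rs, ra)]
            for rs, ra in zip(semantic, appearance)]
```

Because the weights depend only on activations near the target, channels that respond strongly to the target are emphasized in the semantic branch. The appearance branch stays untouched until the final fusion, which matches the separately trained design described above.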
An In-Depth Analysis of Visual Tracking with Siamese Neural Networks
This survey presents a deep analysis of the learning and inference
capabilities in nine popular trackers. It is neither intended to study the
whole literature nor is it an attempt to review all kinds of neural networks
proposed for visual tracking. We focus instead on Siamese neural networks which
are a promising starting point for studying the challenging problem of
tracking. These networks efficiently integrate feature learning and
temporal matching and have so far shown state-of-the-art performance. In
particular, the branches of Siamese networks, the layers connecting these
branches, specific aspects of training, and the embedding of these networks into
the tracker are highlighted. Quantitative results from existing papers are
compared, leading to the conclusion that the current evaluation methodology has
problems with the reproducibility and comparability of results. The paper
proposes a novel Lisp-like formalism for a better comparison of trackers. This
assumes a certain functional design and functional decomposition of trackers.
The paper tries to lay a foundation for tracker design by formulating the
problem based on the theory of machine learning and by interpreting a
tracker as a decision function. The work concludes with promising lines of
research and suggests future work.
Comment: submitted to IEEE TPAMI
Rotation Adaptive Visual Object Tracking with Motion Consistency
Visual object tracking research has undergone significant improvement in the
past few years. The emergence of the tracking-by-detection approach as a tracking
paradigm has been quite successful in many ways. Recently, deep convolutional
neural networks have been extensively used in most successful trackers. Yet,
the standard approach has been based on correlation or feature selection with
minimal consideration given to motion consistency. Thus, there is still a need
to capture various physical constraints through motion consistency, which will
improve accuracy, robustness, and, more importantly, rotation adaptiveness.
Therefore, one of the major aspects of this paper is to investigate the outcome
of rotation adaptiveness in visual object tracking. Among other key
contributions, the paper also includes various consistencies that turn out to
be more effective than the current state of the art on numerous challenging
sequences.
Comment: Accepted conference paper WACV 201
Prediction-Tracking-Segmentation
We introduce a prediction-driven method for visual tracking and segmentation
in videos. Instead of relying solely on matching appearance cues for
tracking, we build a predictive model that efficiently guides the search toward
more accurate tracking regions. With the proposed prediction mechanism, we
improve the model's robustness against distractions and occlusions during
tracking. We demonstrate significant improvements over state-of-the-art methods
not only on visual tracking tasks (VOT 2016 and VOT 2018) but also on video
segmentation datasets (DAVIS 2016 and DAVIS 2017).
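To illustrate how a prediction can steer where the tracker looks, the sketch below substitutes a simple constant-velocity motion model for the paper's learned predictive model. That substitution is an assumption made purely for illustration. The next search window is centred on the extrapolated position instead of the last observed one.

```python
def predict_center(history):
    """Constant-velocity prediction of the next target centre (a stand-in
    for a learned predictor).

    history: list of (x, y) centres, most recent last. With fewer than two
    observations the last known centre is returned unchanged.
    """
    if len(history) < 2:
        return history[-1]
    (x1, y1), (x2, y2) = history[-2], history[-1]
    return (2 * x2 - x1, 2 * y2 - y1)


def search_window(center, size):
    """Square search window (x_min, y_min, x_max, y_max) around a centre."""
    cx, cy = center
    half = size / 2
    return (cx - half, cy - half, cx + half, cy + half)
```

With centres `(0, 0)` and then `(2, 1)`, the predicted centre is `(4, 2)`, so a window of size 4 becomes `(2.0, 0.0, 6.0, 4.0)`. Centring the search on a prediction is one simple way to keep a fast-moving target inside the window and to reduce confusion with nearby distractors.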
Learning regression and verification networks for long-term visual tracking
Compared with short-term tracking, the long-term tracking task requires
determining whether the tracked object is present or absent, and then estimating an
accurate bounding box if it is present or conducting image-wide re-detection if it is
absent. Until now, few attempts have been made, although this task is much
closer to designing practical tracking systems. In this work, we propose a
novel long-term tracking framework based on deep regression and verification
networks. The offline-trained regression model is designed using
object-aware feature fusion and region proposal networks to generate a series
of candidates and estimate their similarity scores effectively. The
verification network evaluates these candidates to output the optimal one as
the tracked object with its classification score; the verification network is
updated online with newly obtained reliable observations to adapt to appearance
variations. The similarity and classification scores are combined to obtain a
final confidence value, based on which our tracker can accurately determine the
absence of the target and conduct image-wide re-detection to recapture the
target when it reappears. Extensive experiments show that our tracker
achieves the best performance on the VOT2018 long-term challenge and
state-of-the-art results on the OxUvA long-term dataset.
Comment: 9 pages
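The control flow implied here can be sketched in plain Python: fuse the two scores, decide presence, and fall back to image-wide re-detection when the target is judged absent. The weighted-average fusion, the `alpha` and `threshold` values, and the `redetect` callable are all hypothetical stand-ins. The abstract does not specify how the scores are combined.

```python
def fuse_confidence(similarity, classification, alpha=0.5):
    """Assumed fusion rule: a weighted average of the regression network's
    similarity score and the verification network's classification score."""
    return alpha * similarity + (1.0 - alpha) * classification


def select_target(candidates, threshold=0.5, alpha=0.5):
    """Pick the best candidate and decide whether the target is present.

    candidates: list of (box, similarity, classification) tuples
    Returns (box, confidence, present); box is None when there are no candidates.
    """
    if not candidates:
        return None, 0.0, False
    box, sim, cls = max(candidates, key=lambda c: fuse_confidence(c[1], c[2], alpha))
    conf = fuse_confidence(sim, cls, alpha)
    return box, conf, conf >= threshold


def track_frame(local_candidates, redetect, threshold=0.5, alpha=0.5):
    """One long-term tracking step: search locally first and fall back to
    image-wide re-detection (the hypothetical callable `redetect`) when the
    fused confidence says the target is absent."""
    box, conf, present = select_target(local_candidates, threshold, alpha)
    if present:
        return box, conf, "local"
    box, conf, present = select_target(redetect(), threshold, alpha)
    if present:
        return box, conf, "redetected"
    return None, conf, "absent"
```

For example, a weak local candidate with scores `(0.2, 0.1)` fuses to 0.15, which falls below the threshold. A re-detected candidate with `(0.9, 0.8)` fuses to 0.85 and is accepted with status `"redetected"`.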
Unsupervised Deep Tracking
We propose an unsupervised visual tracking method in this paper. Unlike
existing approaches that use extensive annotated data for supervised
learning, our CNN model is trained on large-scale unlabeled videos in an
unsupervised manner. Our motivation is that a robust tracker should be
effective in both forward and backward predictions (i.e., the tracker can
localize the target object forward in successive frames and backtrace to its
initial position in the first frame). We build our framework on a Siamese
correlation filter network, which is trained using unlabeled raw videos.
Meanwhile, we propose a multiple-frame validation method and a cost-sensitive
loss to facilitate unsupervised learning. Without bells and whistles, the
proposed unsupervised tracker achieves the baseline accuracy of fully
supervised trackers, which require complete and accurate labels during
training. Furthermore, the unsupervised framework shows potential for
leveraging unlabeled or weakly labeled data to further improve tracking
accuracy.
Comment: to appear in CVPR 201
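The forward-backward motivation can be made concrete as a consistency error. Track forward through the frames, track back to the first frame, and measure how far the result lands from the starting position. The sketch below is a plain-Python illustration only. Each toy "frame" is simply the target's true centre, and `track_step` is a hypothetical single-step tracker, not the Siamese correlation filter network used in the paper.

```python
import math


def forward_backward_error(track_step, frames, init_pos):
    """Track forward through `frames`, then backward to the first frame, and
    return the distance between the back-traced and initial positions.

    track_step(prev_frame, next_frame, pos) -> new pos is a hypothetical
    single-step tracker; a consistent tracker should return close to init_pos.
    """
    pos = init_pos
    for prev, nxt in zip(frames, frames[1:]):
        pos = track_step(prev, nxt, pos)
    backward = frames[::-1]
    for prev, nxt in zip(backward, backward[1:]):
        pos = track_step(prev, nxt, pos)
    return math.dist(pos, init_pos)


def perfect_step(prev, nxt, pos):
    """Toy tracker that follows the true displacement between frames exactly."""
    return (pos[0] + nxt[0] - prev[0], pos[1] + nxt[1] - prev[1])


def drifting_step(prev, nxt, pos):
    """Toy tracker that adds a constant +0.5 drift in x at every step."""
    x, y = perfect_step(prev, nxt, pos)
    return (x + 0.5, y)
```

With toy frames `[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]`, the perfect tracker returns to the start with error 0.0. The drifting tracker accumulates four steps of drift and ends 2.0 away. Because this error needs no labels, it can serve as a self-supervised training signal.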
FANTrack: 3D Multi-Object Tracking with Feature Association Network
We propose a data-driven approach to online multi-object tracking (MOT) that
uses a convolutional neural network (CNN) for data association in a
tracking-by-detection framework. Multi-target tracking aims to
assign noisy detections to an a priori unknown and time-varying number of tracked
objects across a sequence of frames. Most existing solutions focus
on either tediously designing cost functions or formulating data
association as a complex optimization problem that can be solved effectively.
Instead, we exploit the power of deep learning to formulate the data
association problem as inference in a CNN. To this end, we propose to learn a
similarity function that combines cues from both image and spatial features of
objects. Our solution learns to perform global assignments in 3D purely from
data, handles noisy detections and a varying number of targets, and is easy to
train. We evaluate our approach on the challenging KITTI dataset and show
competitive results. Our code is available at
https://git.uwaterloo.ca/wise-lab/fantrack.
Comment: 8 pages, 10 figures, IEEE Intelligent Vehicles Symposium (IV 19
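To make the data-association problem concrete, the sketch below pairs a hand-crafted similarity with an exhaustive global assignment in plain Python. The similarity blends appearance cosine similarity with a 3D-distance term. This is a deliberately simple substitute for the learned CNN similarity and assignment described in the abstract. The weights, `scale`, and `min_sim` values are hypothetical.

```python
import itertools
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def similarity(track, det, w_app=0.5, scale=2.0):
    """Hand-crafted stand-in for the learned similarity: appearance cosine
    similarity blended with a 3D-distance term exp(-d / scale)."""
    appearance = cosine(track["feat"], det["feat"])
    spatial = math.exp(-math.dist(track["pos"], det["pos"]) / scale)
    return w_app * appearance + (1.0 - w_app) * spatial


def associate(tracks, dets, min_sim=0.3):
    """Exhaustive global assignment that maximises total similarity.

    Pairs scoring below `min_sim` contribute nothing and are left unmatched,
    which allows for new or missed targets. Cost grows factorially, so this is
    only for tiny examples. Returns sorted (track_index, detection_index) pairs.
    """
    sim = [[similarity(t, d) for d in dets] for t in tracks]
    k = min(len(tracks), len(dets))
    best_score, best_pairs = 0.0, []
    for ts in itertools.combinations(range(len(tracks)), k):
        for ds in itertools.permutations(range(len(dets)), k):
            pairs = [(t, d) for t, d in zip(ts, ds) if sim[t][d] >= min_sim]
            score = sum(sim[t][d] for t, d in pairs)
            if score > best_score:
                best_score, best_pairs = score, pairs
    return sorted(best_pairs)
```

Solving the assignment jointly, rather than greedily track by track, is what "global assignment" refers to. In practice a polynomial-time solver such as the Hungarian algorithm, or a learned network as in FANTrack, would replace the brute-force search.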