DCFNet: Discriminant Correlation Filters Network for Visual Tracking
Discriminant Correlation Filters (DCF) based methods have become a dominant
approach to online object tracking. The features used in these methods,
however, are either hand-crafted features such as HOG, or convolutional
features trained independently on other tasks such as image classification.
In this work, we present an end-to-end lightweight network
architecture, namely DCFNet, to learn the convolutional features and perform
the correlation tracking process simultaneously. Specifically, we treat DCF as
a special correlation filter layer added in a Siamese network, and carefully
derive the backpropagation through it by defining the network output as the
probability heatmap of the object location. Since the derivation is still
carried out in the Fourier frequency domain, the efficiency of DCF is
preserved.
This enables our tracker to run at more than 60 FPS at test time, while
achieving a significant accuracy gain over KCF with HOG features. Extensive
evaluations on OTB-2013, OTB-2015, and VOT2015 benchmarks demonstrate that the
proposed DCFNet tracker is competitive with several state-of-the-art trackers,
while being more compact and much faster.
Comment: 5 pages, 4 figures
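For intuition, the frequency-domain ridge solution that gives DCF its speed can be sketched in NumPy on raw pixels. This is a generic single-channel correlation filter, not DCFNet's learned-feature layer; the Gaussian label, patch size, and regularizer are illustrative choices.

```python
import numpy as np

def gaussian2d(n, cy, cx, sigma=2.0):
    yy, xx = np.mgrid[0:n, 0:n]
    return np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))

def train_cf(x, y, lam=1e-2):
    """Ridge-regression correlation filter, solved element-wise in the
    Fourier domain (the closed form that makes DCF fast)."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return (np.conj(X) * Y) / (np.conj(X) * X + lam)

def detect(W, z):
    """Apply filter W to a search patch z; return the spatial response map."""
    return np.real(np.fft.ifft2(W * np.fft.fft2(z)))

n = 32
x = gaussian2d(n, 16, 16)            # training patch with a centered blob
y = gaussian2d(n, 0, 0, sigma=1.0)   # desired response, peaked at the origin
W = train_cf(x, y)
z = np.roll(x, (3, 5), axis=(0, 1))  # same blob, circularly shifted by (3, 5)
resp = detect(W, z)
peak = np.unravel_index(np.argmax(resp), resp.shape)
print(peak)  # the response peak recovers the shift
```

Because both training and detection reduce to element-wise operations after an FFT, the per-frame cost is O(n^2 log n), which is what lets such trackers run far above real time.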
Learning a Robust Society of Tracking Parts using Co-occurrence Constraints
Object tracking is an essential problem in computer vision that has been
researched for several decades. One of the main challenges in tracking is to
adapt to object appearance changes over time and to avoid drifting to
background clutter. We address this challenge by proposing a deep neural
network composed of different parts, which functions as a society of tracking
parts. They work in conjunction according to a certain policy and learn from
each other in a robust manner, using co-occurrence constraints that ensure
robust inference and learning. From a structural point of view, our network is
composed of two main pathways. One pathway is more conservative: it carefully
monitors a large set of simple tracker parts, learned as linear filters over
deep feature activation maps, assigns the parts different roles, promotes the
reliable ones, and removes the inconsistent ones. We learn these filters
simultaneously in an efficient way, with a single closed-form formulation, for
which we propose novel theoretical properties. The second pathway is more
progressive. It is learned completely online and thus it is able to better
model object appearance changes. In order to adapt in a robust manner, it is
learned only on highly confident frames, which are decided using co-occurrences
with the first pathway. Thus, our system has the full benefit of two main
approaches in tracking. The larger set of simpler filter parts offers
robustness, while the full deep network learned online provides adaptability to
change. As shown in the experimental section, our approach achieves
state-of-the-art performance on the challenging VOT17 benchmark, outperforming
the published methods both on the overall EAO metric and in the number of
failures, by a significant margin.
Comment: 17+3 pages, 5 figures, European Conference on Computer Vision (ECCV),
Visual Object Tracking workshop
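The idea of fitting many simple linear filter parts in one closed-form step can be illustrated with plain multi-output ridge regression, where every part shares the same Gram matrix and only the regression targets differ. The feature and part counts below are made up, and the paper's actual formulation (with its co-occurrence constraints) is richer than this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))   # rows: samples of deep feature activations
Y = rng.normal(size=(200, 16))   # one target column per tracker part
lam = 1.0

# One closed-form ridge solve fits all 16 filter parts simultaneously:
# every part shares the Gram matrix X^T X, only the targets differ.
W = np.linalg.solve(X.T @ X + lam * np.eye(32), X.T @ Y)
print(W.shape)  # → (32, 16)
```

Solving all parts against a single factorized Gram matrix is what makes learning a large society of parts cheap enough to run inside a tracker.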
Learning Spatial-Aware Regressions for Visual Tracking
In this paper, we analyze the spatial information of deep features, and
propose two complementary regressions for robust visual tracking. First, we
propose a kernelized ridge regression model wherein the kernel value is defined
as the weighted sum of similarity scores of all pairs of patches between two
samples. We show that this model can be formulated as a neural network and thus
can be efficiently solved. Second, we propose a fully convolutional neural
network with spatially regularized kernels, through which the filter kernel
corresponding to each output channel is forced to focus on a specific region of
the target. Distance transform pooling is further exploited to determine the
effectiveness of each output channel of the convolution layer. The outputs from
the kernelized ridge regression model and the fully convolutional neural
network are combined to obtain the ultimate response. Experimental results on
two benchmark datasets validate the effectiveness of the proposed method.
Comment: To appear in CVPR 2018
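A minimal version of the kernelized ridge regression component looks as follows. A standard Gaussian kernel stands in for the paper's weighted patch-pair similarity kernel, so this shows only the regression machinery, not the proposed kernel itself.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise squared distances, then the standard RBF kernel
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def krr_fit(X, y, lam=1e-3):
    # alpha = (K + lam*I)^{-1} y, the standard kernel ridge solution
    K = gaussian_kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_test):
    return gaussian_kernel(X_test, X_train) @ alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = np.sin(X[:, 0])                 # toy regression target
alpha = krr_fit(X, y)
res = krr_predict(X, alpha, X) - y  # training residuals, damped only by lam
print(np.abs(res).max())
```

Because the solution is linear in the kernel evaluations, the whole model can be expressed as a (shallow) neural network and solved with standard gradient tooling, which is the formulation the paper exploits.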
A Twofold Siamese Network for Real-Time Object Tracking
Observing that semantic features learned in an image classification task and
appearance features learned in a similarity matching task complement each
other, we build a twofold Siamese network, named SA-Siam, for real-time object
tracking. SA-Siam is composed of a semantic branch and an appearance branch.
Each branch is a similarity-learning Siamese network. An important design
choice in SA-Siam is to separately train the two branches to keep the
heterogeneity of the two types of features. In addition, we propose a channel
attention mechanism for the semantic branch. Channel-wise weights are computed
according to the channel activations around the target position. While the
architecture inherited from SiamFC \cite{SiamFC} allows our tracker to operate
beyond real-time, the twofold design and the attention mechanism significantly
improve the tracking performance. The proposed SA-Siam outperforms all other
real-time trackers by a large margin on the OTB-2013/50/100 benchmarks.
Comment: Accepted by CVPR'18
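A rough sketch of channel attention driven by activations around the target might look like this. The pooling window, softmax weighting, and shapes are assumptions for illustration, not SA-Siam's exact design.

```python
import numpy as np

def channel_attention(feat, center, radius=2):
    """Reweight channels by their peak activation near the target.
    feat: (C, H, W) features; center: (row, col) of the target."""
    r0, r1 = max(center[0] - radius, 0), center[0] + radius + 1
    c0, c1 = max(center[1] - radius, 0), center[1] + radius + 1
    energy = feat[:, r0:r1, c0:c1].max(axis=(1, 2))   # per-channel peak
    w = np.exp(energy) / np.exp(energy).sum()         # softmax over channels
    return feat * w[:, None, None], w

feat = np.zeros((4, 16, 16))
feat[2, 8, 8] = 5.0                     # channel 2 fires at the target
weighted, w = channel_attention(feat, (8, 8))
print(np.argmax(w))  # → 2: the channel active at the target gets the most weight
```

The point of computing the weights only from activations around the target is that channels responding to the tracked object are boosted while channels responding to background clutter are suppressed.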
End-to-end representation learning for Correlation Filter based tracking
The Correlation Filter is an algorithm that trains a linear template to
discriminate between images and their translations. It is well suited to object
tracking because its formulation in the Fourier domain provides a fast
solution, enabling the detector to be re-trained once per frame. Previous works
that use the Correlation Filter, however, have adopted features that were
either manually designed or trained for a different task. This work is the
first to overcome this limitation by interpreting the Correlation Filter
learner, which has a closed-form solution, as a differentiable layer in a deep
neural network. This enables learning deep features that are tightly coupled to
the Correlation Filter. Experiments illustrate that our method has the
important practical benefit of allowing lightweight architectures to achieve
state-of-the-art performance at high frame rates.
Comment: To appear at CVPR 2017
Long and Short Memory Balancing in Visual Co-Tracking using Q-Learning
Employing one or more additional classifiers to break the self-learning loop
in tracking-by-detection has gained considerable attention. Most such
trackers merely utilize the redundancy to address the accumulating label error
in the tracking loop, and suffer from high computational complexity as well as
tracking challenges that may interrupt all classifiers (e.g. temporal
occlusions). We propose the active co-tracking framework, in which the main
classifier of the tracker labels samples of the video sequence, and only
consults the auxiliary classifier when it is uncertain. Based on the source of the
uncertainty and the differences between the two classifiers (e.g., accuracy,
speed, update frequency), different policies should be adopted to exchange
information between them. Here, we introduce a reinforcement
learning approach to find the appropriate policy by considering the state of
the tracker in a specific sequence. The proposed method yields promising
results in comparison to the best tracking-by-detection approaches.
Comment: Submitted to ICIP 201
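The consult-or-not decision can be illustrated with a tiny tabular Q-learning loop. The two states, two actions, and reward values below are hypothetical, chosen only to show the update rule, not the paper's actual state design.

```python
import numpy as np

rng = np.random.default_rng(0)
Q = np.zeros((2, 2))               # states: 0 = confident, 1 = uncertain
alpha, gamma, eps = 0.1, 0.9, 0.1  # actions: 0 = self-label, 1 = consult

def reward(state, action):
    if state == 1:                        # uncertain: consulting pays off
        return 1.0 if action == 1 else -1.0
    return 1.0 if action == 0 else -0.2   # confident: consulting wastes time

state = 0
for _ in range(2000):
    # epsilon-greedy action selection
    a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[state]))
    r = reward(state, a)
    nxt = int(rng.random() < 0.5)         # uncertainty arrives at random
    # Standard Q-learning temporal-difference update
    Q[state, a] += alpha * (r + gamma * Q[nxt].max() - Q[state, a])
    state = nxt

print(np.argmax(Q, axis=1))  # learned policy: self-label when confident,
                             # consult when uncertain
```

The learned policy mirrors the paper's goal: reserve the expensive auxiliary classifier for the frames where the main classifier is actually uncertain.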
Self-Selective Correlation Ship Tracking Method for Smart Ocean System
In recent years, with the development of the marine industry, the navigation
environment has become more complicated. Artificial intelligence
technologies such as computer vision can recognize, track, and count sailing
ships to ensure maritime security and facilitate management for the Smart
Ocean System. Aiming at the scaling problem and boundary effect
problem of traditional correlation filtering methods, we propose a
self-selective correlation filtering method based on box regression (BRCF). The
proposed method mainly includes: 1) a self-selective model with
negative-sample mining, which effectively reduces the boundary effect while
strengthening the classification ability of the classifier; 2) a bounding box
regression method combined with a key-point matching method for scale
prediction, leading to a fast and efficient calculation. The experimental
results show that the proposed method can effectively deal with the problem of
ship size changes and background interference. The success rates and precisions
were higher than Discriminative Scale Space Tracking (DSST) by over 8
percentage points on the marine traffic dataset of our laboratory. In terms of
processing speed, the proposed method is nearly 22 frames per second (FPS)
faster than DSST.
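A standard bounding-box regression parameterization (R-CNN style) conveys the idea behind the scale-prediction component; the exact targets used by BRCF may differ, and the anchor and ground-truth boxes below are illustrative.

```python
import numpy as np

def bbox_targets(anchor, gt):
    """Encode a ground-truth box relative to an anchor box.
    Boxes are (center_x, center_y, width, height)."""
    ax, ay, aw, ah = anchor
    gx, gy, gw, gh = gt
    return np.array([(gx - ax) / aw, (gy - ay) / ah,
                     np.log(gw / aw), np.log(gh / ah)])

def bbox_apply(anchor, t):
    """Decode regression targets back into an absolute box."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = t
    return np.array([ax + tx * aw, ay + ty * ah,
                     aw * np.exp(tw), ah * np.exp(th)])

anchor = (10.0, 10.0, 20.0, 20.0)
gt = (14.0, 8.0, 30.0, 18.0)
t = bbox_targets(anchor, gt)
print(bbox_apply(anchor, t))  # recovers the ground-truth box exactly
```

Regressing log-scale offsets rather than raw sizes is what lets such a predictor handle large ship-size changes with a single linear output.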
Deep-LK for Efficient Adaptive Object Tracking
In this paper we present a new approach for efficient regression based object
tracking, which we refer to as Deep-LK. Our approach is closely related to the
Generic Object Tracking Using Regression Networks (GOTURN) framework of Held et
al. We make the following contributions. First, we demonstrate that there is a
theoretical relationship between siamese regression networks like GOTURN and
the classical Inverse-Compositional Lucas & Kanade (IC-LK) algorithm. Further,
we demonstrate that, unlike GOTURN, IC-LK adapts its regressor to the
appearance of the currently tracked frame. We argue that GOTURN's poor
performance on unseen objects and/or viewpoints can be attributed to this
missing property.
Second, we propose a novel framework for object tracking, which we refer to as
Deep-LK, inspired by the IC-LK framework. Finally, we show impressive
results demonstrating that Deep-LK substantially outperforms GOTURN.
Additionally, we demonstrate tracking performance comparable to current
state-of-the-art deep trackers whilst being an order of magnitude more
computationally efficient (running at 100 FPS).
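The IC-LK trick of precomputing the template's gradient and Hessian can be shown on a 1-D, translation-only toy example. Deep-LK applies the same machinery to deep features; this sketch works on raw samples and is not the paper's implementation.

```python
import numpy as np

def ic_lk_shift(template, signal, iters=20):
    """Estimate the translation (in samples) aligning signal to template."""
    idx = np.arange(len(template), dtype=float)
    gT = np.gradient(template)   # template gradient, precomputed once
    H = (gT * gT).sum()          # 1x1 Hessian for a pure 1-D translation
    p = 0.0
    for _ in range(iters):
        warped = np.interp(idx + p, idx, signal)
        dp = (gT * (warped - template)).sum() / H
        p -= dp                  # inverse-compositional update
    return p

x = np.linspace(0, 10, 200)
template = np.exp(-(x - 5.0) ** 2)
signal = np.exp(-(x - 5.6) ** 2)             # template shifted right by 0.6
est = ic_lk_shift(template, signal) * (x[1] - x[0])
print(est)  # close to the true shift of 0.6
```

Because the gradient and Hessian depend only on the template, they are computed once per target rather than once per frame, which is precisely the efficiency property Deep-LK inherits.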
Adversarial Feature Sampling Learning for Efficient Visual Tracking
The tracking-by-detection framework usually consists of two stages: drawing
samples around the target object in the first stage and classifying each sample
as the target object or background in the second stage. Current popular
trackers based on the tracking-by-detection framework typically draw samples
from the raw image as inputs to deep convolutional networks in the first
stage, which usually results in a high computational burden and low running
speed. In this
paper, we propose a new visual tracking method using sampling deep
convolutional features to address this problem. Only one cropped image around
the target object is fed into the designed deep convolutional network, and
samples are drawn on the network's feature maps by spatial bilinear
resampling. In addition, a generative adversarial network is integrated into
our network framework to augment positive samples and improve the tracking
performance. Extensive experiments on benchmark datasets demonstrate that the
proposed method achieves performance comparable to state-of-the-art trackers
and effectively accelerates tracking-by-detection trackers based on raw-image
samples.
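Spatial bilinear resampling of a feature map at fractional locations can be sketched as follows (single-channel and edge-clamped; the shapes are illustrative):

```python
import numpy as np

def bilinear_sample(feat, ys, xs):
    """Sample a (H, W) map at fractional (y, x) locations."""
    y0 = np.floor(ys).astype(int); x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, feat.shape[0] - 1)   # clamp at the border
    x1 = np.minimum(x0 + 1, feat.shape[1] - 1)
    wy, wx = ys - y0, xs - x0                    # interpolation weights
    return ((1 - wy) * (1 - wx) * feat[y0, x0] +
            (1 - wy) * wx * feat[y0, x1] +
            wy * (1 - wx) * feat[y1, x0] +
            wy * wx * feat[y1, x1])

feat = np.arange(16.0).reshape(4, 4)
val = bilinear_sample(feat, np.array([1.5]), np.array([2.5]))
print(val)  # → [8.5], the mean of the four surrounding cells
```

Sampling candidate boxes directly on feature maps this way means the expensive convolutional backbone runs once per frame instead of once per sample, which is the source of the claimed speed-up.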
Learning Cascaded Siamese Networks for High Performance Visual Tracking
Visual tracking is one of the most challenging computer vision problems. In
order to achieve high-performance tracking in various challenging scenarios, a
novel cascaded Siamese network is proposed and developed based on
two different deep learning networks: a matching subnetwork and a
classification subnetwork. The matching subnetwork is a fully convolutional
Siamese network. Based on the similarity score between the exemplar image and
the candidate image, it searches for possible object positions and crops
scaled candidate patches. The classification subnetwork is designed to further
evaluate the cropped candidate patches and determine the optimal tracking
results based on the classification score. The matching subnetwork is trained
offline and fixed online, while the classification subnetwork performs
stochastic gradient descent online to learn more target-specific information.
To improve the tracking performance further, an effective classification
subnetwork update method based on both similarity and classification scores is
utilized for updating the classification subnetwork. Extensive experimental
results demonstrate that our proposed approach achieves state-of-the-art
performance on recent benchmarks.
Comment: Accepted for the IEEE 26th International Conference on Image Processing
(ICIP 2019)
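The matching subnetwork's similarity scoring can be sketched as a sliding inner product (SiamFC-style cross-correlation) between exemplar and search feature maps; the shapes and data here are illustrative, not the paper's architecture.

```python
import numpy as np

def xcorr_score(exemplar, search):
    """Slide the exemplar embedding over the search embedding and return
    the inner-product similarity score map."""
    eh, ew = exemplar.shape[1:]
    sh, sw = search.shape[1:]
    out = np.zeros((sh - eh + 1, sw - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (exemplar * search[:, i:i + eh, j:j + ew]).sum()
    return out

rng = np.random.default_rng(1)
z = rng.normal(size=(8, 4, 4))    # exemplar features
s = np.zeros((8, 10, 10))
s[:, 3:7, 5:9] = z                # exemplar embedded at offset (3, 5)
score = xcorr_score(z, s)
peak = np.unravel_index(np.argmax(score), score.shape)
print(peak)  # → (3, 5), the location of the embedded exemplar
```

The score map's peaks give the candidate positions that the classification subnetwork then re-evaluates, which is the cascade described above.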