Feature Selection Convolutional Neural Networks for Visual Tracking
Most existing tracking methods based on CNNs (convolutional neural networks) are too slow for real-time application, despite their excellent tracking precision compared with traditional methods. Moreover, neural networks are memory-intensive and consume considerable hardware resources. In this paper, a feature-selection visual tracking algorithm is developed that combines the CNN-based MDNet (Multi-Domain Network) with RoIAlign. We find substantial redundancy in the feature maps produced by the convolutional layers, so informative feature maps are selected by mutual information and the rest are discarded, which reduces the complexity and computation of the network without affecting precision. The main weakness of MDNet is its time efficiency. Since MDNet's computational cost is dominated by the large number of convolution operations and by fine-tuning the network during tracking, a RoIAlign layer that performs convolution over the whole image rather than over each RoI is added to accelerate the convolutions, and a new strategy of fine-tuning the fully-connected layers is used to accelerate the updates. With RoIAlign, computation is faster and precision is higher than with RoIPool, because RoIAlign handles floating-point coordinates by bilinear interpolation. These strategies accelerate processing and reduce complexity with very little impact on precision: the tracker runs at around 10 fps, whereas MDNet runs at about 1 fps. The proposed algorithm has been evaluated on the OTB100 benchmark, on which high precision and speed are obtained.
Comment: arXiv admin note: substantial text overlap with arXiv:1807.0313
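The precision advantage of RoIAlign over RoIPool comes from evaluating features at fractional coordinates. A minimal single-channel NumPy sketch of the bilinear interpolation applied at each sampling point (an illustration, not the paper's implementation):

```python
import numpy as np

def bilinear_sample(feature_map, y, x):
    """Sample a 2-D feature map at a float coordinate (y, x) by
    bilinear interpolation, as RoIAlign does for each sampling
    point inside a region of interest."""
    h, w = feature_map.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Weighted average of the four surrounding integer-grid values.
    return (feature_map[y0, x0] * (1 - dy) * (1 - dx)
            + feature_map[y0, x1] * (1 - dy) * dx
            + feature_map[y1, x0] * dy * (1 - dx)
            + feature_map[y1, x1] * dy * dx)
```

RoIPool, by contrast, rounds the coordinates to integer bins before pooling, which discards the sub-pixel alignment that bilinear sampling preserves.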
DeepTrack: Learning Discriminative Feature Representations Online for Robust Visual Tracking
Deep neural networks, despite their great success at feature learning in various computer vision tasks, are usually considered impractical for online visual tracking because they require very long training times and large numbers of training samples. In this work, we present an efficient and very robust tracking algorithm that uses a single Convolutional Neural Network (CNN) to learn effective feature representations of the target object in a purely online manner. Our contributions are threefold. First, we introduce a novel truncated structural loss function that retains as many training samples as possible and reduces the risk of tracking-error accumulation. Second, we enhance ordinary Stochastic Gradient Descent for CNN training with a robust sample-selection mechanism, which randomly draws positive and negative samples from different temporal distributions constructed to account for temporal relations and label noise. Finally, a lazy yet effective updating scheme is designed for CNN training. Equipped with this updating algorithm, the CNN model is robust to long-standing difficulties in visual tracking, such as occlusion and incorrect detections, without losing its ability to adapt to significant appearance changes. In our experiments, the CNN tracker outperforms all compared state-of-the-art methods on two recently proposed benchmarks that together comprise over 60 video sequences. The remarkable performance improvement over existing trackers illustrates the superiority of the learned feature representations.
Comment: 12 pages
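The idea of drawing positives and negatives from different temporal distributions can be illustrated with a toy sketch; the geometric decay and the `recency` parameter are illustrative assumptions, not the paper's actual distributions:

```python
import random

def sample_frame_indices(t, n, recency):
    """Draw n past-frame indices from {0..t} with geometrically
    decaying weights, so recent frames are picked more often.
    'recency' in (0, 1): larger -> stronger bias toward frame t."""
    weights = [(1 - recency) ** (t - i) for i in range(t + 1)]
    return random.choices(range(t + 1), weights=weights, k=n)

# Hypothetical usage: positives favour very recent frames (less label
# noise), while negatives come from a flatter distribution over history.
pos_frames = sample_frame_indices(t=100, n=32, recency=0.5)
neg_frames = sample_frame_indices(t=100, n=96, recency=0.05)
```

Skewing positives toward recent frames limits the impact of accumulated label noise, while flatter negative sampling keeps the classifier exposed to the full appearance history.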
Object-Adaptive LSTM Network for Real-time Visual Tracking with Adversarial Data Augmentation
In recent years, deep learning based visual tracking methods have obtained
great success owing to the powerful feature representation ability of
Convolutional Neural Networks (CNNs). Among these methods, classification-based
tracking methods exhibit excellent performance while their speeds are heavily
limited by the expensive computation for massive proposal feature extraction.
In contrast, matching-based tracking methods (such as Siamese networks) possess
remarkable speed superiority. However, the absence of online updating leaves these methods unable to adapt to significant object appearance variations. In this
paper, we propose a novel real-time visual tracking method, which adopts an
object-adaptive LSTM network to effectively capture the video sequential
dependencies and adaptively learn the object appearance variations. For high
computational efficiency, we also present a fast proposal selection strategy,
which utilizes the matching-based tracking method to pre-estimate dense
proposals and selects high-quality ones to feed to the LSTM network for
classification. This strategy efficiently filters out some irrelevant proposals
and avoids the redundant computation for feature extraction, which enables our
method to operate faster than conventional classification-based tracking
methods. In addition, to handle the problems of sample inadequacy and class
imbalance during online tracking, we adopt a data augmentation technique based
on the Generative Adversarial Network (GAN) to facilitate the training of the
LSTM network. Extensive experiments on four visual tracking benchmarks
demonstrate the state-of-the-art performance of our method in terms of both
tracking accuracy and speed, and highlight the great potential of recurrent structures for visual tracking.
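The fast proposal selection strategy can be sketched as a cheap pre-scoring pass that filters dense proposals before the expensive classifier runs; cosine similarity to a template feature is used here as a stand-in for the matching-based tracker's score, which is an assumption of this sketch:

```python
import numpy as np

def select_proposals(proposal_feats, template_feat, k):
    """Pre-score dense proposals with a cheap matching function
    (cosine similarity to the target template) and keep only the
    top-k for the expensive downstream LSTM classifier."""
    t = template_feat / np.linalg.norm(template_feat)
    p = proposal_feats / np.linalg.norm(proposal_feats, axis=1, keepdims=True)
    scores = p @ t                       # one matching score per proposal
    top_k = np.argsort(scores)[::-1][:k] # indices of the k best proposals
    return top_k, scores[top_k]
```

Only the surviving k proposals need full feature extraction and classification, which is where the claimed speed advantage over conventional classification-based trackers comes from.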
Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking
In this paper, we develop a new approach based on spatially supervised recurrent convolutional neural networks for visual object tracking. Our recurrent
convolutional network exploits the history of locations as well as the
distinctive visual features learned by the deep neural networks. Inspired by
recent bounding box regression methods for object detection, we study the
regression capability of Long Short-Term Memory (LSTM) in the temporal domain,
and propose to concatenate high-level visual features produced by convolutional
networks with region information. In contrast to existing deep learning based
trackers that use binary classification for region candidates, we use
regression for direct prediction of the tracking locations both at the
convolutional layer and at the recurrent unit. Our extensive experimental results and performance comparisons with state-of-the-art tracking methods on challenging benchmark video tracking datasets show that our tracker is more accurate and robust while maintaining low computational cost. For most test video sequences, our method achieves the best tracking performance, often outperforming the second best by a large margin.
Comment: 10 pages, 9 figures, conference
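The concatenation of high-level visual features with region information can be sketched as follows; the linear head `W`, `b` is a hypothetical stand-in for the learned LSTM regressor, used only to show the shape of the computation:

```python
import numpy as np

def regress_location(visual_feat, prev_box, W, b):
    """One regression step in the spirit of the paper: concatenate
    the CNN feature vector with the previous region (x, y, w, h)
    and map the joint vector directly to the next box, rather than
    classifying region candidates as target / non-target."""
    x = np.concatenate([visual_feat, np.asarray(prev_box, dtype=float)])
    return W @ x + b  # predicted (x, y, w, h)
```

Feeding the location history back in alongside appearance is what lets the recurrent unit exploit temporal smoothness instead of treating every frame independently.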
Hierarchical Spatial-aware Siamese Network for Thermal Infrared Object Tracking
Most thermal infrared (TIR) tracking methods are discriminative, treating the
tracking problem as a classification task. However, the objective of the
classifier (label prediction) is not coupled to the objective of the tracker
(location estimation). The classification task focuses on between-class differences among arbitrary objects, while the tracking task mainly deals with within-class differences of the same object. In this paper, we cast the TIR
tracking problem as a similarity verification task, which is coupled well to
the objective of the tracking task. We propose a TIR tracker via a Hierarchical
Spatial-aware Siamese Convolutional Neural Network (CNN), named HSSNet. To
obtain both spatial and semantic features of the TIR object, we design a
Siamese CNN that coalesces the multiple hierarchical convolutional layers.
Then, we propose a spatial-aware network to enhance the discriminative ability
of the coalesced hierarchical feature. Subsequently, we train this network end
to end on a large visible video detection dataset to learn the similarity
between paired objects before we transfer the network into the TIR domain.
Next, this pre-trained Siamese network is used to evaluate the similarity
between the target template and target candidates. Finally, we locate the
candidate that is most similar to the tracked target. Extensive experimental
results on the benchmarks VOT-TIR 2015 and VOT-TIR 2016 show that our proposed
method achieves favourable performance compared to the state-of-the-art methods.
Comment: 20 pages, 7 figures
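The similarity evaluation between the target template and candidates can be sketched as a sliding inner product over feature maps, i.e. the cross-correlation step used by fully-convolutional Siamese trackers; the single-channel 2-D maps and the naive double loop are simplifying assumptions of this sketch:

```python
import numpy as np

def correlation_response(search_feat, template_feat):
    """Slide the template feature map over the search-region feature
    map and record the inner product at each offset. The peak of the
    resulting response map locates the candidate most similar to the
    tracked target."""
    H, W = search_feat.shape
    h, w = template_feat.shape
    resp = np.empty((H - h + 1, W - w + 1))
    for i in range(resp.shape[0]):
        for j in range(resp.shape[1]):
            resp[i, j] = np.sum(search_feat[i:i+h, j:j+w] * template_feat)
    return resp
```

Because both branches share weights, the template embedding can be computed once and reused across frames, which is a key source of Siamese trackers' speed.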
Track Everything: Limiting Prior Knowledge in Online Multi-Object Recognition
This paper addresses the problem of online tracking and classification of
multiple objects in an image sequence. Our proposed solution is to first track
all objects in the scene without relying on object-specific prior knowledge,
which in other systems can take the form of hand-crafted features or user-based
track initialization. We then classify the tracked objects with a fast-learning
image classifier that is based on a shallow convolutional neural network
architecture and demonstrate that object recognition improves when this is
combined with object state information from the tracking algorithm. We argue
that by transferring the use of prior knowledge from the detection and tracking
stages to the classification stage we can design a robust, general purpose
object recognition system with the ability to detect and track a variety of
object types. We describe our biologically inspired implementation, which
adaptively learns the shape and motion of tracked objects, and apply it to the
Neovision2 Tower benchmark data set, which contains multiple object types. An
experimental evaluation demonstrates that our approach is competitive with
state-of-the-art video object recognition systems that do make use of
object-specific prior knowledge in detection and tracking, while providing
additional practical advantages by virtue of its generality.
Comment: 15 pages
Deep Learning of Appearance Models for Online Object Tracking
This paper introduces a novel deep-learning-based approach to vision-based single-target tracking. We address this problem by proposing a network
architecture which takes the input video frames and directly computes the
tracking score for any candidate target location by estimating the probability
distributions of the positive and negative examples. This is achieved by
combining a deep convolutional neural network with a Bayesian loss layer in a
unified framework. In order to deal with the limited number of positive
training examples, the network is pre-trained offline for a generic image
feature representation and then is fine-tuned in multiple steps. An online
fine-tuning step is carried out at every frame to learn the appearance of the
target. We adopt a two-stage iterative algorithm to adaptively update the
network parameters and maintain a probability density for target/non-target
regions. The tracker has been tested on a standard tracking benchmark, and the results indicate that the proposed solution achieves state-of-the-art performance.
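How a tracking score could follow from estimated positive and negative example distributions can be sketched with a kernel density estimate; the Gaussian kernel and fixed bandwidth are illustrative assumptions, not the paper's Bayesian loss layer:

```python
import numpy as np

def tracking_score(candidate_feat, pos_feats, neg_feats, bandwidth=1.0):
    """Score a candidate location as p(target | feature) under simple
    Gaussian kernel density estimates of the positive (target) and
    negative (background) feature distributions."""
    def density(f, samples):
        # Mean Gaussian kernel value between f and each stored sample.
        d2 = np.sum((samples - f) ** 2, axis=1)
        return np.mean(np.exp(-d2 / (2 * bandwidth ** 2)))
    p_pos = density(candidate_feat, pos_feats)
    p_neg = density(candidate_feat, neg_feats)
    return p_pos / (p_pos + p_neg + 1e-12)
```

Maintaining such densities online, and updating them frame by frame, is one way to keep the score adapted to the target's current appearance.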
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
This paper presents the futuristic challenges discussed in the cvpaper.challenge. In 2015 and 2016, we thoroughly studied more than 1,600 papers from several conferences and journals, including CVPR, ICCV, ECCV, NIPS, PAMI, and IJCV.
An In-Depth Analysis of Visual Tracking with Siamese Neural Networks
This survey presents a deep analysis of the learning and inference
capabilities in nine popular trackers. It is neither intended to study the
whole literature nor is it an attempt to review all kinds of neural networks
proposed for visual tracking. We focus instead on Siamese neural networks which
are a promising starting point for studying the challenging problem of
tracking. These networks efficiently integrate feature learning and temporal matching, and have so far shown state-of-the-art performance. In particular, we highlight the branches of Siamese networks, the layers connecting these branches, specific aspects of training, and the embedding of these networks into the tracker. Quantitative results from existing papers are compared, with the conclusion that the current evaluation methodology has problems with the reproducibility and comparability of results. The paper
proposes a novel Lisp-like formalism for a better comparison of trackers. This
assumes a certain functional design and functional decomposition of trackers.
The paper aims to lay a foundation for tracker design through a formulation of the problem based on machine-learning theory and through the interpretation of a tracker as a decision function. The work concludes with promising lines of research and suggestions for future work.
Comment: submitted to IEEE TPAMI
Rotation Adaptive Visual Object Tracking with Motion Consistency
Visual Object tracking research has undergone significant improvement in the
past few years. The emergence of the tracking-by-detection approach has been quite successful in many ways. Recently, deep convolutional
neural networks have been extensively used in most successful trackers. Yet,
the standard approach has been based on correlation or feature selection with
minimal consideration given to motion consistency. Thus, there is still a need
to capture various physical constraints through motion consistency which will
improve accuracy, robustness and more importantly rotation adaptiveness.
Therefore, one of the major aspects of this paper is to investigate the outcome
of rotation adaptiveness in visual object tracking. Among other key contributions, the paper also introduces various consistency constraints that prove markedly more effective than the current state-of-the-art on numerous challenging sequences.
Comment: Accepted conference paper WACV 201