Good Features to Correlate for Visual Tracking
In recent years, correlation filters have shown dominant and spectacular results for visual object tracking. The types of features employed in this family of trackers significantly affect the performance of visual tracking. The ultimate goal is to utilize robust features, invariant to any kind of appearance change of the object, while predicting the object location as accurately as in the case of no appearance change. As deep learning based methods have emerged, the study of learning features for specific tasks has accelerated. For instance, discriminative visual tracking methods based on deep architectures have been studied with promising performance. Nevertheless, correlation filter based (CFB) trackers confine themselves to pre-trained networks that were trained for the object classification problem. To this end, this manuscript formulates the problem of learning deep fully convolutional features for CFB visual tracking. In order to learn the proposed model, a novel and efficient backpropagation algorithm is presented based on the loss function of the network. The proposed learning framework enables the network model to be flexible for a custom design. Moreover, it alleviates the dependency on networks trained for classification. Extensive performance analysis shows the efficacy of the proposed custom design in the CFB tracking framework. By fine-tuning the convolutional parts of a state-of-the-art network and integrating this model into a CFB tracker, the top-performing tracker of VOT2016, an 18% increase is achieved in terms of expected average overlap and tracking failures are decreased by 25%, while maintaining superiority over state-of-the-art methods on the OTB-2013 and OTB-2015 tracking datasets.

Comment: Accepted version, IEEE Transactions on Image Processing
Deep Convolutional Correlation Particle Filter for Visual Tracking
In this dissertation, we explore the advantages and limitations of the application of sequential Monte Carlo methods to visual tracking, which is a challenging computer vision problem. We propose six visual tracking models, each of which integrates a particle filter, a deep convolutional neural network, and a correlation filter. In our first model, we generate an image patch corresponding to each particle and use a convolutional neural network (CNN) to extract features from the corresponding image region. A correlation filter then computes the correlation response maps corresponding to these features, which are used to determine the particle weights and estimate the state of the target. We then introduce a particle filter that extends the target state by incorporating its size information. This model also utilizes a new adaptive correlation filtering approach that generates multiple target models to account for potential model update errors. We build upon that strategy to devise an adaptive particle filter that can decrease the number of particles in simple frames that contain no challenging scenarios and in which the target model closely reflects the current appearance of the target. This strategy allows us to reduce the computational cost of the particle filter without negatively impacting its performance. This tracker also improves the likelihood model by generating multiple target models using varying model update rates based on the high-likelihood particles. We also propose a novel likelihood particle filter for CNN-correlation visual trackers. Our method uses correlation response maps to estimate likelihood distributions and employs these likelihoods as proposal densities to sample particles. Additionally, our particle filter searches for multiple modes in the likelihood distribution using a Gaussian mixture model.
We further introduce an iterative particle filter that performs iterations to decrease the distance between particles and the peaks of their correlation maps, which yields a small set of more accurate particles at the end of the iterations. Applying the K-means clustering method to the remaining particles determines the number of clusters, which is used in the evaluation step to find the target state. Our approach ensures consistent support for the posterior distribution. Thus, we do not need to perform resampling at every video frame, improving the utilization of prior distribution information. Finally, we introduce a novel framework that calculates the confidence score of the tracking algorithm at each video frame based on the correlation response maps of the particles. Our framework applies different model update rules according to the calculated confidence score, reducing tracking failures caused by model drift. The benefits of each of the proposed techniques are demonstrated through experiments using publicly available benchmark datasets.
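The core propagate/weight/resample loop described above can be sketched as follows. The CNN feature extraction and correlation filter are replaced here by a stand-in Gaussian likelihood score, purely for illustration; particle counts, the motion model, and all names are assumptions, not the dissertation's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def response_score(state, true_pos):
    # Stand-in for the CNN feature + correlation-filter response peak:
    # here simply a Gaussian score around the true target position.
    return np.exp(-0.5 * np.sum((state - true_pos) ** 2) / 4.0)

def particle_filter_step(particles, true_pos, motion_std=2.0):
    # Propagate particles with a random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Weight each particle by its (stand-in) correlation response peak.
    weights = np.array([response_score(p, true_pos) for p in particles])
    weights /= weights.sum()
    # The weighted mean gives the state estimate; then resample.
    estimate = weights @ particles
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], estimate

true_pos = np.array([40.0, 60.0])                 # 2-D target position
particles = rng.uniform(0.0, 100.0, (500, 2))     # uniform initialization
for _ in range(20):
    particles, estimate = particle_filter_step(particles, true_pos)
# After a few steps the estimate settles near the target position.
```

The adaptive variants in the dissertation modify this loop, e.g. by shrinking the particle set in easy frames or skipping resampling when the posterior support is still adequate.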
Feature Distilled Tracking
Feature extraction and representation is one of the most important components for fast, accurate, and robust visual tracking. Very deep convolutional neural networks (CNNs) provide effective tools for feature extraction with good generalization ability. However, extracting features using very deep CNN models requires high-performance hardware due to its large computational complexity, which prohibits its extension to real-time applications. To alleviate this problem, we aim at obtaining small and fast-to-execute shallow models based on model compression for visual tracking. Specifically, we propose a small feature distilled network (FDN) for tracking that imitates the intermediate representations of a much deeper network. The FDN extracts rich visual features at higher speed than the original deeper network. To further speed up the tracker, we introduce a shift-and-stitch method to reduce the arithmetic operations while keeping the spatial resolution of the distilled feature maps unchanged. Finally, a scale-adaptive discriminative correlation filter is learned on the distilled features for visual tracking to handle scale variation of the target. Comprehensive experimental results on object tracking benchmark datasets show that the proposed approach achieves a 5x speed-up with performance competitive with state-of-the-art deep trackers.
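The distillation objective — training a small student to imitate a deeper teacher's intermediate representations — can be illustrated with a toy L2 regression in NumPy. The shapes, the linear per-pixel student, and the plain gradient-descent loop below are illustrative stand-ins, not the paper's FDN architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical teacher feature map (C x H x W) that the student imitates.
teacher_feat = rng.standard_normal((128, 16, 16))

# Toy student: a learned 1x1 (per-pixel linear) projection from cheaper,
# lower-dimensional input features up to the teacher's channel count.
student_in = rng.standard_normal((32, 16, 16))
W = np.zeros((128, 32))

def distill_loss_and_grad(W, x, t):
    # Student prediction: per-pixel linear map from 32 to 128 channels.
    pred = np.einsum('oc,chw->ohw', W, x)
    diff = pred - t
    loss = 0.5 * np.mean(diff ** 2)             # L2 imitation loss
    grad = np.einsum('ohw,chw->oc', diff, x) / diff.size
    return loss, grad

# A few gradient-descent steps shrink the imitation loss.
losses = []
for _ in range(50):
    loss, grad = distill_loss_and_grad(W, student_in, teacher_feat)
    losses.append(loss)
    W -= 10.0 * grad
```

In the actual method, both networks are CNNs and the student is trained on real images, but the supervision signal has this same form: a regression of student features onto teacher features.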
Deep Motion Features for Visual Tracking
Robust visual tracking is a challenging computer vision problem, with many
real-world applications. Most existing approaches employ hand-crafted
appearance features, such as HOG or Color Names. Recently, deep RGB features
extracted from convolutional neural networks have been successfully applied for
tracking. Despite their success, these features only capture appearance
information. On the other hand, motion cues provide discriminative and
complementary information that can improve tracking performance. In contrast to
visual tracking, deep motion features have already been successfully applied to
action recognition and video classification tasks. Typically, the motion features are
learned by training a CNN on optical flow images extracted from large amounts
of labeled videos.
This paper presents an investigation of the impact of deep motion features in
a tracking-by-detection framework. We further show that hand-crafted, deep RGB,
and deep motion features contain complementary information. To the best of our
knowledge, we are the first to propose fusing appearance information with deep
motion features for visual tracking. Comprehensive experiments clearly suggest
that our fusion approach with deep motion features outperforms standard methods
relying on appearance information alone.

Comment: ICPR 2016. Best paper award in the "Computer Vision and Robot Vision" track
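The fusion idea can be sketched as a weighted sum of per-cue correlation response maps (hand-crafted, deep RGB, and deep motion), with the fused peak giving the target location. The synthetic responses below, the equal default weights, and the function name are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def fuse_responses(responses, weights=None):
    # Late fusion: a weighted sum of per-cue correlation response maps.
    responses = np.stack(responses)
    if weights is None:
        weights = np.full(len(responses), 1.0 / len(responses))
    fused = np.tensordot(weights, responses, axes=1)
    return fused, np.unravel_index(np.argmax(fused), fused.shape)

rng = np.random.default_rng(3)
h, w = 32, 32
ys, xs = np.mgrid[0:h, 0:w]

def peak_at(cy, cx):
    # A clean Gaussian response peaked at (cy, cx).
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / 8.0)

# Each cue agrees on the target at (10, 20) but carries its own noise.
cues = [peak_at(10, 20) + 0.1 * rng.standard_normal((h, w))
        for _ in range(3)]
fused, loc = fuse_responses(cues)
# Averaging suppresses per-cue noise, sharpening the fused peak.
```

Because the cues are complementary, fusing their responses is more robust than any single response map: noise that perturbs one cue's peak rarely perturbs all three the same way.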