Online Metric-Weighted Linear Representations for Robust Visual Tracking
In this paper, we propose a visual tracker based on a metric-weighted linear
representation of appearance. In order to capture the interdependence of
different feature dimensions, we develop two online distance metric learning
methods using proximity comparison information and structured output learning.
The learned metric is then incorporated into a linear representation of
appearance.
We show that online distance metric learning significantly improves the
robustness of the tracker, especially on those sequences exhibiting drastic
appearance changes. In order to bound growth in the number of training samples,
we design a time-weighted reservoir sampling method.
Moreover, we enable our tracker to automatically perform object
identification during the process of object tracking, by introducing a
collection of static template samples belonging to several object classes of
interest. Object identification results for an entire video sequence are
achieved by systematically combining the tracking information and visual
recognition at each frame. Experimental results on challenging video sequences
demonstrate the effectiveness of the method for both inter-frame tracking and
object identification.
Comment: 51 pages. Appearing in IEEE Transactions on Pattern Analysis and
Machine Intelligence
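
The abstract above does not spell out the time-weighted reservoir sampling scheme, so the following is only a minimal sketch of one way to keep a bounded sample buffer that favours recent frames. It assumes the standard weighted-reservoir approach (each item gets key u^(1/w) and the largest keys are kept) with an exponential time weight; the class name `TimeWeightedReservoir` and the `rate` parameter are hypothetical and not taken from the paper.

```python
import heapq
import math
import random


class TimeWeightedReservoir:
    """Fixed-size sample buffer that favours recent items (illustrative sketch)."""

    def __init__(self, capacity, rate=0.05, seed=None):
        self.capacity = capacity
        self.rate = rate              # hypothetical growth rate of the time weight
        self._rng = random.Random(seed)
        self._heap = []               # min-heap of (key, counter, timestamp, sample)
        self._count = 0               # tie-breaker so samples are never compared

    def add(self, sample, t):
        # Weight w = exp(rate * t). The key u**(1/w) is compared in log space,
        # log(u) / w = log(u) * exp(-rate * t), which avoids overflow for large t.
        u = 1.0 - self._rng.random()  # u lies in (0, 1], so log(u) is defined
        key = math.log(u) * math.exp(-self.rate * t)
        entry = (key, self._count, t, sample)
        self._count += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif key > self._heap[0][0]:
            # The new item beats the weakest stored item, so it takes its place.
            heapq.heapreplace(self._heap, entry)

    def samples(self):
        """Return the stored (timestamp, sample) pairs in no particular order."""
        return [(t, s) for _, _, t, s in self._heap]
```

A tracker along these lines would presumably hold one such buffer per sample type and call `add` once per frame; the buffer size stays at `capacity` no matter how long the sequence runs.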
Online Feature Selection for Visual Tracking
Object tracking is one of the most important tasks in many applications of computer vision. Many tracking methods use a fixed set of features, ignoring that the appearance of a target object may change drastically due to intrinsic and extrinsic factors. The ability to dynamically identify discriminative features would help in handling appearance variability and so improve tracking performance. The contribution of this work is threefold. Firstly, this paper presents a collection of several modern feature selection approaches chosen from among filter, embedded, and wrapper methods. Secondly, we provide extensive tests on the classification task, intended to explore the strengths and weaknesses of the proposed methods with the goal of identifying the right candidates for online tracking. Finally, we show how feature selection mechanisms can be successfully employed for ranking the features used by a tracking system while maintaining high frame rates. In particular, feature selection mounted on the Adaptive Color Tracking (ACT) system operates at over 110 FPS. This work demonstrates the importance of feature selection in online and real-time applications: our solutions improve on the baseline ACT by 3% up to 7% while providing superior results compared to 29 state-of-the-art tracking methods.
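
To make the idea of ranking features with a filter method concrete, here is a small sketch. The abstract does not name the specific selectors it evaluates, so the Fisher score used below is an assumption chosen only because it is a common filter criterion; the function names `fisher_scores` and `top_k_features` are hypothetical.

```python
import numpy as np


def fisher_scores(X, y):
    """Per-feature Fisher score for binary labels y in {0, 1}.

    X has shape (n_samples, n_features); higher scores mean the feature
    separates the two classes (e.g. target vs. background) more clearly.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    between = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
    within = pos.var(axis=0) + neg.var(axis=0) + 1e-12  # avoid division by zero
    return between / within


def top_k_features(X, y, k):
    """Indices of the k highest-scoring features, best first."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]
```

Because the score is computed per feature from simple statistics, re-ranking at every frame is cheap, which is the property a high-frame-rate tracker would need from any selector it adopts.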
Siamese Instance Search for Tracking
In this paper we present a tracker, which is radically different from
state-of-the-art trackers: we apply no model updating, no occlusion detection,
no combination of trackers, no geometric matching, and still deliver
state-of-the-art tracking performance, as demonstrated on the popular online
tracking benchmark (OTB) and six very challenging YouTube videos. The presented
tracker simply matches the initial patch of the target in the first frame with
candidates in a new frame and returns the most similar patch by a learned
matching function. The strength of the matching function comes from being
extensively trained generically, i.e., without any data of the target, using a
Siamese deep neural network, which we design for tracking. Once learned, the
matching function is used as is, without any adapting, to track previously
unseen targets. It turns out that the learned matching function is so powerful
that a simple tracker built upon it, coined Siamese INstance search Tracker,
SINT, which only uses the original observation of the target from the first
frame, suffices to reach state-of-the-art performance. Further, we show the
proposed tracker even allows for target re-identification after the target was
absent for a complete video shot.
Comment: This paper is accepted to the IEEE Conference on Computer Vision and
Pattern Recognition, 201
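
The matching step described above (compare the first-frame patch with every candidate and return the most similar one) can be sketched as follows. The `embed` function here is only a stand-in that flattens and normalises pixels; in the paper this role is played by a learned Siamese network, which is not reproduced. All function names are hypothetical.

```python
import numpy as np


def embed(patch):
    """Stand-in for the learned Siamese branch: flatten and L2-normalise."""
    v = np.asarray(patch, dtype=np.float64).ravel()
    return v / (np.linalg.norm(v) + 1e-12)


def track_frame(query_embedding, candidate_patches):
    """Return the index and score of the candidate most similar to the query.

    Similarity is the dot product of unit vectors, i.e. cosine similarity.
    """
    scores = np.array([query_embedding @ embed(c) for c in candidate_patches])
    best = int(np.argmax(scores))
    return best, float(scores[best])
```

In this sketch the query embedding would be computed once from the first frame and reused unchanged for every later frame, mirroring the "no model updating" design stated in the abstract.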
Non-sparse Linear Representations for Visual Tracking with Online Reservoir Metric Learning
Most sparse linear representation-based trackers need to solve a
computationally expensive L1-regularized optimization problem. To address this
problem, we propose a visual tracker based on non-sparse linear
representations, which admit an efficient closed-form solution without
sacrificing accuracy. Moreover, in order to capture the correlation information
between different feature dimensions, we learn a Mahalanobis distance metric in
an online fashion and incorporate the learned metric into the optimization
problem for obtaining the linear representation. We show that online metric
learning using proximity comparison significantly improves the robustness of
the tracking, especially on those sequences exhibiting drastic appearance
changes. Furthermore, in order to prevent the unbounded growth in the number of
training samples for the metric learning, we design a time-weighted reservoir
sampling method to maintain and update limited-sized foreground and background
sample buffers for balancing sample diversity and adaptability. Experimental
results on challenging videos demonstrate the effectiveness and robustness of
the proposed tracker.
Comment: Appearing in IEEE Conf. Computer Vision and Pattern Recognition, 201
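
As an illustration of why a non-sparse representation admits a closed-form solution, the sketch below assumes an L2-regularised least-squares objective measured under a Mahalanobis metric M, namely min_x (y - Dx)^T M (y - Dx) + lam * ||x||^2, whose minimiser is x = (D^T M D + lam * I)^{-1} D^T M y. The exact objective in the paper may differ; the names `D` (template matrix), `M`, `lam`, and both function names are assumptions made for this example.

```python
import numpy as np


def metric_weighted_coefficients(D, y, M, lam=0.1):
    """Closed-form minimiser of (y - D x)^T M (y - D x) + lam * ||x||^2.

    D: (n_features, n_templates) template matrix
    y: (n_features,) candidate appearance vector
    M: (n_features, n_features) symmetric positive semi-definite metric
    """
    n_templates = D.shape[1]
    A = D.T @ M @ D + lam * np.eye(n_templates)
    b = D.T @ M @ y
    return np.linalg.solve(A, b)   # one linear solve, no iterative L1 optimisation


def metric_residual(D, y, M, x):
    """Squared reconstruction error of y under the metric M."""
    r = y - D @ x
    return float(r @ M @ r)
```

With M set to the identity this reduces to ordinary ridge regression; a candidate with a small `metric_residual` would be one that the templates reconstruct well under the learned metric.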
End-to-end representation learning for Correlation Filter based tracking
The Correlation Filter is an algorithm that trains a linear template to
discriminate between images and their translations. It is well suited to object
tracking because its formulation in the Fourier domain provides a fast
solution, enabling the detector to be re-trained once per frame. Previous works
that use the Correlation Filter, however, have adopted features that were
either manually designed or trained for a different task. This work is the
first to overcome this limitation by interpreting the Correlation Filter
learner, which has a closed-form solution, as a differentiable layer in a deep
neural network. This enables learning deep features that are tightly coupled to
the Correlation Filter. Experiments illustrate that our method has the
important practical benefit of allowing lightweight architectures to achieve
state-of-the-art performance at high framerates.
Comment: To appear at CVPR 201
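
The fast Fourier-domain solution mentioned above can be illustrated with a single-channel linear correlation filter. This is a plain NumPy sketch of the closed-form ridge-regression solution over cyclic shifts, not the differentiable network layer the paper proposes; the function names, the Gaussian target response, and the regulariser `lam` are assumptions for the example.

```python
import numpy as np


def gaussian_response(shape, sigma=2.0):
    """Desired output: a Gaussian peak at index (0, 0), wrapped cyclically."""
    h, w = shape
    ys = np.minimum(np.arange(h), h - np.arange(h))[:, None]
    xs = np.minimum(np.arange(w), w - np.arange(w))[None, :]
    return np.exp(-(ys ** 2 + xs ** 2) / (2.0 * sigma ** 2))


def train_filter(x, y, lam=1e-2):
    """Closed-form filter in the Fourier domain (element-wise operations only)."""
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    return Y * np.conj(X) / (X * np.conj(X) + lam)


def detect(H, z):
    """Apply the filter to image z and return the (row, col) of the peak response."""
    response = np.real(np.fft.ifft2(np.fft.fft2(z) * H))
    return np.unravel_index(np.argmax(response), response.shape)
```

Training costs a few FFTs and element-wise divisions, which is why re-training once per frame is affordable. Applied to a cyclically shifted copy of the training image, the peak location returned by `detect` corresponds to the shift.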
An Accelerated Correlation Filter Tracker
Recent visual object tracking methods have witnessed a continuous improvement
in the state-of-the-art with the development of efficient discriminative
correlation filters (DCF) and robust deep neural network features. Despite the
outstanding performance achieved by the above combination, existing advanced
trackers suffer from the burden of high computational complexity of the deep
feature extraction and online model learning. We propose an accelerated ADMM
optimisation method obtained by adding a momentum to the optimisation sequence
iterates, and by relaxing the impact of the error between DCF parameters and
their norm. The proposed optimisation method is applied to an innovative
formulation of the DCF design, which seeks the most discriminative spatially
regularised feature channels. A further speed up is achieved by an adaptive
initialisation of the filter optimisation process. The significantly increased
convergence of the DCF filter is demonstrated by establishing the optimisation
process equivalence with a continuous dynamical system for which the
convergence properties can readily be derived. The experimental results
obtained on several well-known benchmarking datasets demonstrate the efficiency
and robustness of the proposed ACFT method, with a tracking accuracy comparable
to the state-of-the-art trackers.
Switching Local and Covariance Matching for Efficient Object Tracking
The covariance tracker finds the targets in consecutive frames by global searching. Covariance tracking has achieved impressive successes thanks to its ability to capture spatial and statistical properties as well as the correlations between them. Nevertheless, the covariance tracker is relatively inefficient due to the heavy computational cost of updating the model and comparing it with the covariance matrices of the candidate regions. Moreover, it is not good at dealing with articulated object tracking since integral histograms are employed to accelerate the searching process. In this work, we aim to alleviate the computational burden by selecting appropriate tracking approaches. We compute foreground probabilities of pixels and localize the target by local searching when the tracking is in a steady state. Covariance tracking is performed when distractions, sudden motions or occlusions are detected. Unlike the traditional covariance tracker, we use Log-Euclidean metrics instead of Riemannian invariant metrics, which are more computationally expensive. The proposed tracking algorithm has been verified on many video sequences. It proves more efficient than the covariance tracker. It is also effective in dealing with occlusions, which are an obstacle for local mode-seeking trackers such as the mean-shift tracker.
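
For readers unfamiliar with the Log-Euclidean metric, the sketch below shows how the distance between two covariance descriptors can be computed: take the matrix logarithm of each symmetric positive-definite matrix and measure the Frobenius norm of the difference. The function names and the small regulariser `eps` are assumptions for this example, not details from the paper.

```python
import numpy as np


def spd_log(C):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T   # V diag(log w) V^T


def log_euclidean_distance(A, B):
    """Frobenius norm of log(A) - log(B)."""
    return float(np.linalg.norm(spd_log(A) - spd_log(B), ord="fro"))


def region_covariance(features, eps=1e-6):
    """Covariance descriptor of an (n_pixels, n_features) array.

    A small multiple of the identity is added so the result stays positive definite.
    """
    C = np.cov(features, rowvar=False)
    return C + eps * np.eye(C.shape[0])
```

Since the logarithm of the model covariance can be computed once and cached, comparing it against many candidate regions needs only one eigendecomposition per candidate followed by a Euclidean norm.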