Visual Tracking via Dynamic Graph Learning
Existing visual tracking methods usually localize a target object with a
bounding box, in which the performance of the foreground object trackers or
detectors is often affected by the inclusion of background clutter. To handle
this problem, we learn a patch-based graph representation for visual tracking.
The tracked object is modeled as a graph by taking a set of
non-overlapping image patches as nodes, in which the weight of each node
indicates how likely it belongs to the foreground and edges are weighted for
indicating the appearance compatibility of two neighboring nodes. This graph is
dynamically learned and applied in object tracking and model updating. During
the tracking process, the proposed algorithm performs three main steps in each
frame. First, the graph is initialized by assigning binary weights to some
image patches to indicate the object and background patches according to the
predicted bounding box. Second, the graph is optimized to refine the patch
weights by using a novel alternating direction method of multipliers. Third,
the object feature representation is updated by imposing the weights of patches
on the extracted image features. The object location is predicted by maximizing
the classification score in the structured support vector machine. Extensive
experiments show that the proposed tracking algorithm performs well against the
state-of-the-art methods on large-scale benchmark datasets.
Comment: Submitted to TPAMI 201
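The third step of the abstract above (imposing patch weights on the extracted features) can be illustrated with a minimal sketch. It assumes each patch already has a feature vector and a learned foreground weight in [0, 1]; the function name `weight_patch_features` and the toy numbers are hypothetical and not taken from the paper, and the ADMM optimization that produces the weights is not shown.

```python
def weight_patch_features(patch_features, patch_weights):
    """Scale each patch's feature vector by its foreground weight, then concatenate.

    patch_features: list of per-patch feature vectors (lists of floats).
    patch_weights:  one weight per patch; values near 0 suppress background patches.
    """
    if len(patch_features) != len(patch_weights):
        raise ValueError("one weight per patch is required")
    weighted = []
    for feats, w in zip(patch_features, patch_weights):
        weighted.extend(w * f for f in feats)
    return weighted


# Toy example (assumed values): three patches with two features each.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights = [1.0, 0.5, 0.0]  # foreground, uncertain, background
descriptor = weight_patch_features(features, weights)
```

In this sketch a patch with weight 0 contributes nothing to the descriptor, which is one simple way background clutter inside the bounding box could be suppressed before classification.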
Patch-based adaptive weighting with segmentation and scale (PAWSS) for visual tracking
Tracking-by-detection algorithms are widely used for visual tracking; they
treat the problem as a classification task in which an object model is
updated over time using online learning techniques. In challenging conditions
where an object undergoes deformation or scale variations, the update step is
prone to include background information in the model appearance or to lack the
ability to estimate the scale change, which degrades the performance of the
classifier. In this paper, we introduce a Patch-based Adaptive Weighting with
Segmentation and Scale (PAWSS) tracking framework that tackles both the scale
and background problems. A simple but effective colour-based segmentation model
is used to suppress background information and multi-scale samples are
extracted to enrich the training pool, which allows the tracker to handle both
incremental and abrupt scale variations between frames. Experimentally, we
evaluate our approach on the online tracking benchmark (OTB) dataset and Visual
Object Tracking (VOT) challenge datasets. The results show that our approach
outperforms recent state-of-the-art trackers and especially improves the
success rate score on the OTB dataset; on the VOT datasets, PAWSS
ranks among the top trackers while operating at real-time frame rates.
Comment: 10 pages, 8 figures. The paper is under consideration at Pattern
Recognition Letter
Part-based Visual Tracking via Structural Support Correlation Filter
Recently, part-based and support vector machine (SVM) based trackers have
shown favorable performance. Nonetheless, their time-consuming online training
and updating process limits their real-time applications. In order to better
deal with the partial occlusion issue and improve efficiency, we propose
a novel part-based structural support correlation filter tracking method, which
combines the strong discriminative ability of SVMs with the robustness
of part-based tracking methods to partial occlusion.
Then, our proposed model can learn the support correlation filter of each part
jointly by a star structure model, which preserves the spatial layout structure
among parts and tolerates outliers of parts. In addition, to further mitigate
drift away from the object, we introduce inter-frame consistencies of
local parts into our model. Finally, we accurately estimate the
scale changes of the object from the relative distance change among reliable parts.
Extensive empirical evaluations on three benchmark datasets (OTB2015,
TempleColor128 and VOT2015) demonstrate that the proposed method outperforms
several state-of-the-art trackers in terms of tracking
accuracy, speed and robustness.
Deformable Parts Correlation Filters for Robust Visual Tracking
Deformable parts models show great potential in tracking by principally
addressing non-rigid object deformations and self occlusions, but according to
recent benchmarks, they often lag behind the holistic approaches. The reason is
that a potentially large number of degrees of freedom has to be estimated for
object localization, and simplifications of the constellation topology are often
assumed to make the inference tractable. We present a new formulation of the
constellation model with correlation filters that treats the geometric and
visual constraints within a single convex cost function and derive a highly
efficient optimization for MAP inference of a fully-connected constellation. We
propose a tracker that models the object at two levels of detail. The coarse
level corresponds to a root correlation filter and a novel color model for
approximate object localization, while the mid-level representation is composed
of the new deformable constellation of correlation filters that refine the
object location. The resulting tracker is rigorously analyzed on the highly
challenging OTB, VOT2014 and VOT2015 benchmarks, exhibits state-of-the-art
performance and runs in real time.
Comment: 14 pages, first submission to journal: 9.11.2015, re-submission on
11.5.201
Spectral Filter Tracking
Visual object tracking is a challenging computer vision task with numerous
real-world applications. Here we propose a simple but efficient Spectral Filter
Tracking (SFT) method. To characterize the rotational and translational invariance of
tracking targets, the candidate image region is modeled as a pixelwise grid
graph. Instead of conventional graph matching, we convert tracking into
a plain least-squares regression problem to estimate the best center coordinate
of the target. Unlike the holistic regression of correlation filter
based methods, SFT can operate on the localized surrounding region of each pixel
(i.e., vertex) by using spectral graph filters, which makes it more robust
to local variations and cluttered backgrounds. To bypass the eigenvalue
decomposition of the graph Laplacian matrix L, we parameterize spectral
graph filters as polynomials of L by spectral graph theory, in which L^k
exactly encodes a k-hop local neighborhood of each vertex. Finally, the filter
parameters (i.e., polynomial coefficients) as well as feature projecting
functions are jointly integrated into the regression model.
Comment: 11page
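The polynomial parameterization described above can be sketched in a few lines. The example below is an illustration under stated assumptions, not the SFT implementation: it uses a small path graph instead of a pixelwise grid graph, the combinatorial Laplacian L = D - A, and hand-picked coefficients `theta` (in the paper these would be learned by regression). It applies the filter theta_0 x + theta_1 L x + theta_2 L^2 x without any eigenvalue decomposition.

```python
def path_laplacian(n):
    """Combinatorial Laplacian L = D - A of a path graph with n vertices."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        L[i][i] += 1.0
        L[i + 1][i + 1] += 1.0
        L[i][i + 1] -= 1.0
        L[i + 1][i] -= 1.0
    return L


def matvec(M, x):
    """Dense matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def polynomial_graph_filter(L, x, theta):
    """Compute sum_k theta[k] * L^k x using repeated matrix-vector products."""
    out = [0.0] * len(x)
    power = list(x)  # holds L^k x, starting with k = 0
    for coeff in theta:
        out = [o + coeff * p for o, p in zip(out, power)]
        power = matvec(L, power)
    return out


L = path_laplacian(6)
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # signal concentrated on vertex 0
response = polynomial_graph_filter(L, impulse, theta=[0.5, 0.3, 0.2])
```

With a degree-2 polynomial, the response to an impulse at vertex 0 is nonzero only at vertices 0, 1 and 2, which is the k-hop locality the abstract refers to.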
Once for All: a Two-flow Convolutional Neural Network for Visual Tracking
One of the main challenges of visual object tracking comes from the arbitrary
appearance of objects. Most existing algorithms try to resolve this problem as
an object-specific task, i.e., the model is trained to regenerate or classify a
specific object. As a result, the model needs to be initialized and retrained
for different objects. In this paper, we propose a more generic approach
utilizing a novel two-flow convolutional neural network (named YCNN). The YCNN
takes two inputs (one is object image patch, the other is search image patch),
then outputs a response map which predicts how likely the object appears in a
specific location. Unlike object-specific approaches, the YCNN is trained
to measure the similarity between two image patches. Thus it is not
confined to any specific object. Furthermore, the network can be end-to-end
trained to extract both shallow and deep convolutional features which are
dedicated for visual tracking. And once properly trained, the YCNN can be
applied to track all kinds of objects without further training and updating.
Benefiting from the once-for-all model, our algorithm is able to run at a very
high speed of 45 frames-per-second. The experiments on 51 sequences also show
that our algorithm achieves an outstanding performance
CREST: Convolutional Residual Learning for Visual Tracking
Discriminative correlation filters (DCFs) have shown superior performance
in visual tracking. They only need a small set of training samples
from the initial frame to generate an appearance model. However, existing DCFs
learn the filters separately from feature extraction, and update these filters
using a moving average operation with an empirical weight. These DCF trackers
benefit little from end-to-end training. In this paper, we propose the
CREST algorithm to reformulate DCFs as a one-layer convolutional neural
network. Our method integrates feature extraction, response map generation, and
model update into the neural network for end-to-end training. To
reduce model degradation during online update, we apply residual learning to
take appearance changes into account. Extensive experiments on the benchmark
datasets demonstrate that our CREST tracker performs favorably against
state-of-the-art trackers.
Comment: ICCV 2017. Project page:
http://www.cs.cityu.edu.hk/~yibisong/iccv17/index.htm
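For context, the moving average update that the abstract attributes to existing DCF trackers (the baseline CREST sets out to replace, not CREST itself) can be sketched as a linear interpolation between the old filter and the filter estimated on the current frame. The function name and the default learning rate are assumptions chosen for illustration.

```python
def moving_average_update(old_filter, new_filter, learning_rate=0.02):
    """Blend the previous filter with the current-frame estimate.

    learning_rate is the empirical weight: small values change the
    model slowly, large values adapt quickly but risk drift.
    """
    if len(old_filter) != len(new_filter):
        raise ValueError("filters must have the same length")
    return [
        (1.0 - learning_rate) * old + learning_rate * new
        for old, new in zip(old_filter, new_filter)
    ]


# Toy example with an exaggerated learning rate (assumed value).
updated = moving_average_update([1.0, 0.0], [0.0, 1.0], learning_rate=0.25)
```

Because the weight is fixed by hand and applied outside the feature extractor, nothing in this update can be tuned by backpropagation, which is the limitation the abstract points to.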
Particle Filter Re-detection for Visual Tracking via Correlation Filters
Most of the correlation filter based tracking algorithms can achieve good
performance and maintain fast computational speed. However, in some complicated
tracking scenes, there is a fatal defect that causes the object to be located
inaccurately. In order to address this problem, we propose a particle filter
redetection based tracking approach for accurate object localization. During
the tracking process, the kernelized correlation filter (KCF) based tracker
locates the object by relying on the maximum response value of the response
map; when the response map becomes ambiguous, the KCF tracking result becomes
unreliable. Our method can provide more candidates by particle resampling to
detect the object accordingly. Additionally, we give a new object scale
evaluation mechanism, which merely considers the differences between the
maximum response values in consecutive frames. Extensive experiments on OTB2013
and OTB2015 datasets demonstrate that the proposed tracker performs favorably
in relation to the state-of-the-art methods.
Comment: 18 pages, 6 figures, 2 table
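The localization rule described above (take the maximum of the response map, and distrust it when the map is ambiguous) can be sketched as follows. The abstract does not state how ambiguity is measured, so the fixed threshold on the peak value used here is purely an assumption, as are the function names; the particle resampling step itself is not shown.

```python
def locate_peak(response_map):
    """Return ((row, col), value) of the maximum entry of a 2D response map."""
    best_pos = (0, 0)
    best_val = response_map[0][0]
    for r, row in enumerate(response_map):
        for c, value in enumerate(row):
            if value > best_val:
                best_pos, best_val = (r, c), value
    return best_pos, best_val


def needs_redetection(peak_value, threshold=0.3):
    """Hypothetical ambiguity test: a weak peak triggers re-detection."""
    return peak_value < threshold


# Toy response maps (assumed values): one confident, one ambiguous.
confident = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.3], [0.1, 0.2, 0.1]]
ambiguous = [[0.10, 0.15], [0.12, 0.10]]

pos_a, peak_a = locate_peak(confident)
pos_b, peak_b = locate_peak(ambiguous)
```

In a tracker built along these lines, a True result from `needs_redetection` would be the point at which additional candidates are drawn around the last reliable position.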
On the Relations of Correlation Filter Based Trackers and Struck
In recent years, two types of trackers, namely correlation filter based
tracker (CF tracker) and structured output tracker (Struck), have exhibited
state-of-the-art performance. However, there seems to be a lack of analytic work
on their relations in the computer vision community. In this paper, we
investigate two state-of-the-art CF trackers, i.e., spatial regularization
discriminative correlation filter (SRDCF) and correlation filter with limited
boundaries (CFLB), and Struck, and reveal their relations. Specifically, after
extending the CFLB to its multiple channel version we prove the relation
between SRDCF and CFLB on the condition that the spatial regularization factor
of SRDCF is replaced by the masking matrix of CFLB. We also prove the
asymptotic approximate relation between SRDCF and Struck on the conditions
that the spatial regularization factor of SRDCF is replaced by an indicator
function of the object bounding box, the weights of SRDCF in its loss term are
replaced by those of Struck, the linear kernel is employed by Struck, and the
search region tends to infinity. Extensive experiments on public benchmarks
OTB50 and OTB100 are conducted to verify our theoretical results. Moreover, we
explain how detailed differences among SRDCF, CFLB, and Struck would give rise
to slightly different performances on visual sequence
Robust Visual Tracking via Hierarchical Convolutional Features
In this paper, we propose to exploit the rich hierarchical features of deep
convolutional neural networks to improve the accuracy and robustness of visual
tracking. Deep neural networks trained on object recognition datasets consist
of multiple convolutional layers. These layers encode target appearance with
different levels of abstraction. For example, the outputs of the last
convolutional layers encode the semantic information of targets and such
representations are invariant to significant appearance variations. However,
their spatial resolutions are too coarse to precisely localize the target. In
contrast, features from earlier convolutional layers provide more precise
localization but are less invariant to appearance changes. We interpret the
hierarchical features of convolutional layers as a nonlinear counterpart of an
image pyramid representation and explicitly exploit these multiple levels of
abstraction to represent target objects. Specifically, we learn adaptive
correlation filters on the outputs from each convolutional layer to encode the
target appearance. We infer the maximum response of each layer to locate
targets in a coarse-to-fine manner. To further handle the issues with scale
estimation and re-detecting target objects from tracking failures caused by
heavy occlusion or out-of-the-view movement, we conservatively learn another
correlation filter that maintains a long-term memory of target appearance as
a discriminative classifier. We apply the classifier to two types of object
proposals: (1) proposals with a small step size and tightly around the
estimated location for scale estimation; and (2) proposals with large step size
and across the whole image for target re-detection. Extensive experimental
results on large-scale benchmark datasets show that the proposed algorithm
performs favorably against state-of-the-art tracking methods.
Comment: To appear in T-PAMI 2018, project page at
https://sites.google.com/site/chaoma99/hcft-trackin