32,044 research outputs found
An Experimental Survey on Correlation Filter-based Tracking
In recent years, Correlation Filter-based Trackers (CFTs) have attracted increasing interest in the field of visual object tracking and have achieved highly compelling results in various competitions and benchmarks. In this paper, our goal is to review the development of CFTs with extensive experimental results. Eleven trackers are surveyed in our work, from which a general framework is summarized. Furthermore, we investigate different training schemes for correlation filters and discuss various effective improvements that have been made recently. Comprehensive experiments have been conducted to evaluate the effectiveness and efficiency of the surveyed CFTs, and comparisons have been made with other competing trackers. The experimental results show that state-of-the-art performance, in terms of robustness, speed and accuracy, can be achieved by several recent CFTs, such as MUSTer and SAMF. We find that further improvements to correlation filter-based tracking can be made by estimating scales, applying a part-based tracking strategy and cooperating with long-term tracking methods.
Comment: 13 pages, 25 figures
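As background for the training schemes discussed in this survey, the following is a minimal sketch of single-channel correlation filter training by ridge regression in the Fourier domain (MOSSE/KCF-style). The Gaussian label, regularisation weight and function names are illustrative assumptions, not any specific tracker's formulation.

```python
import numpy as np

def train_correlation_filter(patch, sigma=2.0, lam=1e-2):
    """Learn a single-channel correlation filter by ridge regression in the
    Fourier domain; `patch` is a 2-D grayscale template."""
    h, w = patch.shape
    # Desired response: a Gaussian peak centred on the target.
    ys, xs = np.mgrid[0:h, 0:w]
    y = np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(np.fft.ifftshift(y))
    # Closed-form solution: element-wise division with regularisation lam.
    return (np.conj(X) * Y) / (np.conj(X) * X + lam)

def detect(filt, patch):
    """Correlate the learned filter with a new patch and return the response
    map; the location of the peak gives the estimated translation."""
    return np.real(np.fft.ifft2(filt * np.fft.fft2(patch)))
```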
Effective Occlusion Handling for Fast Correlation Filter-based Trackers
Correlation filter-based trackers suffer heavily from multiple peaks in their response maps caused by occlusions. Moreover, the whole tracking pipeline may break down due to the uncertainty introduced by shifting among peaks, which further degrades the correlation filter model. To alleviate the drift problem caused by occlusions, we propose a novel scheme that chooses a specific filter model according to the scenario. Specifically, an effective measurement function is designed to evaluate the quality of the filter response. A sophisticated strategy is employed to judge whether occlusion has occurred and then decide how to update the filter models. In addition, we take advantage of both a log-polar method and a pyramid-like approach to estimate the best scale of the target. We evaluate the proposed approach on the VOT2018 challenge and the OTB100 dataset; the experimental results show that the proposed tracker achieves promising performance compared with state-of-the-art trackers.
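The abstract does not spell out its measurement function; a common proxy for response quality in correlation filter tracking is the peak-to-sidelobe ratio (PSR). The sketch below uses PSR as a hypothetical occlusion gate for model updates; the threshold and learning rate are assumptions.

```python
import numpy as np

def peak_to_sidelobe_ratio(response, exclude=5):
    """Peak-to-sidelobe ratio of a correlation response map; low values often
    indicate occlusion or a corrupted response."""
    r0, c0 = np.unravel_index(np.argmax(response), response.shape)
    peak = response[r0, c0]
    mask = np.ones_like(response, dtype=bool)
    mask[max(0, r0 - exclude):r0 + exclude + 1,
         max(0, c0 - exclude):c0 + exclude + 1] = False
    sidelobe = response[mask]
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-8)

def update_model(model, new_model, response, psr_threshold=8.0, lr=0.02):
    """Skip the filter update when the response quality suggests occlusion."""
    if peak_to_sidelobe_ratio(response) < psr_threshold:
        return model                          # likely occlusion: keep old filter
    return (1 - lr) * model + lr * new_model  # normal linear-interpolation update
```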
Part-based Visual Tracking via Structural Support Correlation Filter
Recently, part-based and support vector machine (SVM) based trackers have shown favorable performance. Nonetheless, their time-consuming online training and updating processes limit their real-time application. To better handle partial occlusion and improve efficiency, we propose a novel part-based structural support correlation filter tracking method, which combines the strong discriminative ability of SVMs with the reduced sensitivity of part-based tracking methods to partial occlusion. Our model learns the support correlation filters of all parts jointly through a star-structure model, which preserves the spatial layout among parts and tolerates part outliers. In addition, to further mitigate drift away from the object, we introduce inter-frame consistency of local parts into our model. Finally, our model estimates scale changes of the object from the relative distance changes among reliable parts. Extensive empirical evaluations on three benchmark datasets (OTB2015, TempleColor128 and VOT2015) demonstrate that the proposed method performs favorably against several state-of-the-art trackers in terms of tracking accuracy, speed and robustness.
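The scale-from-parts idea can be illustrated with a small sketch (a hypothetical helper, not the paper's exact estimator): the scale factor is taken as the median ratio of pairwise distances between reliable part centres in the current and previous frames.

```python
import itertools
import numpy as np

def estimate_scale(prev_centers, curr_centers):
    """Estimate the scale change from relative distances among reliable parts.
    Both arguments are (N, 2) arrays of part centres matched across frames."""
    ratios = []
    for i, j in itertools.combinations(range(len(curr_centers)), 2):
        d_prev = np.linalg.norm(prev_centers[i] - prev_centers[j])
        d_curr = np.linalg.norm(curr_centers[i] - curr_centers[j])
        if d_prev > 1e-6:
            ratios.append(d_curr / d_prev)
    return float(np.median(ratios)) if ratios else 1.0
```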
Tracking for Half an Hour
Long-term tracking requires extreme stability under the multitude of model updates and robustness to the disappearance and loss of the target, as both will inevitably happen. As motivation, we took 10 randomly selected OTB sequences, doubled each by attaching a reversed version, and repeated each doubled sequence 20 times. On most of these repetitive videos, the best current tracker performs worse on each loop. This illustrates the difference between optimizing for short-term versus long-term tracking. In a long-term tracker, a combined global and local search strategy is beneficial, allowing recovery from failures and disappearance. Most importantly, the proposed tracker also employs cautious updating, guided by self-quality assessment. The proposed tracker is still among the best on the 20-sec OTB videos while achieving state-of-the-art results on the 100-sec UAV20L benchmark. On 10 new half-hour videos featuring city bicycling, sports games, and so on, the proposed tracker outperforms the others by a large margin, with the 2010 TLD tracker coming second.
Comment: tech report
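The combination of local search, global re-detection on failure and cautious self-assessed updates described above can be summarised in a short sketch; the tracker/detector interfaces, confidence measure and thresholds are assumptions for illustration only.

```python
def long_term_track(frame, state, tracker, detector,
                    conf_low=0.3, conf_update=0.6):
    """One step of a generic long-term tracking loop: local search first,
    global re-detection on low confidence, and cautious model updates."""
    candidate, conf = tracker.local_search(frame, state)   # assumed interface
    if conf < conf_low:
        # Target probably lost or occluded: fall back to a global search.
        candidate, conf = detector.global_search(frame)     # assumed interface
    if conf >= conf_update:
        # Update the appearance model only when the estimate is trustworthy.
        tracker.update(frame, candidate)
    return candidate, conf
```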
Efficient Discriminative Nonorthogonal Binary Subspace with its Application to Visual Tracking
One of the crucial problems in visual tracking is how the object is represented. Conventional appearance-based trackers use increasingly complex features in order to be robust. However, complex representations typically not only require more computation for feature extraction but also complicate state inference. We show that, with a careful feature selection scheme, extremely simple yet discriminative features can be used for robust object tracking. The central component of the proposed method is a succinct and discriminative representation of the object using a discriminative non-orthogonal binary subspace (DNBS) spanned by Haar-like features. The DNBS representation inherits the merits of the original NBS in that it efficiently describes the object, and it also incorporates discriminative information to distinguish the foreground from the background. However, the problem of finding the DNBS bases from an over-complete dictionary is NP-hard. We propose a greedy algorithm called discriminative optimized orthogonal matching pursuit (D-OOMP) to solve this problem. An iterative formulation named iterative D-OOMP is further developed to drastically reduce redundant computation between iterations, and a hierarchical selection strategy is integrated to reduce the feature search space. The proposed DNBS representation is applied to object tracking through SSD-based template matching. We validate the effectiveness of our method through extensive experiments on challenging videos, with comparisons against several state-of-the-art trackers, and demonstrate its capability to track objects in clutter and against moving backgrounds.
Comment: 15 pages
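A simplified, hedged sketch of the greedy selection idea behind D-OOMP follows: plain orthogonal-matching-pursuit-style selection with a foreground-minus-background reconstruction score, which is an illustrative stand-in rather than the paper's exact criterion or its optimized iterative form.

```python
import numpy as np

def greedy_select_bases(D, fg, bg, k):
    """Greedily pick k columns of dictionary D (each an n-vector, e.g. a
    vectorised Haar-like base) that reconstruct the foreground sample `fg`
    well while reconstructing the background sample `bg` poorly."""
    selected = []
    for _ in range(k):
        best_gain, best_j = -np.inf, None
        for j in range(D.shape[1]):
            if j in selected:
                continue
            cols = D[:, selected + [j]]
            # Least-squares projections onto the current candidate subspace.
            sol = np.linalg.lstsq(cols, np.column_stack([fg, bg]), rcond=None)[0]
            proj = cols @ sol
            # Favour small foreground residual and large background residual.
            gain = -np.linalg.norm(fg - proj[:, 0]) + np.linalg.norm(bg - proj[:, 1])
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
    return selected
```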
Pixel-Level Matching for Video Object Segmentation using Convolutional Neural Networks
We propose a novel video object segmentation algorithm based on pixel-level
matching using Convolutional Neural Networks (CNN). Our network aims to
distinguish the target area from the background on the basis of the pixel-level
similarity between two object units. The proposed network represents a target
object using features from different depth layers in order to take advantage of
both the spatial details and the category-level semantic information.
Furthermore, we propose a feature compression technique that drastically
reduces the memory requirements while maintaining the capability of feature
representation. Two-stage training (pre-training and fine-tuning) allows our
network to handle any target object regardless of its category (even if the
object's type does not belong to the pre-training data) or of variations in its
appearance through a video sequence. Experiments on large datasets demonstrate
the effectiveness of our model against related methods in terms of accuracy, speed, and stability. Finally, we demonstrate the transferability of our network to different domains, such as infrared data.
Comment: To appear at ICCV 2017
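A minimal sketch of the pixel-level matching idea is given below: cosine similarity between per-pixel CNN feature vectors of a target template and a search frame. The feature shapes and the max-over-target-pixels scoring are assumptions for illustration, not the paper's network.

```python
import numpy as np

def pixelwise_similarity(target_feat, search_feat):
    """Cosine similarity between every pixel of a search feature map and every
    pixel of a target feature map. Inputs are (H, W, C) arrays of CNN features."""
    t = target_feat.reshape(-1, target_feat.shape[-1])
    s = search_feat.reshape(-1, search_feat.shape[-1])
    t = t / (np.linalg.norm(t, axis=1, keepdims=True) + 1e-8)
    s = s / (np.linalg.norm(s, axis=1, keepdims=True) + 1e-8)
    sim = s @ t.T                     # (H_s * W_s, H_t * W_t) similarity matrix
    # For each search pixel, take the best match against any target pixel as a
    # foreground score, then reshape back to the search-map resolution.
    return sim.max(axis=1).reshape(search_feat.shape[:2])
```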
Recurrent Filter Learning for Visual Tracking
Recently, convolutional neural networks (CNNs) have gained popularity in visual tracking due to their robust feature representation of images. Recent methods perform online tracking by fine-tuning a pre-trained CNN model to the specific target object using stochastic gradient descent (SGD) back-propagation, which is usually time-consuming. In this paper, we propose a recurrent filter generation method for visual tracking. We directly feed the target's image patch to a recurrent neural network (RNN) to estimate an object-specific filter for tracking. Since a video sequence is spatiotemporal data, we extend the matrix multiplications of the RNN's fully-connected layers to convolution operations on feature maps, which preserves the target's spatial structure and is also memory-efficient. The tracked object in subsequent frames is fed into the RNN to adapt the generated filters to appearance variations of the target. Note that once the off-line training of our network is finished, there is no need to fine-tune the network for specific objects, which makes our approach more efficient than methods that use iterative fine-tuning to learn the target online. Extensive experiments conducted on widely used benchmarks, OTB and VOT, demonstrate encouraging results compared to other recent methods.
Comment: ICCV2017 Workshop on VOT
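A toy sketch of the convolutional filter-generation idea is given below (PyTorch, hypothetical layer sizes): a convolutional state update replaces the fully-connected RNN update so the target's spatial structure is preserved, and the generated filter is used as a correlation kernel over search-region features. This is an illustrative stand-in, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvFilterGenerator(nn.Module):
    """Maps the target's feature map plus a running hidden state to an
    object-specific tracking filter via a convolutional recurrent update."""
    def __init__(self, channels=64):
        super().__init__()
        self.update = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, target_feat, hidden):
        # target_feat, hidden: (1, C, H, W); the convolution keeps the spatial
        # layout instead of flattening it as a fully-connected RNN would.
        hidden = torch.tanh(self.update(torch.cat([target_feat, hidden], dim=1)))
        return hidden  # the hidden state doubles as the generated filter

def correlate(filt, search_feat):
    """Use the generated filter (1, C, H, W) as a convolution kernel over the
    search-region feature map (1, C, Hs, Ws); the response peak locates the target."""
    return F.conv2d(search_feat, filt)
```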
Patch-based adaptive weighting with segmentation and scale (PAWSS) for visual tracking
Tracking-by-detection algorithms are widely used for visual tracking, where the problem is treated as a classification task and an object model is updated over time using online learning techniques. In challenging conditions where an object undergoes deformation or scale variation, the update step is prone to including background information in the appearance model or to failing to estimate the scale change, which degrades the performance of the classifier. In this paper, we present a Patch-based Adaptive Weighting with Segmentation and Scale (PAWSS) tracking framework that tackles both the scale and background problems. A simple but effective colour-based segmentation model is used to suppress background information, and multi-scale samples are extracted to enrich the training pool, which allows the tracker to handle both incremental and abrupt scale variations between frames. Experimentally, we evaluate our approach on the online tracking benchmark (OTB) dataset and Visual Object Tracking (VOT) challenge datasets. The results show that our approach outperforms recent state-of-the-art trackers; in particular, it improves the success rate score on the OTB dataset, while on the VOT datasets PAWSS ranks among the top trackers while operating at real-time frame rates.
Comment: 10 pages, 8 figures. The paper is under consideration at Pattern Recognition Letters
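A minimal sketch of a colour-based foreground weighting of the kind the abstract describes is shown below: a per-pixel likelihood ratio from foreground/background colour histograms. The bin count, smoothing constant and function name are assumptions, not PAWSS's actual segmentation model.

```python
import numpy as np

def color_foreground_weights(patch, fg_mask, bins=16, eps=1e-3):
    """Per-pixel foreground probability from colour histograms built inside
    (fg_mask == True) and outside the target region of a uint8 RGB `patch`."""
    idx = (patch // (256 // bins)).astype(int)            # (H, W, 3) bin indices
    flat = idx[..., 0] * bins * bins + idx[..., 1] * bins + idx[..., 2]
    hist_fg = np.bincount(flat[fg_mask], minlength=bins ** 3) + eps
    hist_bg = np.bincount(flat[~fg_mask], minlength=bins ** 3) + eps
    hist_fg = hist_fg / hist_fg.sum()
    hist_bg = hist_bg / hist_bg.sum()
    # Likelihood-ratio weight per pixel: high for target-coloured pixels, low
    # for background-coloured ones, so background is suppressed in the model.
    return hist_fg[flat] / (hist_fg[flat] + hist_bg[flat])
```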
Fully distributed cooperation for networked uncertain mobile manipulators
This paper investigates a fully distributed cooperation scheme for networked
mobile manipulators. To achieve cooperative task allocation in a distributed
way, an adaptation-based estimation law is established for each robotic agent
to estimate the desired local trajectory. In addition, wrench synthesis is
analyzed in detail to lay a solid foundation for tight cooperation tasks.
Together with the estimated task, a set of distributed adaptive controllers is
proposed to achieve motion synchronization of the mobile manipulator ensemble
over a directed graph with a spanning tree irrespective of the kinematic and
dynamic uncertainties in both the mobile manipulators and the tightly grasped
object. The controlled synchronization alleviates the performance degradation
caused by the estimation/tracking discrepancy during the transient phase. The
proposed scheme requires no persistent excitation condition and avoids the use
of noisy Cartesian-space velocities. Furthermore, it is independent of the
object's center of mass by employing formation-based task allocation and a
task-oriented strategy. These attractive attributes facilitate the practical
application of the scheme. It is theoretically proven that convergence of the
cooperative task tracking error is guaranteed. Simulation results validate the
efficacy and demonstrate the expected performance of the proposed scheme.
Comment: 18 pages with 13 figures. Final version with experiments to appear in IEEE Transactions on Robotics
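The abstract does not give the estimation law itself; purely as a generic illustration of distributed trajectory estimation over a directed graph (not the paper's law), a consensus-style update for agent $i$'s estimate $\hat{x}_{d,i}$ of the desired trajectory $x_d$ might take the form below, where $a_{ij}$ are adjacency weights and $b_i > 0$ only for agents with direct access to the task.

```latex
\dot{\hat{x}}_{d,i} \;=\; -\,\alpha \sum_{j \in \mathcal{N}_i} a_{ij}\,\bigl(\hat{x}_{d,i} - \hat{x}_{d,j}\bigr)
\;-\; \beta\, b_i \bigl(\hat{x}_{d,i} - x_d\bigr), \qquad \alpha,\ \beta > 0,
```

which converges when the directed graph contains a spanning tree rooted at an informed agent.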
Detection, Recognition and Tracking of Moving Objects from Real-time Video via SP Theory of Intelligence and Species Inspired PSO
In this paper, we address the basic problem of recognizing moving objects in video images using the SP Theory of Intelligence. The SP Theory of Intelligence, a framework for artificial intelligence first introduced by Gerard J Wolff, takes its name from Simplicity and Power. Using the concept of multiple alignment, we detect and recognize objects of interest in video frames with multilevel hierarchical parts and subparts, based on polythetic categories. We track the recognized objects using species-based Particle Swarm Optimization (PSO). First, we extract the multiple alignments of our object of interest from training images. In order to recognize accurately and handle occlusion, we use polythetic concepts on the raw data line to omit redundant noise by searching for the best alignment representing the features among the extracted alignments. We recognize the domain of interest from the video scenes in the form of a wide variety of multiple alignments to handle scene variability. Unsupervised learning is performed in the SP model following the DONSVIC principle, and natural structures are discovered via information compression and pattern analysis. After successful recognition of objects, we use the species-based PSO algorithm, as the alignments of our object of interest are analogous to the observation likelihood and the fitness of species. Subsequently, we analyze the competition and repulsion among species with annealed-Gaussian-based PSO. We have tested our algorithms on the David, Walking2, FaceOcc1, Jogging and Dudek sequences, obtaining very satisfactory and competitive results.
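For reference, the core particle update shared by PSO variants (including species-based ones) is the canonical velocity/position rule sketched below; this is the standard form, not the paper's annealed-Gaussian species variant, and the coefficient values are conventional defaults.

```python
import numpy as np

def pso_step(pos, vel, pbest, gbest, w=0.72, c1=1.49, c2=1.49, rng=None):
    """One canonical PSO update. `pos`/`vel` are (N, D) arrays of particle
    positions and velocities, `pbest` each particle's best position so far,
    and `gbest` the best position found by its species/swarm."""
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    # Inertia term plus cognitive (personal best) and social (swarm best) pulls.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return pos + vel, vel
```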