8,180 research outputs found
Learning Spatial-Aware Regressions for Visual Tracking
In this paper, we analyze the spatial information of deep features, and
propose two complementary regressions for robust visual tracking. First, we
propose a kernelized ridge regression model wherein the kernel value is defined
as the weighted sum of similarity scores of all pairs of patches between two
samples. We show that this model can be formulated as a neural network and thus
can be efficiently solved. Second, we propose a fully convolutional neural
network with spatially regularized kernels, through which the filter kernel
corresponding to each output channel is forced to focus on a specific region of
the target. Distance transform pooling is further exploited to determine the
effectiveness of each output channel of the convolution layer. The outputs from
the kernelized ridge regression model and the fully convolutional neural
network are combined to obtain the ultimate response. Experimental results on
two benchmark datasets validate the effectiveness of the proposed method.Comment: To appear in CVPR201
Understanding and Diagnosing Visual Tracking Systems
Several benchmark datasets for visual tracking research have been proposed in
recent years. Despite their usefulness, whether they are sufficient for
understanding and diagnosing the strengths and weaknesses of different trackers
remains questionable. To address this issue, we propose a framework by breaking
a tracker down into five constituent parts, namely, motion model, feature
extractor, observation model, model updater, and ensemble post-processor. We
then conduct ablative experiments on each component to study how it affects the
overall result. Surprisingly, our findings are discrepant with some common
beliefs in the visual tracking research community. We find that the feature
extractor plays the most important role in a tracker. On the other hand,
although the observation model is the focus of many studies, we find that it
often brings no significant improvement. Moreover, the motion model and model
updater contain many details that could affect the result. Also, the ensemble
post-processor can improve the result substantially when the constituent
trackers have high diversity. Based on our findings, we put together some very
elementary building blocks to give a basic tracker which is competitive in
performance to the state-of-the-art trackers. We believe our framework can
provide a solid baseline when conducting controlled experiments for visual
tracking research
- …