539 research outputs found
Learning Spatial-Aware Regressions for Visual Tracking
In this paper, we analyze the spatial information of deep features, and
propose two complementary regressions for robust visual tracking. First, we
propose a kernelized ridge regression model wherein the kernel value is defined
as the weighted sum of similarity scores of all pairs of patches between two
samples. We show that this model can be formulated as a neural network and thus
can be efficiently solved. Second, we propose a fully convolutional neural
network with spatially regularized kernels, through which the filter kernel
corresponding to each output channel is forced to focus on a specific region of
the target. Distance transform pooling is further exploited to determine the
effectiveness of each output channel of the convolution layer. The outputs from
the kernelized ridge regression model and the fully convolutional neural
network are combined to obtain the ultimate response. Experimental results on
two benchmark datasets validate the effectiveness of the proposed method.Comment: To appear in CVPR201
Deformable Object Tracking with Gated Fusion
The tracking-by-detection framework receives growing attentions through the
integration with the Convolutional Neural Networks (CNNs). Existing
tracking-by-detection based methods, however, fail to track objects with severe
appearance variations. This is because the traditional convolutional operation
is performed on fixed grids, and thus may not be able to find the correct
response while the object is changing pose or under varying environmental
conditions. In this paper, we propose a deformable convolution layer to enrich
the target appearance representations in the tracking-by-detection framework.
We aim to capture the target appearance variations via deformable convolution,
which adaptively enhances its original features. In addition, we also propose a
gated fusion scheme to control how the variations captured by the deformable
convolution affect the original appearance. The enriched feature representation
through deformable convolution facilitates the discrimination of the CNN
classifier on the target object and background. Extensive experiments on the
standard benchmarks show that the proposed tracker performs favorably against
state-of-the-art methods
- …