1,218 research outputs found
Deformable Object Tracking with Gated Fusion
The tracking-by-detection framework receives growing attentions through the
integration with the Convolutional Neural Networks (CNNs). Existing
tracking-by-detection based methods, however, fail to track objects with severe
appearance variations. This is because the traditional convolutional operation
is performed on fixed grids, and thus may not be able to find the correct
response while the object is changing pose or under varying environmental
conditions. In this paper, we propose a deformable convolution layer to enrich
the target appearance representations in the tracking-by-detection framework.
We aim to capture the target appearance variations via deformable convolution,
which adaptively enhances its original features. In addition, we also propose a
gated fusion scheme to control how the variations captured by the deformable
convolution affect the original appearance. The enriched feature representation
through deformable convolution facilitates the discrimination of the CNN
classifier on the target object and background. Extensive experiments on the
standard benchmarks show that the proposed tracker performs favorably against
state-of-the-art methods
Deformable Convolutional Networks
Convolutional neural networks (CNNs) are inherently limited to model
geometric transformations due to the fixed geometric structures in its building
modules. In this work, we introduce two new modules to enhance the
transformation modeling capacity of CNNs, namely, deformable convolution and
deformable RoI pooling. Both are based on the idea of augmenting the spatial
sampling locations in the modules with additional offsets and learning the
offsets from target tasks, without additional supervision. The new modules can
readily replace their plain counterparts in existing CNNs and can be easily
trained end-to-end by standard back-propagation, giving rise to deformable
convolutional networks. Extensive experiments validate the effectiveness of our
approach on sophisticated vision tasks of object detection and semantic
segmentation. The code would be released
- …