MLVSNet: Multi-level Voting Siamese Network for 3D visual tracking
Benefiting from the excellent performance of Siamese-based trackers, huge progress on 2D visual tracking has been achieved. However, 3D visual tracking is still under-explored. Inspired by the idea of Hough voting in 3D object detection, in this paper, we propose a Multi-level Voting Siamese Network (MLVSNet) for 3D visual tracking from outdoor point cloud sequences. To deal with sparsity in outdoor 3D point clouds, we propose to perform Hough voting on multi-level features to get more vote centers and retain more useful information, instead of voting only on the final-level feature as in previous methods. We also design an efficient and lightweight Target-Guided Attention (TGA) module to transfer the target information and highlight the target points in the search area. Moreover, we propose a Vote-cluster Feature Enhancement (VFE) module to exploit the relationships between different vote clusters. Extensive experiments on the KITTI 3D tracking benchmark demonstrate that our MLVSNet outperforms state-of-the-art methods by significant margins. Code will be available at https://github.com/CodeWZT/MLVSNet
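The core idea of Hough voting on multi-level features can be sketched in a few lines of NumPy. This is a minimal illustration under assumed inputs, not MLVSNet's implementation: each seed point at each feature level predicts an offset toward the object center, and pooling the votes from every level yields a denser, more robust center estimate than voting only at the final level.

```python
import numpy as np

def hough_votes(seeds, offsets):
    """Each seed point votes for an object center by adding its predicted offset."""
    return seeds + offsets

rng = np.random.default_rng(0)
center = np.array([2.0, 0.0, 1.0])   # hypothetical ground-truth object center

# Hypothetical multi-level seeds: a coarse level with few points and a fine
# level with many; in the real network the offsets come from learned layers.
levels = []
for n in (8, 32):
    seeds = center + rng.normal(scale=0.5, size=(n, 3))
    offsets = (center - seeds) + rng.normal(scale=0.05, size=(n, 3))  # noisy predictions
    levels.append(hough_votes(seeds, offsets))

all_votes = np.concatenate(levels)   # voting at every level retains more votes
estimate = all_votes.mean(axis=0)    # simplest possible "cluster": average all votes
```

In the paper the votes are clustered and the cluster features are further refined (e.g. by the VFE module); averaging here is only the simplest stand-in for that step.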
2D-3D Pose Tracking with Multi-View Constraints
Camera localization in 3D LiDAR maps has gained increasing attention due to
its promising ability to handle complex scenarios, surpassing the limitations
of visual-only localization methods. However, existing methods mostly focus on
addressing the cross-modal gaps, estimating camera poses frame by frame without
considering the relationship between adjacent frames, which makes the pose
tracking unstable. To alleviate this, we propose to couple the 2D-3D
correspondences between adjacent frames using 2D-2D feature matching,
establishing multi-view geometric constraints for simultaneously
estimating multiple camera poses. Specifically, we propose a new 2D-3D pose
tracking framework, which consists of a front-end hybrid flow estimation
network for consecutive frames and a back-end pose optimization module. We further
design a cross-modal consistency-based loss to incorporate the multi-view
constraints during the training and inference process. We evaluate our proposed
framework on the KITTI and Argoverse datasets. Experimental results demonstrate
its superior performance compared to existing frame-by-frame 2D-3D pose
tracking methods and state-of-the-art vision-only pose tracking algorithms.
More online pose tracking videos are available at
\url{https://youtu.be/yfBRdg7gw5M}

Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
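The benefit of coupling adjacent frames can be shown with a toy joint least-squares problem. This is a simplified sketch, not the paper's optimizer: an orthographic camera and translation-only poses are assumed so the problem stays linear, and the 2D-2D matches are what let us stack both frames' reprojection residuals, plus a consistency term coupling the two poses, into one system.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
P = rng.uniform(-1.0, 1.0, size=(n, 3))                  # 3D map points
t0_true = np.array([0.50, -0.20])                        # frame-0 pose (2D offset)
t1_true = np.array([0.55, -0.15])                        # frame-1 pose (2D offset)

def project(points, t):
    return points[:, :2] + t                             # orthographic projection

obs0 = project(P, t0_true) + rng.normal(scale=0.02, size=(n, 2))
obs1 = project(P, t1_true) + rng.normal(scale=0.02, size=(n, 2))

# Stack both frames' reprojection residuals into ONE linear system; the
# 2D-2D matches identify which observations share a 3D point across frames,
# and the last two rows softly couple the two poses (multi-view constraint).
lam = np.sqrt(5.0)                                       # coupling weight (assumed)
A_rows, b_rows = [], []
for i in range(n):
    A_rows += [[1, 0, 0, 0], [0, 1, 0, 0]]               # frame-0 residuals
    b_rows += list(obs0[i] - P[i, :2])
    A_rows += [[0, 0, 1, 0], [0, 0, 0, 1]]               # frame-1 residuals
    b_rows += list(obs1[i] - P[i, :2])
A_rows += [[-lam, 0, lam, 0], [0, -lam, 0, lam]]         # penalize t1 - t0
b_rows += [0.0, 0.0]

x, *_ = np.linalg.lstsq(np.array(A_rows), np.array(b_rows), rcond=None)
t0_est, t1_est = x[:2], x[2:]
```

Solving for both poses at once pulls the two estimates toward mutual consistency, which is the stabilizing effect the abstract attributes to the multi-view constraints; the actual framework works with full 6-DoF poses and learned cross-modal correspondences.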
End-to-end Flow Correlation Tracking with Spatial-temporal Attention
Discriminative correlation filters (DCF) with deep convolutional features
have achieved favorable performance in recent tracking benchmarks. However,
most existing DCF trackers consider only the appearance features of the current
frame and hardly benefit from motion and inter-frame information. The lack of
temporal information degrades the tracking performance during challenges such
as partial occlusion and deformation. In this work, we focus on making use of
the rich flow information in consecutive frames to improve the feature
representation and the tracking accuracy. Firstly, individual components,
including optical flow estimation, feature extraction, aggregation and
correlation filter tracking, are formulated as special layers in the network. To
the best of our knowledge, this is the first work to jointly train optical flow
estimation and tracking in a deep learning framework. Then the historical
feature maps at predefined intervals are warped and aggregated with the current
ones under the guidance of flow. For adaptive aggregation, we propose a novel spatial-temporal
attention mechanism. Extensive experiments are performed on four challenging
tracking datasets: OTB2013, OTB2015, VOT2015 and VOT2016, and the proposed
method achieves superior results on these benchmarks.

Comment: Accepted at CVPR 2018.
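The flow-guided aggregation step can be sketched as follows. This is a minimal NumPy stand-in, not the paper's network: warping is an integer shift rather than bilinear sampling from a dense flow field, and only the temporal half of the spatial-temporal attention is shown, with per-frame weights derived from similarity to the current features.

```python
import numpy as np

def warp(feat, flow):
    """Shift a feature map by an integer (dy, dx) flow -- a stand-in for
    bilinear warping guided by a dense optical flow field."""
    return np.roll(feat, shift=flow, axis=(0, 1))

def aggregate(current, history, flows):
    """Warp historical feature maps to the current frame, then fuse them
    with softmax attention weights based on similarity to the current one."""
    warped = [warp(h, f) for h, f in zip(history, flows)]
    stack = np.stack([current] + warped)             # (T, H, W)
    sims = (stack * current).sum(axis=(1, 2))        # per-frame similarity score
    w = np.exp(sims - sims.max())
    w /= w.sum()                                     # temporal attention weights
    return np.tensordot(w, stack, axes=1)            # weighted fusion -> (H, W)

rng = np.random.default_rng(2)
cur = rng.normal(size=(8, 8))
hist = [np.roll(cur, shift=(1, 2), axis=(0, 1))]     # historical frame, shifted
out = aggregate(cur, hist, [(-1, -2)])               # flow exactly undoes the shift
```

Because the toy flow perfectly aligns the historical frame, the fused map here equals the current one; in practice the attention weights down-weight frames whose warped features disagree with the current frame, which is what makes the aggregation adaptive.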