462 research outputs found
SeqTrack: Sequence to Sequence Learning for Visual Object Tracking
In this paper, we present a new sequence-to-sequence learning framework for
visual tracking, dubbed SeqTrack. It casts visual tracking as a sequence
generation problem, which predicts object bounding boxes in an autoregressive
fashion. This is different from prior Siamese trackers and transformer
trackers, which rely on designing complicated head networks, such as
classification and regression heads. SeqTrack only adopts a simple
encoder-decoder transformer architecture. The encoder extracts visual features
with a bidirectional transformer, while the decoder generates a sequence of
bounding box values autoregressively with a causal transformer. The loss
function is a plain cross-entropy. Such a sequence learning paradigm not only
simplifies the tracking framework, but also achieves competitive performance on
benchmarks. For instance, SeqTrack reaches 72.5% AUC on LaSOT, setting a new
state of the art. Code and models are available here.
Comment: CVPR 2023 paper
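The core idea of casting boxes as token sequences can be sketched with a minimal example, assuming a simple uniform quantization of normalized box coordinates into a discrete vocabulary (the bin count and helper names are illustrative assumptions, not the paper's actual implementation):

```python
def box_to_tokens(box, num_bins=1000):
    """Quantize a normalized [x, y, w, h] box (values in [0, 1])
    into four discrete tokens, one per coordinate."""
    return [min(int(v * num_bins), num_bins - 1) for v in box]

def tokens_to_box(tokens, num_bins=1000):
    """Map discrete tokens back to bin-center coordinates."""
    return [(t + 0.5) / num_bins for t in tokens]

# A causal decoder would emit these four tokens one at a time,
# each conditioned on the previously generated ones, trained with
# plain cross-entropy over the num_bins-way token vocabulary.
box = [0.25, 0.40, 0.30, 0.20]
tokens = box_to_tokens(box)        # [250, 400, 300, 200]
recovered = tokens_to_box(tokens)  # within half a bin of the input
```

This replaces separate classification and regression heads with a single token-prediction objective, which is what lets the loss stay a plain cross-entropy.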
CiteTracker: Correlating Image and Text for Visual Tracking
Existing visual tracking methods typically take an image patch as the
reference of the target to perform tracking. However, a single image patch
cannot provide a complete and precise concept of the target object as images
are limited in their ability to abstract and can be ambiguous, which makes it
difficult to track targets with drastic variations. In this paper, we propose
the CiteTracker to enhance target modeling and inference in visual tracking by
connecting images and text. Specifically, we develop a text generation module
to convert the target image patch into a descriptive text containing its class
and attribute information, providing a comprehensive reference point for the
target. In addition, a dynamic description module is designed to adapt to
target variations for more effective target representation. We then associate
the target description and the search image using an attention-based
correlation module to generate the correlated features for target state
reference. Extensive experiments are conducted on five diverse datasets to
evaluate the proposed algorithm, and its favorable performance against
state-of-the-art methods demonstrates the effectiveness of the proposed
tracking method.
Comment: accepted by ICCV 202
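The attention-based correlation step can be illustrated with a minimal NumPy sketch, where a pooled embedding of the generated text description is matched against flattened search-image features (shapes and function names are assumptions for illustration, not the paper's actual module):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def correlate(text_emb, search_feats):
    """text_emb: (d,) pooled embedding of the target description.
    search_feats: (hw, d) flattened search-image feature map.
    Returns per-location features reweighted by how well each
    spatial location matches the text description."""
    scores = search_feats @ text_emb / np.sqrt(len(text_emb))
    weights = softmax(scores)               # (hw,), sums to 1
    return weights[:, None] * search_feats  # (hw, d)

rng = np.random.default_rng(0)
feats = correlate(rng.normal(size=8), rng.normal(size=(16, 8)))
```

The attention weights concentrate on locations whose visual features agree with the class and attribute words, which is one way text can disambiguate targets undergoing drastic appearance variations.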
Learning to Segment Dynamic Objects using SLAM Outliers
We present a method to automatically learn to segment dynamic objects using
SLAM outliers. It requires only one monocular sequence per dynamic object for
training and consists of localizing dynamic objects using SLAM outliers,
creating their masks, and using these masks to train a semantic segmentation
network. We integrate the trained network in ORB-SLAM 2 and LDSO. At runtime we
remove features on dynamic objects, making the SLAM unaffected by them. We also
propose a new stereo dataset and new metrics to evaluate SLAM robustness. Our
dataset includes consensus inversions, i.e., situations where the SLAM uses
more features on dynamic objects than on the static background. Consensus
inversions are challenging for SLAM as they may cause major SLAM failures. Our
approach outperforms the state of the art on the TUM RGB-D dataset in
monocular mode and on our dataset in both monocular and stereo modes.
Comment: Accepted to ICPR 202
- …