GlobalTrack: A Simple and Strong Baseline for Long-term Tracking
A key capability of a long-term tracker is to search for targets in very
large areas (typically the entire image) to handle possible target absences or
tracking failures. However, currently there is a lack of such a strong baseline
for global instance search. In this work, we aim to bridge this gap.
Specifically, we propose GlobalTrack, a pure global instance search based
tracker that makes no assumption on the temporal consistency of the target's
positions and scales. GlobalTrack is developed based on two-stage object
detectors, and it is able to perform full-image and multi-scale search of
arbitrary instances with only a single query as the guide. We further propose a
cross-query loss to improve the robustness of our approach against distractors.
With no online learning, no penalty on position or scale changes, no scale
smoothing and no trajectory refinement, our pure global instance search based
tracker achieves performance comparable to, and sometimes much better than,
state-of-the-art approaches that typically require complex post-processing, on
four large-scale tracking benchmarks (i.e., 52.1% AUC on LaSOT, 63.8% success
rate on TLP, 60.3% MaxGM on OxUvA and 75.4% normalized precision on
TrackingNet). More importantly, our tracker runs without cumulative errors,
i.e., any type of temporary tracking failures will not affect its performance
on future frames, making it ideal for long-term tracking. We hope this work
will be a strong baseline for long-term tracking and will stimulate future
works in this area. Code is available at
https://github.com/huanglianghua/GlobalTrack.
Comment: Accepted at AAAI 2020.
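The core idea above — searching the full image for an arbitrary instance using only a single query as the guide — can be illustrated with a minimal sketch. This is a hypothetical dense cross-correlation between a pooled query feature and a search-image feature map, not the paper's exact feature-modulation scheme:

```python
import numpy as np

def query_guided_score_map(search_feat, query_feat):
    """Cross-correlate a query feature vector with every spatial position
    of a full-image feature map (simplified sketch of query-guided global
    search; GlobalTrack's actual modulation inside a two-stage detector
    is richer than this)."""
    # search_feat: (C, H, W) feature map of the entire search image
    # query_feat:  (C,) pooled feature of the single query instance
    C, H, W = search_feat.shape
    flat = search_feat.reshape(C, H * W)   # flatten spatial positions
    scores = query_feat @ flat             # (H*W,) per-position similarity
    return scores.reshape(H, W)            # dense score map over the image

# Toy usage: the score-map peak marks the most query-like location.
feat = np.zeros((4, 5, 5))
feat[:, 2, 3] = 1.0                        # plant the "target" at (2, 3)
q = np.ones(4)
smap = query_guided_score_map(feat, q)
peak = np.unravel_index(np.argmax(smap), smap.shape)
```

Because the score map covers the whole image, the search makes no temporal-consistency assumption: a failure on one frame leaves the next frame's search unaffected, which is what removes cumulative errors.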
MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation
We address the problem of semi-supervised video object segmentation (VOS),
where the masks of objects of interest are given in the first frame of an
input video. To deal with challenging cases where objects are occluded or
missing, previous work relies on greedy data association strategies that make
decisions for each frame individually. In this paper, we propose a novel
approach to defer the decision making for a target object in each frame, until
a global view can be established with the entire video being taken into
consideration. Our approach is in the same spirit as Multiple Hypotheses
Tracking (MHT) methods, making several critical adaptations for the VOS
problem. We employ the bounding box (bbox) hypothesis for tracking tree
formation, and the multiple hypotheses are spawned by propagating the preceding
bbox into the detected bbox proposals within a gated region starting from the
initial object mask in the first frame. The gated region is determined by a
gating scheme which takes into account a more comprehensive motion model rather
than the simple Kalman filtering model in traditional MHT. To further design
algorithms better tailored to VOS, we develop a novel mask propagation score
in place of the appearance similarity score, which can be brittle under large
deformations. The mask propagation score, together with
the motion score, determines the affinity between the hypotheses during tree
pruning. Finally, a novel mask merging strategy is employed to handle mask
conflicts between objects. Extensive experiments on challenging datasets
demonstrate the effectiveness of the proposed method, especially when objects
go missing.
Comment: Accepted to CVPR 2019 as an oral presentation.
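The gating scheme described above can be sketched in miniature: proposals are kept only if they fall inside a gated region around the motion-predicted box. The helper below is a hypothetical simplification using a fixed-radius centre distance; the paper's gating uses a more comprehensive motion model than this (or than plain Kalman gating):

```python
import numpy as np

def gate_proposals(pred_box, proposals, gate_radius):
    """Keep detected bbox proposals whose centres lie within a gated
    region around the motion-predicted box (hypothetical simplified
    gate; MHP-VOS uses a richer motion model to set this region)."""
    # boxes are (x1, y1, x2, y2)
    def centre(box):
        return np.array([(box[0] + box[2]) / 2.0,
                         (box[1] + box[3]) / 2.0])
    c = centre(pred_box)
    return [p for p in proposals
            if np.linalg.norm(centre(p) - c) <= gate_radius]

# Toy usage: only the nearby proposal survives the gate and would spawn
# a new hypothesis in the tracking tree.
pred = (10, 10, 30, 30)                       # predicted box, centre (20, 20)
props = [(18, 18, 28, 28), (100, 100, 120, 120)]
kept = gate_proposals(pred, props, gate_radius=25.0)
```

Each surviving proposal spawns one child hypothesis, so the gate directly controls how fast the tracking tree branches before pruning.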
Quadcopter drone formation control via onboard visual perception
Quadcopter drone formation control is an important capability for fields like area surveillance, search and rescue, agriculture, and reconnaissance. Of particular interest is formation control in environments where radio communications and/or GPS may be either denied or not sufficiently accurate for the desired application.
To address this, we focus on vision as the sensing modality. We train an Hourglass Convolutional Neural Network (CNN) to discriminate between quadcopter pixels and non-quadcopter pixels in a live video feed and use it to guide a formation of quadcopters. The CNN outputs "heatmaps" - pixel-by-pixel likelihood estimates of the presence of a quadcopter. These heatmaps suffer from short-lived false detections. To mitigate these, we apply a version of the Siamese networks technique on consecutive frames for clutter mitigation and to promote temporal smoothness in the heatmaps. The heatmaps give an estimate of the range and bearing to the other quadcopter(s), which we use to calculate flight control commands and maintain the desired formation.
We implement the algorithm on a single-board computer (ODROID XU4) with a standard webcam mounted to a quadcopter drone. Flight tests in a motion capture volume demonstrate successful formation control with two quadcopters in a leader-follower setup.
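The step from heatmap to flight command hinges on recovering a bearing from the detection's pixel location. A minimal sketch, assuming a simple pinhole camera model with a known horizontal field of view (the function name and parameters are illustrative, not the authors' controller interface):

```python
import math

def bearing_from_heatmap_peak(peak_col, image_width, hfov_deg):
    """Convert the column of a heatmap peak into a horizontal bearing in
    degrees (positive = target right of image centre), under a pinhole
    camera model. Hypothetical helper: the actual flight controller in
    the paper also uses range estimated from apparent target size."""
    half_w = image_width / 2.0
    # Focal length in pixels from the horizontal field of view.
    focal_px = half_w / math.tan(math.radians(hfov_deg / 2.0))
    return math.degrees(math.atan2(peak_col - half_w, focal_px))
```

A peak at the image centre yields a bearing of zero (fly straight), while a peak at the image edge yields half the field of view; the follower's yaw command can then be a simple proportional response to this bearing.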