Lidar Panoptic Segmentation and Tracking without Bells and Whistles
State-of-the-art lidar panoptic segmentation (LPS) methods follow a bottom-up,
segmentation-centric paradigm: they build on semantic segmentation networks
and use clustering to obtain object instances. In this paper, we
re-think this approach and propose a surprisingly simple yet effective
detection-centric network for both LPS and tracking. Our network is modular by
design and optimized for all aspects of both the panoptic segmentation and
tracking tasks. One of the core components of our network is the object instance
detection branch, which we train using point-level (modal) annotations, as
available in segmentation-centric datasets. In the absence of amodal (cuboid)
annotations, we regress modal centroids and object extent using
trajectory-level supervision that provides information about object size, which
cannot be inferred from single scans due to occlusions and the sparse nature of
the lidar data. We obtain fine-grained instance segments by learning to
associate lidar points with detected centroids. We evaluate our method on
several 3D/4D LPS benchmarks and observe that our model establishes a new
state-of-the-art among open-sourced models, outperforming recent query-based
models.
Comment: IROS 2023. Code at https://github.com/abhinavagarwalla/most-lp
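The abstract describes obtaining fine-grained instance segments by associating lidar points with detected centroids. The paper learns this association; the sketch below only illustrates the geometric baseline it generalizes, a nearest-centroid assignment with a distance cutoff. The function name, the `max_dist` threshold, and the toy coordinates are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def assign_points_to_centroids(points, centroids, max_dist=2.0):
    """Assign each lidar point to its nearest detected object centroid.

    points:    (N, 3) array of lidar point coordinates
    centroids: (M, 3) array of detected object centroids
    Returns an (N,) array of instance ids in [0, M), or -1 when the
    nearest centroid is farther than max_dist (treated as background).
    """
    # Pairwise Euclidean distances between points and centroids: (N, M)
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
    ids = dists.argmin(axis=1)
    ids[dists.min(axis=1) > max_dist] = -1
    return ids

# Two well-separated clusters plus one distant outlier point.
pts = np.array([[0.1, 0.0, 0.0], [0.0, 0.2, 0.0],
                [5.0, 5.1, 0.0], [20.0, 0.0, 0.0]])
cents = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 0.0]])
print(assign_points_to_centroids(pts, cents))  # → [ 0  0  1 -1]
```

A learned association head would replace the raw distance with a predicted affinity, but the assignment structure stays the same.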
Rethinking the competition between detection and ReID in Multi-Object Tracking
Owing to their balance of accuracy and speed, one-shot models that jointly
learn detection and ReID have drawn great attention in multi-object tracking
(MOT). However, the differences between these two tasks are often overlooked
in the one-shot tracking paradigm, leading to performance inferior to
two-stage methods. In this paper, we dissect the reasoning process of the
aforementioned two tasks. Our analysis reveals that the competition between them
inevitably hurts the learning of task-dependent representations, which further
impedes the tracking performance. To remedy this issue, we propose a novel
cross-correlation network that can effectively impel the separate branches to
learn task-dependent representations. Furthermore, we introduce a scale-aware
attention network that learns discriminative embeddings to improve the ReID
capability. We integrate the delicately designed networks into a one-shot
online MOT system, dubbed CSTrack. Without bells and whistles, our model
achieves new state-of-the-art performance on MOT16 and MOT17. Our code is
released at https://github.com/JudasDie/SOTS
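The abstract's core idea is to let separate branches learn task-dependent representations from a shared backbone rather than forcing one embedding to serve both detection and ReID. The sketch below is not CSTrack's actual architecture; it is a generic illustration, under assumed shapes, of decoupling via per-task projections whose channel self-correlation reweights the shared features, so each branch emphasises its own channels.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def task_branch(shared, w_task):
    """One decoupled branch: project the shared (C, N) feature map with a
    task-specific (C, C) weight, then reweight the shared features by the
    channel self-correlation of that projection."""
    proj = w_task @ shared                    # task-specific projection (C, N)
    corr = softmax(proj @ proj.T, axis=-1)    # channel correlation map (C, C)
    return corr @ shared                      # task-dependent features (C, N)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16))           # shared backbone output
det = task_branch(feat, rng.standard_normal((8, 8)))    # detection branch
reid = task_branch(feat, rng.standard_normal((8, 8)))   # ReID branch
print(det.shape, reid.shape)                  # both (8, 16), but distinct maps
```

The point of the decoupling is visible in the outputs: both branches read the same shared features, yet their learned projections yield different representations, so detection and ReID no longer compete for one embedding.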
Flow-Guided Feature Aggregation for Video Object Detection
Extending state-of-the-art object detectors from image to video is
challenging. The accuracy of detection suffers from degenerated object
appearances in videos, e.g., motion blur, video defocus, rare poses, etc.
Existing work attempts to exploit temporal information at the box level, but such
methods are not trained end-to-end. We present flow-guided feature aggregation,
an accurate and end-to-end learning framework for video object detection. It
leverages temporal coherence at the feature level instead. It improves the
per-frame features by aggregating nearby features along motion paths, and thus
improves video recognition accuracy. Our method significantly
improves upon strong single-frame baselines on ImageNet VID, especially for
more challenging fast moving objects. Our framework is principled, and on par
with the best engineered systems winning the ImageNet VID challenges 2016,
without additional bells-and-whistles. The proposed method, together with Deep
Feature Flow, powered the winning entry of ImageNet VID challenges 2017. The
code is available at
https://github.com/msracver/Flow-Guided-Feature-Aggregation
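The aggregation step the abstract describes can be sketched as a similarity-weighted average of neighbour features that have already been warped into the reference frame. The flow-based warping itself is omitted here (it is assumed to have happened upstream); the adaptive per-position weighting by cosine similarity follows the idea in the abstract, while the function names and the epsilon constant are illustrative choices.

```python
import numpy as np

def cosine_weight(a, b):
    """Per-position cosine similarity between two (C, N) feature maps."""
    num = (a * b).sum(axis=0)
    den = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0) + 1e-8
    return num / den

def aggregate(ref_feat, warped_nbr_feats):
    """Aggregate flow-warped neighbour features into the reference frame.

    ref_feat:          (C, N) reference-frame features
    warped_nbr_feats:  list of (C, N) neighbour features, assumed already
                       warped to the reference frame by optical flow
    Weights are a per-position softmax over cosine similarity to the
    reference, so neighbours that agree with it contribute more.
    """
    feats = [ref_feat] + list(warped_nbr_feats)
    sims = np.stack([cosine_weight(ref_feat, f) for f in feats])  # (T, N)
    w = np.exp(sims) / np.exp(sims).sum(axis=0, keepdims=True)    # (T, N)
    return sum(wi[None, :] * fi for wi, fi in zip(w, feats))      # (C, N)
```

A useful sanity check on the design: if every warped neighbour is identical to the reference (perfect flow, no appearance change), the weighted average returns the reference features unchanged, so aggregation can only help when neighbours carry complementary information.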