Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetic
powered by deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold-weapon era. This paper extensively reviews 400+
papers on object detection in light of its technical evolution, spanning
over a quarter-century (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and makes an
in-depth analysis of their challenges as well as technical improvements in
recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible
publication
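Among the topics the survey covers are detection metrics. The overlap measure underlying most of them, Intersection-over-Union (IoU), can be sketched in a few lines. This is an illustrative implementation of the standard definition, not code from the survey:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Detection benchmarks typically count a prediction as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.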
A Multi-Level Approach to Waste Object Segmentation
We address the problem of localizing waste objects from a color image and an
optional depth image, which is a key perception component for robotic
interaction with such objects. Specifically, our method integrates the
intensity and depth information at multiple levels of spatial granularity.
Firstly, a scene-level deep network produces an initial coarse segmentation,
based on which we select a few potential object regions to zoom in and perform
fine segmentation. The results of the above steps are further integrated into a
densely connected conditional random field that learns to respect the
appearance, depth, and spatial affinities with pixel-level accuracy. In
addition, we create a new RGBD waste object segmentation dataset, MJU-Waste,
that is made public to facilitate future research in this area. The efficacy of
our method is validated on both MJU-Waste and the Trash Annotation in Context
(TACO) dataset.
Comment: Paper appears in Sensors 2020, 20(14), 381
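The coarse-to-fine zoom-in step described in the abstract can be sketched as follows. `select_zoom_regions` and `refine` are hypothetical names, and a single padded bounding box around the coarse foreground stands in for the paper's region selection and fine-segmentation network; this is a minimal sketch of the idea, not the authors' implementation:

```python
import numpy as np

def select_zoom_regions(coarse_mask, pad=4):
    """Return a padded bounding box around the foreground of a coarse
    binary mask (a stand-in for the scene-level network's output)."""
    ys, xs = np.nonzero(coarse_mask)
    if len(ys) == 0:
        return []
    h, w = coarse_mask.shape
    y1, y2 = max(ys.min() - pad, 0), min(ys.max() + pad + 1, h)
    x1, x2 = max(xs.min() - pad, 0), min(xs.max() + pad + 1, w)
    return [(y1, x1, y2, x2)]

def refine(image, coarse_mask, fine_model):
    """Run a (hypothetical) fine model on each zoomed crop and paste the
    refined predictions back into the full-resolution mask."""
    refined = coarse_mask.copy()
    for y1, x1, y2, x2 in select_zoom_regions(coarse_mask):
        crop = image[y1:y2, x1:x2]
        refined[y1:y2, x1:x2] = fine_model(crop)
    return refined
```

In the paper the refined regions are further fused with depth and appearance cues in a densely connected CRF, which this sketch omits.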
PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation
Depth-aware Video Panoptic Segmentation (DVPS) is a new and challenging
vision problem that aims to predict panoptic segmentation and depth in a video
simultaneously. Previous work solves this task by extending an existing
panoptic segmentation method with an extra dense depth prediction and instance
tracking head. However, the relationship between depth and panoptic
segmentation is not well explored -- simply combining existing methods leads to
task competition and requires careful weight balancing. In this paper, we
present PolyphonicFormer, a vision transformer that unifies these sub-tasks
under the DVPS task, leading to more robust results. Our principal insight is
that depth can be harmonized with panoptic segmentation through our proposed
new paradigm of predicting instance-level depth maps with object queries. We
then explore the relationship between the two tasks via query-based learning. From
the experiments, we demonstrate the benefits of our design from both depth
estimation and panoptic segmentation aspects. Since each thing query also
encodes instance-wise information, it is natural to perform tracking
directly via appearance learning. Our method achieves state-of-the-art results
on two DVPS datasets (Semantic KITTI, Cityscapes), and ranks 1st on the
ICCV-2021 BMTT Challenge video + depth track. Code is available at
https://github.com/HarborYuan/PolyphonicFormer
Comment: Accepted by ECCV 202
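The paradigm of predicting instance-level depth maps with object queries can be illustrated by a toy composition step: each query contributes an instance mask and a depth map, which are merged into one full-image depth map. `merge_instance_depths` is a hypothetical name, and the last-query-wins overwrite rule is an assumption for illustration, not the paper's actual fusion:

```python
import numpy as np

def merge_instance_depths(masks, depths):
    """Compose a full-image depth map from per-query predictions.
    masks:  (N, H, W) boolean, one binary instance mask per object query
    depths: (N, H, W) float, one dense depth prediction per object query
    """
    h, w = masks.shape[1:]
    full = np.zeros((h, w), dtype=float)
    for mask, depth in zip(masks, depths):
        # Copy each query's depth only where its instance mask is active;
        # later queries overwrite earlier ones on overlapping pixels.
        full[mask] = depth[mask]
    return full
```

Predicting depth per query ties each depth map to one instance, which is what lets the same queries also carry tracking information across frames.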
Boosting Semantic Segmentation with Semantic Boundaries
In this paper, we present the Semantic Boundary Conditioned Backbone (SBCB)
framework, a simple yet effective training framework that is model-agnostic and
boosts segmentation performance, especially around the boundaries. Motivated by
the recent development in improving semantic segmentation by incorporating
boundaries as auxiliary tasks, we propose a multi-task framework that uses
semantic boundary detection (SBD) as an auxiliary task. The SBCB framework
utilizes the nature of the SBD task, which is complementary to semantic
segmentation, to improve the backbone of the segmentation head. We apply an SBD
head that exploits the multi-scale features from the backbone, where the model
learns low-level features in the earlier stages, and high-level semantic
understanding in the later stages. This head perfectly complements the common
semantic segmentation architectures where the features from the later stages
are used for classification. We can improve semantic segmentation models
without additional parameters during inference by only conditioning the
backbone. Through extensive evaluations, we show the effectiveness of the SBCB
framework, improving various popular segmentation heads and backbones by 0.5%
~ 3.0% IoU on the Cityscapes dataset, with gains of 1.6% ~ 4.1% in boundary F-scores.
We also apply this framework on customized backbones and the emerging vision
transformer models and show the effectiveness of the SBCB framework.
Comment: 28 pages. Code available at
https://github.com/haruishi43/boundary_boost_mmse
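The multi-scale auxiliary SBD head described above can be sketched as a toy fusion of backbone stages: each stage's feature map is upsampled to full resolution and the stages are concatenated. `sbd_head` and `upsample_nearest` are hypothetical names, and a real head would add learned convolutions and a boundary loss; this is a sketch of the idea, not the SBCB implementation:

```python
import numpy as np

def upsample_nearest(feat, out_h, out_w):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    c, h, w = feat.shape
    ys = np.arange(out_h) * h // out_h  # source row for each output row
    xs = np.arange(out_w) * w // out_w  # source col for each output col
    return feat[:, ys][:, :, xs]

def sbd_head(features, out_h, out_w):
    """Toy auxiliary semantic-boundary head: upsample every backbone stage
    (early stages carry low-level detail, late stages carry semantics) to a
    common resolution and fuse by concatenation. In the SBCB framework this
    head is used only during training, so it adds no inference-time cost."""
    fused = np.concatenate(
        [upsample_nearest(f, out_h, out_w) for f in features], axis=0)
    return fused  # a real head would apply a 1x1 conv + sigmoid here
```

Because only the backbone is conditioned by the auxiliary loss, the head can be dropped at inference, matching the abstract's claim of no additional parameters at test time.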