G2-MonoDepth: A General Framework of Generalized Depth Inference from Monocular RGB+X Data
Monocular depth inference is a fundamental problem for robot scene perception.
A given robot may carry a camera plus an optional depth sensor of any type and
operate in scenes of widely different scales, yet recent advances have split
the problem into multiple individual sub-tasks. This fragmentation imposes the
extra burden of fine-tuning models for each specific robot, and thereby
high-cost customization in large-scale industrialization. This paper
investigates a unified task of monocular depth inference, which infers
high-quality depth maps from all kinds of raw input data from various robots
in unseen scenes. A basic
benchmark G2-MonoDepth is developed for this task, which comprises four
components: (a) a unified data representation RGB+X to accommodate RGB plus raw
depth with diverse scene scale/semantics, depth sparsity ([0%, 100%]) and
errors (holes/noises/blurs), (b) a novel unified loss to adapt to diverse depth
sparsity/errors of input raw data and diverse scales of output scenes, (c) an
improved network to well propagate diverse scene scales from input to output,
and (d) a data augmentation pipeline to simulate all types of real artifacts in
raw depth maps for training. G2-MonoDepth is applied in three sub-tasks
including depth estimation, depth completion with different sparsity, and depth
enhancement in unseen scenes, and it consistently outperforms SOTA baselines
on both real-world and synthetic data.

Comment: 18 pages, 16 figures
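The abstract's unified RGB+X representation (RGB plus raw depth with sparsity anywhere in [0%, 100%]) and its artifact-simulating augmentation can be illustrated with a minimal sketch. This is not the paper's implementation; the function names and the channel layout (RGB + depth + validity mask) are assumptions for illustration only.

```python
import numpy as np

def make_rgbx_input(rgb, raw_depth=None):
    """Stack RGB with an optional raw-depth channel plus a validity mask.

    With no depth sensor, the depth channel is all zeros and the mask marks
    every pixel invalid (0% sparsity); a dense sensor yields ~100% coverage.
    Hypothetical layout: H x W x 5 = (R, G, B, depth, valid).
    """
    h, w, _ = rgb.shape
    if raw_depth is None:
        depth = np.zeros((h, w), dtype=np.float32)
    else:
        depth = raw_depth.astype(np.float32)
    valid = (depth > 0).astype(np.float32)  # 0 where depth is missing
    return np.concatenate(
        [rgb.astype(np.float32), depth[..., None], valid[..., None]], axis=-1)

def simulate_sparsity(dense_depth, keep_ratio, rng):
    """Randomly drop depth samples to mimic a sparse sensor (e.g. LiDAR),
    one of the raw-depth artifacts a training pipeline might simulate."""
    mask = rng.random(dense_depth.shape) < keep_ratio
    return np.where(mask, dense_depth, 0.0)
```

With this representation, the same network input shape covers depth estimation (no raw depth), completion (sparse depth), and enhancement (dense but noisy depth).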
Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning
Depth-aware panoptic segmentation is an emerging topic in computer vision
which combines semantic and geometric understanding for more robust scene
interpretation. Recent works pursue unified frameworks to tackle this challenge
but mostly still treat it as two individual learning tasks, which limits their
potential for exploring cross-domain information. We propose a deeply unified
framework for depth-aware panoptic segmentation, which performs joint
segmentation and depth estimation both in a per-segment manner with identical
object queries. To narrow the gap between the two tasks, we further design a
geometric query enhancement method, which is able to integrate scene geometry
into object queries using latent representations. In addition, we propose a
bi-directional guidance learning approach to facilitate cross-task feature
learning by taking advantage of their mutual relations. Our method sets the new
state of the art for depth-aware panoptic segmentation on both Cityscapes-DVPS
and SemKITTI-DVPS datasets. Moreover, our guidance learning approach is shown
to deliver performance improvements even under incomplete supervision labels.

Comment: to be published in ICCV 202
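The core idea of decoding segmentation and depth "in a per-segment manner with identical object queries" can be sketched as follows. This is a simplified illustration, not the paper's architecture; the projection weights and the per-query scalar depth output are hypothetical.

```python
import numpy as np

def per_segment_predictions(queries, pixel_feats, w_mask, w_depth):
    """Decode N shared object queries into masks and per-segment depth.

    queries:     (N, C)    one embedding per object query
    pixel_feats: (H, W, C) dense feature map
    w_mask:      (C, C)    mask-embedding projection (hypothetical)
    w_depth:     (C,)      per-segment depth head (hypothetical)
    """
    mask_embed = queries @ w_mask                              # (N, C)
    # Dot each query's mask embedding against every pixel feature.
    mask_logits = np.einsum('nc,hwc->nhw', mask_embed, pixel_feats)
    depth_per_segment = queries @ w_depth                      # (N,)
    return mask_logits, depth_per_segment
```

Because both heads read the same query embeddings, geometric cues injected into the queries (as in the paper's geometric query enhancement) influence segmentation and depth jointly rather than through two separate task branches.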