X-PDNet: Accurate Joint Plane Instance Segmentation and Monocular Depth Estimation with Cross-Task Distillation and Boundary Correction
Segmentation of planar regions from a single RGB image is a particularly
important task in the perception of complex scenes. To utilize both visual and
geometric properties in images, recent approaches often formulate the problem
as a joint estimation of planar instances and dense depth through feature
fusion mechanisms and geometric constraint losses. Despite promising results,
these methods do not consider cross-task feature distillation and perform
poorly in boundary regions. To overcome these limitations, we propose X-PDNet,
a framework for the multitask learning of plane instance segmentation and depth
estimation with improvements in two aspects. First, we
construct a cross-task distillation design that promotes early information
sharing between the two tasks for task-specific improvements. Second, we
highlight the limitations of using the ground-truth boundary to develop a
boundary regression loss, and propose a novel method that exploits depth
information to support precise boundary region segmentation. Finally, we
manually annotate more than 3000 images from the Stanford 2D-3D-Semantics dataset
and make them available for evaluation of plane instance segmentation. In
experiments, our methods outperform the baseline by large margins in
quantitative results on the ScanNet and Stanford 2D-3D-S datasets,
demonstrating the effectiveness of
our proposals.
Comment: Accepted to BMVC 202
DISC: Deep Image Saliency Computing via Progressive Representation Learning
Salient object detection increasingly receives attention as an important
component or step in several pattern recognition and image processing tasks.
Although a variety of powerful saliency models have been proposed,
they usually involve heavy feature (or model) engineering based on priors (or
assumptions) about the properties of objects and backgrounds. Inspired by the
effectiveness of recently developed feature learning, we provide a novel Deep
Image Saliency Computing (DISC) framework for fine-grained image saliency
computing. In particular, we model image saliency from both coarse- and
fine-level observations, and utilize a deep convolutional neural network
(CNN) to learn the saliency representation in a progressive manner.
Specifically, our saliency model is built upon two stacked CNNs. The first CNN
generates a coarse-level saliency map by taking the overall image as the input,
roughly identifying saliency regions in the global context. Furthermore, we
integrate superpixel-based local context information in the first CNN to refine
the coarse-level saliency map. Guided by the coarse saliency map, the second
CNN focuses on the local context to produce a fine-grained and accurate saliency
map while preserving object details. For a test image, the two CNNs
collaboratively conduct the saliency computing in one shot. Our DISC framework
is capable of uniformly highlighting the objects of interest from complex
backgrounds while preserving object details well. Extensive experiments on
several standard benchmarks suggest that DISC outperforms other
state-of-the-art methods and also generalizes well across datasets without
additional training. The executable version of DISC is available online:
http://vision.sysu.edu.cn/projects/DISC.
Comment: This manuscript is the accepted version for IEEE Transactions on
Neural Networks and Learning Systems (T-NNLS), 201