MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features
In this work, we tackle the problem of instance segmentation, the task of
simultaneously solving object detection and semantic segmentation. Towards this
goal, we present a model, called MaskLab, which produces three outputs: box
detection, semantic segmentation, and direction prediction. Building on top of
the Faster-RCNN object detector, the predicted boxes provide accurate
localization of object instances. Within each region of interest, MaskLab
performs foreground/background segmentation by combining semantic and direction
prediction. Semantic segmentation assists the model in distinguishing between
objects of different semantic classes including background, while the direction
prediction, estimating each pixel's direction towards its corresponding center,
allows separating instances of the same semantic class. Moreover, we explore
the effect of incorporating recent successful methods from both segmentation
and detection (i.e., atrous convolution and hypercolumns). Our proposed model is
evaluated on the COCO instance segmentation benchmark and shows performance
comparable with other state-of-the-art models.
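To make the direction cue concrete, here is a minimal sketch of how per-pixel direction-to-center training targets could be derived from an instance label map; the bin count, background convention, and center-of-mass definition are illustrative assumptions, not MaskLab's released recipe.

```python
import numpy as np

def direction_to_center_targets(instance_mask, num_bins=8):
    """Quantized per-pixel direction toward each pixel's instance center.

    `instance_mask` is an (H, W) integer map where 0 marks background and
    positive ids mark instances (a hypothetical labeling convention).
    Returns an (H, W) map of direction bins, with -1 for ignored pixels.
    """
    h, w = instance_mask.shape
    targets = np.full((h, w), -1, dtype=np.int64)
    for inst_id in np.unique(instance_mask):
        if inst_id == 0:
            continue
        ys, xs = np.nonzero(instance_mask == inst_id)
        cy, cx = ys.mean(), xs.mean()          # center of mass of the instance
        angles = np.arctan2(cy - ys, cx - xs)  # angle of the pixel -> center vector
        bins = ((angles + np.pi) / (2 * np.pi) * num_bins).astype(np.int64)
        targets[ys, xs] = bins % num_bins      # fold angle == pi back into bin 0
    return targets
```

Pixels sharing a semantic class but belonging to different instances then receive different direction labels, which is what lets the direction head separate same-class neighbors.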
Joint Learning of Intrinsic Images and Semantic Segmentation
Semantic segmentation of outdoor scenes is problematic when there are
variations in imaging conditions. It is known that albedo (reflectance) is
invariant to all kinds of illumination effects. Thus, using reflectance images
for the semantic segmentation task can be favorable. Additionally, not only may
segmentation benefit from reflectance, but segmentation may also be useful
for reflectance computation. Therefore, in this paper, the tasks of semantic
segmentation and intrinsic image decomposition are considered as a combined
process by exploring their mutual relationship in a joint fashion. To that end,
we propose a supervised end-to-end CNN architecture to jointly learn intrinsic
image decomposition and semantic segmentation. We analyze the gains of
addressing those two problems jointly. Moreover, new cascade CNN architectures
for intrinsic-for-segmentation and segmentation-for-intrinsic are proposed as
single tasks. Furthermore, a dataset of 35K synthetic images of natural
environments is created with corresponding albedo and shading (intrinsics), as
well as semantic labels (segmentation) assigned to each object/scene. The
experiments show that joint learning of intrinsic image decomposition and
semantic segmentation is beneficial for both tasks for natural scenes. Dataset
and models are available at: https://ivi.fnwi.uva.nl/cv/intrinseg
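As a rough illustration of what learning the two tasks "in a joint fashion" can mean in practice, the sketch below combines supervised intrinsic losses, a segmentation cross-entropy, and a reconstruction-consistency term tying albedo and shading back to the input image. The loss weights and the exact mix are assumptions for illustration, not the paper's specification.

```python
import torch
import torch.nn.functional as F

def joint_loss(albedo_pred, shading_pred, seg_logits,
               albedo_gt, shading_gt, seg_labels, image,
               w_intr=1.0, w_seg=1.0, w_recon=0.5):
    """One plausible joint objective for intrinsics + segmentation.

    seg_logits: (N, C, H, W) class scores; seg_labels: (N, H, W) long labels.
    The reconstruction term encourages image ~= albedo * shading.
    Weights w_* are illustrative assumptions.
    """
    l_albedo = F.mse_loss(albedo_pred, albedo_gt)
    l_shading = F.mse_loss(shading_pred, shading_gt)
    l_seg = F.cross_entropy(seg_logits, seg_labels)
    l_recon = F.mse_loss(albedo_pred * shading_pred, image)
    return w_intr * (l_albedo + l_shading) + w_seg * l_seg + w_recon * l_recon
```

Sharing a backbone and minimizing such a combined objective is one standard way the two tasks can inform each other during training.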
Transferable Semi-supervised Semantic Segmentation
The performance of deep learning based semantic segmentation models heavily
depends on sufficient data with careful annotations. However, even the largest
public datasets only provide samples with pixel-level annotations for rather
limited semantic categories. Such data scarcity critically limits scalability
and applicability of semantic segmentation models in real applications. In this
paper, we propose a novel transferable semi-supervised semantic segmentation
model that can transfer the learned segmentation knowledge from a few strong
categories with pixel-level annotations to unseen weak categories with only
image-level annotations, significantly broadening the applicability of
deep segmentation models. In particular, the proposed model consists of two
complementary and learnable components: a Label transfer Network (L-Net) and a
Prediction transfer Network (P-Net). The L-Net learns to transfer the
segmentation knowledge from strong categories to the images in the weak
categories and produces coarse pixel-level semantic maps, by effectively
exploiting the similar appearance shared across categories. Meanwhile, the
P-Net tailors the transferred knowledge through a carefully designed
adversarial learning strategy and produces refined segmentation results with
better details. Integrating the L-Net and P-Net achieves 96.5% and 89.4% of the
fully-supervised baseline's performance on PASCAL VOC 2012 when 50% and 0% of
the categories have pixel-level annotations, respectively. With such a novel
transfer mechanism, our proposed model is easily generalizable to a variety of
new categories, only requiring image-level annotations, and offers appealing
scalability in real applications.
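To give a flavor of the adversarial refinement step, here is a minimal sketch of generator/discriminator losses in the spirit of P-Net: the refined mask is kept close to L-Net's transferred coarse map while a discriminator pushes it toward the statistics of high-quality masks from strong categories. All names, the `disc` interface, and the loss weights are assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def pnet_refinement_losses(refined_logits, coarse_mask, disc, strong_masks):
    """Sketch of adversarial mask refinement.

    `disc` is a hypothetical discriminator mapping a mask tensor to a
    realness logit; `strong_masks` are ground-truth masks from strong
    (pixel-annotated) categories. Structure and weights are illustrative.
    """
    refined = torch.sigmoid(refined_logits)
    # Generator side: fool the discriminator, stay close to the coarse map.
    fake_logit = disc(refined)
    g_adv = F.binary_cross_entropy_with_logits(
        fake_logit, torch.ones_like(fake_logit))
    g_consist = F.l1_loss(refined, coarse_mask)
    g_loss = g_adv + 0.1 * g_consist
    # Discriminator side: real strong-category masks vs. detached refined masks.
    real_logit = disc(strong_masks)
    fake_logit_d = disc(refined.detach())
    d_loss = (F.binary_cross_entropy_with_logits(
                  real_logit, torch.ones_like(real_logit))
              + F.binary_cross_entropy_with_logits(
                  fake_logit_d, torch.zeros_like(fake_logit_d)))
    return g_loss, d_loss
```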
