BiSeg: Simultaneous Instance Segmentation and Semantic Segmentation with Fully Convolutional Networks
We present a simple and effective framework for simultaneous semantic
segmentation and instance segmentation with Fully Convolutional Networks
(FCNs). The method, called BiSeg, predicts instance segmentation as a posterior
in Bayesian inference, where semantic segmentation is used as a prior. We
extend the idea of position-sensitive score maps used in recent methods to a
fusion of multiple score maps at different scales and partition modes, and
adopt it as a robust likelihood for instance segmentation inference. As both
Bayesian inference and map fusion are performed per pixel, BiSeg is a fully
convolutional end-to-end solution that inherits all the advantages of FCNs. We
demonstrate state-of-the-art instance segmentation accuracy on PASCAL VOC.
Comment: BMVC201
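The per-pixel Bayesian step described in the abstract can be illustrated with a minimal sketch. This is not the BiSeg implementation: the function names, the two-class (inside/outside the instance) simplification, and the toy values are assumptions made only to show how a semantic prior and a score-map likelihood combine into a posterior at each pixel.

```python
# Illustrative sketch only; names and the binary in/out formulation are
# assumptions, not taken from the BiSeg paper.

def pixel_posterior(prior, lik_in, lik_out):
    """Bayes' rule at one pixel.

    prior   : semantic-segmentation probability that the pixel belongs
              to the object class (used as the prior)
    lik_in  : likelihood of the observed scores if the pixel is inside
              the instance
    lik_out : likelihood of the observed scores if it is outside
    """
    numerator = prior * lik_in
    evidence = numerator + (1.0 - prior) * lik_out
    return numerator / evidence if evidence > 0 else 0.0


def posterior_map(prior_map, lik_in_map, lik_out_map):
    """Apply the same rule independently to every pixel of a 2D map."""
    return [
        [pixel_posterior(p, a, b) for p, a, b in zip(p_row, a_row, b_row)]
        for p_row, a_row, b_row in zip(prior_map, lik_in_map, lik_out_map)
    ]
```

Because every pixel is handled independently, the same computation can be expressed as element-wise tensor operations, which is consistent with the abstract's point that the method stays fully convolutional.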
Pseudo Mask Augmented Object Detection
In this work, we present a novel and effective framework to facilitate object
detection with the instance-level segmentation information that is only
supervised by bounding box annotation. Starting from the joint object detection
and instance segmentation network, we propose to recursively estimate the
pseudo ground-truth object masks from the instance-level object segmentation
network training, and then enhance the detection network with top-down
segmentation feedback. The pseudo ground-truth masks and network parameters are
optimized alternately so that each benefits the other. To obtain promising
pseudo masks in each iteration, we embed a graphical inference step that
incorporates the low-level image appearance consistency and the bounding box
annotations to refine the segmentation masks predicted by the segmentation
network. Our approach progressively improves the object detection performance
by incorporating the detailed pixel-wise information learned from the
weakly-supervised segmentation network. Extensive evaluation on the detection
task in PASCAL VOC 2007 and 2012 [12] verifies that the proposed approach is
effective.
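One ingredient of the refinement described above, using the bounding box annotation to constrain a predicted mask, can be sketched as follows. This covers only the box constraint and is an assumption-laden simplification: the paper's graphical inference also uses low-level appearance consistency, which is not modeled here, and the function name and box convention are hypothetical.

```python
# Illustrative sketch only; clip_mask_to_box and the (x0, y0, x1, y1)
# inclusive box convention are assumptions, not the paper's procedure.

def clip_mask_to_box(mask, box):
    """Zero out predicted foreground pixels that fall outside the box.

    mask : 2D list of 0/1 values indexed as mask[y][x]
    box  : (x0, y0, x1, y1) with inclusive pixel coordinates
    """
    x0, y0, x1, y1 = box
    return [
        [v if (x0 <= x <= x1 and y0 <= y <= y1) else 0
         for x, v in enumerate(row)]
        for y, row in enumerate(mask)
    ]
```

In an alternating scheme like the one the abstract outlines, a step of this kind would sit between network updates, so that each new set of pseudo masks stays consistent with the only supervision available.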
MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features
In this work, we tackle the problem of instance segmentation, the task of
simultaneously solving object detection and semantic segmentation. Towards this
goal, we present a model, called MaskLab, which produces three outputs: box
detection, semantic segmentation, and direction prediction. Building on top of
the Faster-RCNN object detector, the predicted boxes provide accurate
localization of object instances. Within each region of interest, MaskLab
performs foreground/background segmentation by combining semantic and direction
prediction. Semantic segmentation assists the model in distinguishing between
objects of different semantic classes including background, while the direction
prediction, estimating each pixel's direction towards its corresponding center,
allows separating instances of the same semantic class. Moreover, we explore
the effect of incorporating recent successful methods from both segmentation
and detection (i.e. atrous convolution and hypercolumn). Our proposed model is
evaluated on the COCO instance segmentation benchmark and shows comparable
performance with other state-of-the-art models.
Comment: 10 pages including referenc
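The direction cue mentioned in the abstract, each pixel's direction towards its instance center, can be illustrated by quantizing that direction into a small number of bins. The sketch below is an assumption: the bin count of 8, the function name, and the coordinate convention are chosen for illustration and are not confirmed by the abstract.

```python
import math

# Illustrative sketch only; direction_bin and num_bins=8 are assumptions.

def direction_bin(px, py, cx, cy, num_bins=8):
    """Quantize the direction from pixel (px, py) to center (cx, cy).

    The angle is measured with atan2 in the image coordinate frame
    (y grows downward in most image layouts, so "up" and "down" are
    mirrored relative to the usual math convention).
    """
    angle = math.atan2(cy - py, cx - px) % (2.0 * math.pi)  # in [0, 2*pi)
    bin_width = 2.0 * math.pi / num_bins
    return int(angle / bin_width) % num_bins
```

Two touching objects of the same class have different centers, so pixels on either side of their shared boundary tend to fall into different direction bins, which is what lets a cue of this kind separate instances that semantic segmentation alone would merge.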
Fully Dynamic Inference with Deep Neural Networks
Modern deep neural networks are powerful and widely applicable models that
extract task-relevant information through multi-level abstraction. Their
cross-domain success, however, is often achieved at the expense of
computational cost, high memory bandwidth, and long inference latency, which
prevents their deployment in resource-constrained and time-sensitive scenarios,
such as edge-side inference and self-driving cars. While recently developed
methods for creating efficient deep neural networks are making their real-world
deployment more feasible by reducing model size, they do not fully exploit
input properties on a per-instance basis to maximize computational efficiency
and task accuracy. In particular, most existing methods typically use a
one-size-fits-all approach that identically processes all inputs. Motivated by
the fact that different images require different feature embeddings to be
accurately classified, we propose a fully dynamic paradigm that imparts deep
convolutional neural networks with hierarchical inference dynamics at the level
of layers and individual convolutional filters/channels. Two compact networks,
called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance
basis which layers or filters/channels are redundant and therefore should be
skipped. L-Net and C-Net also learn how to scale retained computation outputs
to maximize task accuracy. By integrating L-Net and C-Net into a joint design
framework, called LC-Net, we consistently outperform state-of-the-art dynamic
frameworks with respect to both efficiency and classification accuracy. On the
CIFAR-10 dataset, LC-Net results in up to 11.9× fewer floating-point
operations (FLOPs) and up to 3.3% higher accuracy compared to other dynamic
inference methods. On the ImageNet dataset, LC-Net achieves up to 1.4×
fewer FLOPs and up to 4.6% higher Top-1 accuracy than the other methods.
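The idea of skipping redundant channels per input and scaling the retained ones can be shown with a toy sketch. It is not L-Net, C-Net, or LC-Net: the saliency scores are passed in rather than predicted by a compact network, the threshold of 0.5 is a hypothetical choice, and each "filter" is reduced to a dot product for brevity.

```python
# Illustrative sketch only; dynamic_channels, the fixed threshold, and the
# dot-product "filters" are assumptions, not the paper's architecture.

def dynamic_channels(inputs, filters, saliency, threshold=0.5):
    """Compute only the channels whose saliency clears the threshold.

    inputs   : list of input activations
    filters  : one weight list per output channel
    saliency : per-channel score for this particular input
    Returns (outputs, number_of_channels_actually_computed).
    """
    outputs = []
    computed = 0
    for weights, score in zip(filters, saliency):
        if score < threshold:
            outputs.append(0.0)  # channel skipped: no multiply-adds spent
            continue
        computed += 1
        response = sum(w * x for w, x in zip(weights, inputs))
        outputs.append(score * response)  # retained output is rescaled
    return outputs, computed
```

The saving comes from the `continue` branch: skipped channels cost nothing, so the operation count depends on the input rather than being fixed by the architecture.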
- …