S3-Net: A Fast and Lightweight Video Scene Understanding Network by Single-shot Segmentation
Real-time understanding in video is crucial in various AI applications such
as autonomous driving. This work presents a fast single-shot segmentation
strategy for video scene understanding. The proposed net, called S3-Net,
quickly locates and segments target sub-scenes while extracting structured
time-series semantic features as inputs to an LSTM-based spatio-temporal model.
Utilizing tensorization and quantization techniques, S3-Net is intended to be
lightweight for edge computing. Experiments using CityScapes, UCF11, HMDB51 and
MOMENTS datasets demonstrate that the proposed S3-Net achieves an accuracy
improvement of 8.1% versus the 3D-CNN based approach on UCF11, a storage
reduction of 6.9x and an inference speed of 22.8 FPS on CityScapes with a
GTX1080Ti GPU. Comment: WACV202
The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation
Pursuing a more coherent scene understanding towards real-time vision
applications, single-stage instance segmentation has recently gained
popularity, achieving a simpler and more efficient design than its two-stage
counterparts. Besides, its global mask representation often leads to superior
accuracy to the two-stage Mask R-CNN which has been dominant thus far. Despite
the promising advances in single-stage methods, finer delineation of instance
boundaries remains underexplored. Indeed, boundary information provides a
strong shape representation that can operate in synergy with the
fully-convolutional mask features of the single-stage segmenter. In this work,
we propose Boundary Basis based Instance Segmentation (B2Inst) to learn a global
boundary representation that can complement existing global-mask-based methods,
which often lack high-frequency details. Besides, we devise a unified
quality measure of both mask and boundary and introduce a network block that
learns to score its own per-instance predictions. When applied to the
strongest baselines in single-stage instance segmentation, our B2Inst leads to
consistent improvements and accurately parses out the instance boundaries in a
scene. We outperform the existing state-of-the-art methods, whether
single-stage or two-stage, on the COCO dataset with the same
ResNet-50 and ResNet-101 backbones.
The Semantic Mutex Watershed for Efficient Bottom-Up Semantic Instance Segmentation
Semantic instance segmentation is the task of simultaneously partitioning an
image into distinct segments while associating each pixel with a class label.
In commonly used pipelines, segmentation and label assignment are solved
separately since joint optimization is computationally expensive. We propose a
greedy algorithm for joint graph partitioning and labeling derived from the
efficient Mutex Watershed partitioning algorithm. It optimizes an objective
function closely related to the Symmetric Multiway Cut objective and
empirically shows efficient scaling behavior. Due to the algorithm's efficiency
it can operate directly on pixels without prior over-segmentation of the image
into superpixels. We evaluate the performance on the Cityscapes dataset (2D
urban scenes) and on a 3D microscopy volume. In urban scenes, the proposed
algorithm combined with current deep neural networks outperforms the strong
baseline of "Panoptic Feature Pyramid Networks" by Kirillov et al. (2019). In
the 3D electron microscopy images, we show explicitly that our joint
formulation outperforms a separate optimization of the partitioning and
labeling problems.
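
The greedy partitioning idea that the abstract builds on can be sketched as follows. This is a minimal illustration of a plain Mutex Watershed style pass (no semantic labels), not the authors' joint algorithm; the function name `mutex_watershed` and the edge tuple layout are assumptions made for this example. Edges are visited in order of decreasing weight: an attractive edge merges two clusters unless a mutual-exclusion constraint separates them, and a repulsive edge adds such a constraint.

```python
def mutex_watershed(num_nodes, edges):
    """Greedy partitioning sketch.

    edges: list of (weight, u, v, is_attractive) tuples (hypothetical layout).
    Returns one cluster representative per node.
    """
    parent = list(range(num_nodes))
    # For each cluster root, the set of roots it must stay separate from.
    mutexes = [set() for _ in range(num_nodes)]

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Visit the strongest edges first.
    for weight, u, v, is_attractive in sorted(edges, key=lambda e: -e[0]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # already in the same cluster
        if is_attractive:
            if rv in mutexes[ru]:
                continue  # merge is forbidden by an earlier repulsive edge
            parent[rv] = ru
            # Re-point every constraint that referred to rv at the new root ru.
            for other in mutexes[rv]:
                mutexes[other].discard(rv)
                mutexes[other].add(ru)
                mutexes[ru].add(other)
            mutexes[rv].clear()
        else:
            mutexes[ru].add(rv)
            mutexes[rv].add(ru)
    return [find(i) for i in range(num_nodes)]


example_edges = [
    (0.9, 0, 1, True),   # strong attraction: 0 and 1 merge
    (0.8, 2, 3, True),   # strong attraction: 2 and 3 merge
    (0.7, 1, 2, False),  # repulsion: the two clusters become mutually exclusive
    (0.5, 0, 3, True),   # weaker attraction: blocked by the constraint
]
labels = mutex_watershed(4, example_edges)
```

In this toy graph the last attractive edge is rejected because the stronger repulsive edge was processed first, so nodes 0 and 1 end up in one cluster and nodes 2 and 3 in another.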
Deep Snake for Real-Time Instance Segmentation
This paper introduces a novel contour-based approach named deep snake for
real-time instance segmentation. Unlike some recent methods that directly
regress the coordinates of the object boundary points from an image, deep snake
uses a neural network to iteratively deform an initial contour to match the
object boundary, which implements the classic idea of snake algorithms with a
learning-based approach. For structured feature learning on the contour, we
propose to use circular convolution in deep snake, which better exploits the
cycle-graph structure of a contour compared against generic graph convolution.
Based on deep snake, we develop a two-stage pipeline for instance segmentation:
initial contour proposal and contour deformation, which can handle errors in
object localization. Experiments show that the proposed approach achieves
competitive performances on the Cityscapes, KINS, SBD and COCO datasets while
being efficient for real-time applications with a speed of 32.3 fps for
512×512 images on a 1080Ti GPU. The code is available at
https://github.com/zju3dv/snake/. Comment: Accepted to CVPR 2020 as Oral. Add experiments on MS COC
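
To illustrate what circular convolution on a contour means, here is a minimal sketch, assuming a single scalar feature per contour vertex and a fixed odd-length kernel. The function name `circular_conv` is hypothetical, and the real network uses learned multi-channel kernels, which this example does not attempt to reproduce. The key point is that indices wrap around, so the first and last vertices of the closed contour are treated as neighbors.

```python
def circular_conv(features, kernel):
    """1D convolution (cross-correlation form) with wrap-around indexing.

    features: one scalar per contour vertex, in contour order.
    kernel:   odd-length list of weights, centered on the current vertex.
    """
    n = len(features)
    radius = len(kernel) // 2
    output = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            # The modulo makes the contour behave like a cycle graph.
            acc += weight * features[(i + j - radius) % n]
        output.append(acc)
    return output


smoothed = circular_conv([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0])
# smoothed == [7.0, 6.0, 9.0, 8.0]: vertex 0 sees vertices 3, 0 and 1.
```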
OccuSeg: Occupancy-aware 3D Instance Segmentation
3D instance segmentation, with a variety of applications in robotics and
augmented reality, is in great demand these days. Unlike 2D images that are
projective observations of the environment, 3D models provide metric
reconstruction of the scenes without occlusion or scale ambiguity. In this
paper, we define the "3D occupancy size" as the number of voxels occupied by each
instance. Because this quantity can be predicted robustly, we build on it to
propose OccuSeg, an occupancy-aware 3D instance segmentation scheme. Our
multi-task learning produces both occupancy signal and embedding
representations, where the training of spatial and feature embeddings varies
with their difference in scale-awareness. Our clustering scheme benefits from the
reliable comparison between the predicted occupancy size and the clustered
occupancy size, which encourages hard samples to be correctly clustered and
avoids over-segmentation. The proposed approach achieves state-of-the-art
performance on 3 real-world datasets, i.e. ScanNetV2, S3DIS and SceneNN, while
maintaining high efficiency. Comment: CVPR 2020, video: https://youtu.be/co7y6LQ7Kq
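
The occupancy idea above can be made concrete with a small sketch. It assumes a flat list of per-voxel instance ids and compares a cluster's voxel count with the mean occupancy predicted by its voxels; the helper names `occupancy_sizes` and `occupancy_ratio` and the use of a plain ratio are assumptions for illustration, not the paper's exact formulation.

```python
from collections import Counter


def occupancy_sizes(voxel_instance_ids):
    """Number of voxels occupied by each instance id."""
    return Counter(voxel_instance_ids)


def occupancy_ratio(cluster_voxel_count, predicted_occupancies):
    """Clustered size divided by the mean occupancy predicted inside the cluster.

    A ratio well below 1 suggests the cluster covers only part of an instance
    (possible over-segmentation); a ratio near 1 suggests a complete instance.
    """
    mean_prediction = sum(predicted_occupancies) / len(predicted_occupancies)
    return cluster_voxel_count / mean_prediction


sizes = occupancy_sizes([0, 0, 1, 2, 2, 2])
ratio = occupancy_ratio(3, [6.0, 6.0, 6.0])
```

Here a cluster of 3 voxels whose members all predict an occupancy of 6 yields a ratio of 0.5, which a clustering scheme could read as a hint to keep merging.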
SSAP: Single-Shot Instance Segmentation With Affinity Pyramid
Recently, proposal-free instance segmentation has received increasing
attention due to its concise and efficient pipeline. Generally, proposal-free
methods generate instance-agnostic semantic segmentation labels and
instance-aware features to group pixels into different object instances.
However, previous methods mostly employ separate modules for these two
sub-tasks and require multiple passes for inference. We argue that treating
these two sub-tasks separately is suboptimal. In fact, employing multiple
separate modules significantly reduces the potential for application. The
mutual benefits between the two complementary sub-tasks are also unexplored. To
this end, this work proposes a single-shot proposal-free instance segmentation
method that requires only one single pass for prediction. Our method is based
on a pixel-pair affinity pyramid, which computes the probability that two
pixels belong to the same instance in a hierarchical manner. The affinity
pyramid can also be jointly learned with the semantic class labeling and
achieve mutual benefits. Moreover, building on the learned affinity
pyramid, a novel cascaded graph partition module is presented to sequentially
generate instances from coarse to fine. Unlike previous time-consuming graph
partition methods, this module achieves a speedup and a 9% relative
improvement on Average-Precision (AP). Our approach achieves state-of-the-art
results on the challenging Cityscapes dataset. Comment: ICCV 201
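
As a rough illustration of pixel-pair affinity, the sketch below derives binary training targets from an instance map: a pair gets 1 when both pixels carry the same instance id and 0 otherwise. The function name `affinity_targets`, the square neighborhood controlled by `radius`, and the dictionary output are assumptions for this example; the hierarchical pyramid and any special handling of background pixels are omitted.

```python
def affinity_targets(instance_map, radius=1):
    """Binary same-instance targets for every pixel pair within a window.

    instance_map: 2D list of instance ids.
    Returns a dict mapping (y, x, ny, nx) to 1 (same instance) or 0.
    """
    height, width = len(instance_map), len(instance_map[0])
    targets = {}
    for y in range(height):
        for x in range(width):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if (dy, dx) == (0, 0):
                        continue  # skip the pixel itself
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue  # neighbor falls outside the image
                    same = instance_map[y][x] == instance_map[ny][nx]
                    targets[(y, x, ny, nx)] = 1 if same else 0
    return targets


toy_map = [[1, 1],
           [2, 2]]
pairs = affinity_targets(toy_map)
```

For the toy map, the horizontal pair in the top row has affinity 1, while any pair that crosses from the top row to the bottom row has affinity 0.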
Attention-guided Unified Network for Panoptic Segmentation
This paper studies panoptic segmentation, a recently proposed task which
segments foreground (FG) objects at the instance level as well as background
(BG) contents at the semantic level. Existing methods mostly dealt with these
two problems separately, but in this paper, we reveal the underlying
relationship between them; in particular, FG objects provide complementary cues
to assist BG understanding. Our approach, named the Attention-guided Unified
Network (AUNet), is a unified framework with two branches for FG and BG
segmentation simultaneously. Two sources of attention are added to the BG
branch, namely the RPN and the FG segmentation mask, to provide object-level and
pixel-level attention, respectively. Our approach generalizes to different
backbones with consistent accuracy gains in both FG and BG segmentation, and
also sets a new state of the art on both the MS-COCO (46.5% PQ) and Cityscapes
(59.0% PQ) benchmarks. Comment: CVPR 201
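
For readers unfamiliar with the PQ figures quoted above, the following sketch computes the standard panoptic quality metric for one class: the sum of IoUs over matched segment pairs divided by TP + 0.5 FP + 0.5 FN, where a predicted and a ground-truth segment count as matched when their IoU exceeds 0.5. The function name and argument layout are assumptions for illustration, and averaging over classes is left out.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """PQ for one class.

    matched_ious: IoU of each true-positive match (each must exceed 0.5).
    """
    num_true_positives = len(matched_ious)
    denominator = (num_true_positives
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if denominator == 0:
        return 0.0  # no segments of this class at all
    return sum(matched_ious) / denominator


pq = panoptic_quality([0.8, 0.6], num_false_positives=1, num_false_negatives=1)
```

With two matches of IoU 0.8 and 0.6, one unmatched prediction, and one missed ground-truth segment, the value is 1.4 / 3, roughly 0.467.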
Deep Affinity Net: Instance Segmentation via Affinity
Most of the modern instance segmentation approaches fall into two categories:
region-based approaches in which object bounding boxes are detected first and
later used in cropping and segmenting instances; and keypoint-based approaches
in which individual instances are represented by a set of keypoints followed by
a dense pixel clustering around those keypoints. Despite the maturity of these
two paradigms, we would like to report an alternative affinity-based paradigm
where instances are segmented based on densely predicted affinities and graph
partitioning algorithms. Such affinity-based approaches indicate that
high-level graph features other than regions or keypoints can be directly
applied in the instance segmentation task. In this work, we propose Deep
Affinity Net, an effective affinity-based approach accompanied by a new graph
partitioning algorithm, Cascade-GAEC. Without bells and whistles, our end-to-end
model results in 32.4% AP on Cityscapes val and 27.5% AP on test. It achieves
the best single-shot result as well as the fastest running time among all
affinity-based models. It also outperforms the region-based method Mask R-CNN.
PolyTransform: Deep Polygon Transformer for Instance Segmentation
In this paper, we propose PolyTransform, a novel instance segmentation
algorithm that produces precise, geometry-preserving masks by combining the
strengths of prevailing segmentation approaches and modern polygon-based
methods. In particular, we first exploit a segmentation network to generate
instance masks. We then convert the masks into a set of polygons that are then
fed to a deforming network that transforms the polygons such that they better
fit the object boundaries. Our experiments on the challenging Cityscapes
dataset show that our PolyTransform significantly improves the performance of
the backbone instance segmentation network and ranks 1st on the Cityscapes
test-set leaderboard. We also show impressive gains in the interactive
annotation setting. We release the code at
https://github.com/uber-research/PolyTransform
Learning Gaussian Instance Segmentation in Point Clouds
This paper presents a novel method for instance segmentation of 3D point
clouds. The proposed method is called Gaussian Instance Center Network (GICN),
which can approximate the distributions of instance centers scattered in the
whole scene as Gaussian center heatmaps. Based on the predicted heatmaps, a
small number of center candidates can be easily selected for the subsequent
predictions with efficiency, including i) predicting the instance size of each
center to decide a range for extracting features, ii) generating bounding boxes
for centers, and iii) producing the final instance masks. GICN is a
single-stage, anchor-free, and end-to-end architecture that is easy to train
and efficient at inference. Benefiting from the center-dictated
mechanism with adaptive instance size selection, our method achieves
state-of-the-art performance in the task of 3D instance segmentation on ScanNet
and S3DIS datasets.
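
A Gaussian center heatmap of the kind described above can be sketched as follows. The example assumes each point's score is the largest Gaussian response to any instance center, with a single fixed bandwidth `sigma`, and that candidates are simply the highest-scoring points; the names `center_heatmap` and `top_candidates` and these choices are assumptions for illustration, not the paper's exact definition.

```python
import math


def center_heatmap(points, centers, sigma=1.0):
    """Score each 3D point by its Gaussian closeness to the nearest center."""
    heat = []
    for point in points:
        best = 0.0
        for center in centers:
            squared_dist = sum((a - b) ** 2 for a, b in zip(point, center))
            best = max(best, math.exp(-squared_dist / (2.0 * sigma ** 2)))
        heat.append(best)
    return heat


def top_candidates(heat, k):
    """Indices of the k highest-scoring points, best first."""
    return sorted(range(len(heat)), key=lambda i: -heat[i])[:k]


cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
heat = center_heatmap(cloud, centers=[(0.0, 0.0, 0.0)])
candidates = top_candidates(heat, k=1)
```

The point that coincides with the center scores exactly 1.0 and is the only candidate selected, while the score decays smoothly for points farther away.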