Superpixel Convolutional Networks using Bilateral Inceptions
In this paper we propose a CNN architecture for semantic image segmentation.
We introduce a new 'bilateral inception' module that can be inserted in
existing CNN architectures and performs bilateral filtering, at multiple
feature-scales, between superpixels in an image. The feature spaces for
bilateral filtering and other parameters of the module are learned end-to-end
using standard backpropagation techniques. The bilateral inception module
addresses two issues that arise with general CNN segmentation architectures.
First, this module propagates information between (super) pixels while
respecting image edges, thus using the structured information of the problem
for improved results. Second, the layer recovers a full resolution segmentation
result from the lower resolution solution of a CNN. In the experiments, we
modify several existing CNN architectures by inserting our inception module
between the last CNN (1x1 convolution) layers. Empirical results on three
different datasets show reliable improvements not only in comparison to the
baseline networks, but also in comparison to several dense-pixel prediction
techniques such as CRFs, while remaining competitive in runtime.
Comment: European Conference on Computer Vision (ECCV), 201
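The abstract describes bilateral filtering between superpixels at several feature scales, followed by a step that restores full resolution. The sketch below shows that idea under some assumptions. Each superpixel is described by a feature vector, for instance mean position and colour. The learned feature transform is replaced by a diagonal scaling. The Gaussian weights are row-normalised, and the per-scale outputs are mixed with fixed weights. The function names (`superpixel_bilateral`, `bilateral_inception`, `to_full_resolution`) are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def superpixel_bilateral(feats, values, scale):
    """Gaussian bilateral filter between superpixels (illustrative sketch).

    feats:  (S, D) per-superpixel features, e.g. mean (x, y, r, g, b) (assumed).
    values: (S, C) CNN activations pooled per superpixel.
    scale:  (D,) diagonal feature scaling, standing in for a learned transform.
    """
    f = feats * scale
    # Pairwise squared distances in the scaled feature space, shape (S, S).
    d2 = ((f[:, None, :] - f[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-0.5 * d2)
    # Self-weight is 1, so every row sum is >= 1 and the division is safe.
    w /= w.sum(axis=1, keepdims=True)
    return w @ values

def bilateral_inception(feats, values, scales, mix):
    """Filter at several feature scales and mix the results with per-scale weights."""
    return sum(a * superpixel_bilateral(feats, values, s) for a, s in zip(mix, scales))

def to_full_resolution(sp_values, sp_labels):
    """Copy each superpixel's output to all of its pixels: (H, W) labels -> (H, W, C)."""
    return sp_values[sp_labels]
```

Superpixels that lie far apart in feature space, for example across a strong colour edge, receive near-zero mutual weight. This is how the filtering respects image edges. Indexing the superpixel outputs with a per-pixel label map returns an output at the input image's resolution.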
Semantic Object Parsing with Graph LSTM
By taking the semantic object parsing task as an exemplar application
scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network,
which generalizes LSTM from sequential or multi-dimensional data to general
graph-structured data. In particular, instead of dividing an image evenly and
rigidly into pixels or patches as in existing multi-dimensional LSTM structures
(e.g., Row, Grid and Diagonal LSTMs), we take each
arbitrary-shaped superpixel as a semantically consistent node, and adaptively
construct an undirected graph for each image, where the spatial relations of
the superpixels are naturally used as edges. Constructed on such an adaptive
graph topology, the Graph LSTM is more naturally aligned with the visual
patterns in the image (e.g., object boundaries or appearance similarities) and
provides a more economical information propagation route. Furthermore, in each
optimization step over the Graph LSTM, we use a confidence-driven scheme to
update the hidden and memory states of nodes progressively until all nodes are
updated. In addition, for each node, the forget gates are adaptively learned to
capture different degrees of semantic correlation with neighboring nodes.
Comprehensive evaluations on four diverse semantic object parsing datasets
demonstrate the significant superiority of our Graph LSTM over other
state-of-the-art solutions.
Comment: 18 pages
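The sketch below illustrates two mechanisms from the abstract: the confidence-driven update order and the per-neighbor adaptive forget gates. Nodes are visited in order of descending confidence. A neighbor that has already been updated contributes its new state, and a neighbor not yet visited contributes its old state. The gate equations follow an LSTM-style pattern that we assume here. The weight names in `P` are hypothetical, biases are omitted, and every node is assumed to have at least one neighbor. This is a simplified illustration, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(dx, dh, seed=0):
    """Random weights; names are hypothetical and biases are omitted for brevity."""
    rng = np.random.default_rng(seed)
    P = {k: 0.1 * rng.standard_normal((dh, dx)) for k in ("Wu", "Wo", "Wc", "Wf")}
    for k in ("Uu", "Uun", "Uo", "Uon", "Uc", "Ucn", "Uf", "Ufn"):
        P[k] = 0.1 * rng.standard_normal((dh, dh))
    return P

def graph_lstm_step(x, h, m, neighbors, confidence, P):
    """One Graph LSTM pass over superpixel nodes (illustrative sketch).

    x: (N, Dx) node features; h, m: (N, Dh) hidden and memory states.
    neighbors: list of neighbor-index lists (superpixel adjacency, non-empty).
    confidence: (N,) scores; nodes are updated highest-confidence first.
    """
    h_new, m_new = h.copy(), m.copy()
    updated = np.zeros(len(x), dtype=bool)
    for i in np.argsort(-confidence):
        nbrs = neighbors[i]
        # Already-updated neighbors contribute new states, others their old ones.
        h_nb = np.array([h_new[j] if updated[j] else h[j] for j in nbrs])
        m_nb = np.array([m_new[j] if updated[j] else m[j] for j in nbrs])
        h_bar = h_nb.mean(axis=0)
        xi, hi = x[i], h[i]
        g_u = sigmoid(P["Wu"] @ xi + P["Uu"] @ hi + P["Uun"] @ h_bar)   # input gate
        g_o = sigmoid(P["Wo"] @ xi + P["Uo"] @ hi + P["Uon"] @ h_bar)   # output gate
        g_c = np.tanh(P["Wc"] @ xi + P["Uc"] @ hi + P["Ucn"] @ h_bar)   # candidate
        # Adaptive forget gate for each neighbor j, shape (len(nbrs), Dh).
        g_fn = sigmoid(P["Wf"] @ xi + h_nb @ P["Ufn"].T)
        g_f = sigmoid(P["Wf"] @ xi + P["Uf"] @ hi)                      # self forget gate
        m_i = (g_fn * m_nb).mean(axis=0) + g_f * m[i] + g_u * g_c
        m_new[i] = m_i
        h_new[i] = np.tanh(g_o * m_i)
        updated[i] = True
    return h_new, m_new
```

Because the gates for a neighbor depend on that neighbor's hidden state, each node can weight its neighbors differently. This is the role the abstract assigns to the adaptively learned forget gates.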
Recurrent Pixel Embedding for Instance Grouping
We introduce a differentiable, end-to-end trainable framework, consisting of
two novel components, for solving pixel-level grouping problems such as
instance segmentation. First, we regress pixels into a hyper-spherical embedding
space so that pixels from the same group have high cosine similarity while
those from different groups have similarity below a specified margin. We
analyze the choice of embedding dimension and margin, relating them to
theoretical results on the problem of distributing points uniformly on the
sphere. Second, to group instances, we utilize a variant of mean-shift
clustering, implemented as a recurrent neural network parameterized by kernel
bandwidth. This recurrent grouping module is differentiable, has convergent
dynamics, and admits a probabilistic interpretation. Backpropagating the
group-weighted loss through this module allows learning to focus only on
correcting embedding errors that will not be resolved by the subsequent
clustering. Our framework, while conceptually simple and theoretically rich, is
also practically effective and computationally efficient. We demonstrate
substantial improvements over state-of-the-art instance segmentation methods
for object proposal generation, and show the benefits of the grouping loss for
classification tasks such as boundary detection and semantic segmentation.
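The sketch below illustrates the framework's two components. The first is a pairwise loss on unit-normalised embeddings. It pulls same-group pairs toward cosine similarity 1 and penalises different-group pairs only when their similarity exceeds a margin. The second is a blurring mean-shift recurrence on the sphere, unrolled for a fixed number of steps. The hinge form of the loss, the kernel `exp(delta * cosine similarity)` (a von Mises-Fisher-style choice, with `delta` acting as an inverse bandwidth), and the iteration count are all our assumptions. The paper's exact loss and kernel may differ.

```python
import numpy as np

def normalize(x):
    """Project each row onto the unit hypersphere."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def pairwise_embedding_loss(emb, labels, margin=0.5):
    """Hinge-style pairwise loss (assumed form).

    Same-group pairs are penalised by 1 - cos_sim; different-group pairs
    are penalised only when their cosine similarity exceeds `margin`.
    """
    z = normalize(emb)
    s = z @ z.T
    same = labels[:, None] == labels[None, :]
    loss = np.where(same, 1.0 - s, np.maximum(0.0, s - margin))
    return loss.mean()

def mean_shift_rnn(emb, delta=10.0, n_iter=10):
    """Blurring mean shift on the sphere, unrolled as a fixed-length recurrence.

    Each step replaces every embedding by its kernel-weighted mean and
    re-normalises it; delta is an inverse kernel bandwidth (assumed kernel).
    """
    x = normalize(emb)
    for _ in range(n_iter):
        k = np.exp(delta * (x @ x.T))                   # (N, N) kernel weights
        x = normalize(k @ x / k.sum(axis=1, keepdims=True))
    return x
```

Every operation in `mean_shift_rnn` is differentiable. In an autodiff framework, a grouping loss computed on its output could therefore be backpropagated into the embeddings, which is the property the abstract relies on.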
A Multi-Level Approach to Waste Object Segmentation
We address the problem of localizing waste objects from a color image and an
optional depth image, which is a key perception component for robotic
interaction with such objects. Specifically, our method integrates the
intensity and depth information at multiple levels of spatial granularity.
First, a scene-level deep network produces an initial coarse segmentation,
from which we select a few potential object regions to zoom in on and segment
finely. The results of the above steps are further integrated into a
densely connected conditional random field that learns to respect the
appearance, depth, and spatial affinities with pixel-level accuracy. In
addition, we create a new RGBD waste object segmentation dataset, MJU-Waste,
and make it publicly available to facilitate future research in this area. The
efficacy of our method is validated on both MJU-Waste and the Trash Annotation
in Context (TACO) dataset.
Comment: Paper appears in Sensors 2020, 20(14), 381
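The abstract describes a step that selects a few potential object regions from the coarse scene-level segmentation and zooms in on them. A minimal sketch of one way to choose those regions is shown below. It thresholds a coarse foreground probability map, finds 4-connected components, and returns padded bounding boxes clipped to the image. The threshold, padding ratio, minimum area, and the function name `zoom_regions` are hypothetical choices for illustration. The paper's actual selection rule may differ.

```python
from collections import deque
import numpy as np

def zoom_regions(prob, thresh=0.5, pad=0.1, min_area=4):
    """Select candidate regions to zoom in on from a coarse foreground map.

    prob: (H, W) foreground probabilities from a scene-level network.
    Returns boxes (top, left, bottom, right), bottom/right exclusive, around
    4-connected components of prob > thresh, enlarged by `pad` times the box
    size on each side and clipped to the image.
    """
    fg = prob > thresh
    H, W = fg.shape
    seen = np.zeros((H, W), dtype=bool)
    boxes = []
    for r in range(H):
        for c in range(W):
            if not fg[r, c] or seen[r, c]:
                continue
            # Breadth-first search collects one connected component.
            queue, pixels = deque([(r, c)]), []
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W and fg[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(pixels) < min_area:
                continue  # ignore tiny blobs, likely noise
            ys, xs = zip(*pixels)
            top, bottom = min(ys), max(ys) + 1
            left, right = min(xs), max(xs) + 1
            py, px = int(pad * (bottom - top)), int(pad * (right - left))
            boxes.append((max(0, top - py), max(0, left - px),
                          min(H, bottom + py), min(W, right + px)))
    return boxes
```

In the pipeline the abstract describes, each returned crop would then be passed to a fine segmentation stage. The results would be fused with the coarse output in a dense CRF over appearance, depth, and spatial affinities. Both of those stages fall outside this sketch.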