6,905 research outputs found
A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images
Semantic segmentation is the pixel-wise labelling of an image. Since the
problem is defined at the pixel level, determining image-level class labels
alone is not sufficient; the labels must also be localised at the original
image pixel resolution. Boosted by the extraordinary ability of convolutional
neural networks (CNNs) to create semantic, high-level, hierarchical image
features, a large number of deep learning-based 2D semantic segmentation
approaches have been proposed within the last decade. In this survey, we mainly
focus on the recent scientific developments in semantic segmentation,
specifically on deep learning-based methods using 2D images. We started with an
analysis of the public image sets and leaderboards for 2D semantic
segmantation, with an overview of the techniques employed in performance
evaluation. In examining the evolution of the field, we chronologically
categorised the approaches into three main periods, namely the pre- and early deep
learning era, the fully convolutional era, and the post-FCN era. We technically
analysed the solutions put forward in terms of solving the fundamental problems
of the field, such as fine-grained localisation and scale invariance. Before
drawing our conclusions, we present a table of methods from all mentioned eras,
with a brief summary of each approach that explains their contribution to the
field. We conclude by discussing the current challenges of the field and to
what extent they have been solved.
Comment: Updated with new studies
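A common evaluation technique behind the leaderboards surveyed is mean intersection-over-union (mIoU), averaged over classes. A minimal NumPy sketch of per-class IoU and its mean (the toy label maps are illustrative, not taken from any benchmark):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Per-class intersection-over-union, averaged over the classes
    that appear in either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:              # class absent everywhere: skip it
            continue
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# toy 2x3 label maps with two classes {0, 1}
pred   = np.array([[0, 1, 1], [0, 0, 1]])
target = np.array([[0, 1, 0], [0, 1, 1]])
```

For these maps each class has intersection 2 and union 4, so the mIoU is 0.5.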
ICNet for Real-Time Semantic Segmentation on High-Resolution Images
We focus on the challenging task of real-time semantic segmentation in this
paper. The task has many practical applications, yet poses the fundamental
difficulty of cutting a large portion of the computation required for
pixel-wise label inference. We propose an image cascade network (ICNet) that incorporates
multi-resolution branches under proper label guidance to address this
challenge. We provide in-depth analysis of our framework and introduce the
cascade feature fusion unit to quickly achieve high-quality segmentation. Our
system yields real-time inference on a single GPU card with decent quality
results evaluated on challenging datasets like Cityscapes, CamVid and
COCO-Stuff.
Comment: ECCV 201
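The multi-resolution cascade can be sketched in NumPy as follows. The per-branch convolutions and label guidance from the paper are replaced here by identity placeholders, so only the downsample / process-coarse / upsample-and-add structure of cascade feature fusion is illustrated:

```python
import numpy as np

def downsample(img, factor):
    """Average-pool a 2D map by an integer factor (H, W divisible by factor)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(img, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def cascade_fuse(img):
    """Three-branch cascade in the spirit of ICNet: the coarse branch works
    at 1/4 resolution where computation is cheap; each fusion step upsamples
    the coarser result and adds it to the finer branch. Per-branch
    'processing' is the identity here, as a placeholder."""
    b1 = img                                   # full resolution
    b2 = downsample(img, 2)                    # 1/2 resolution
    b4 = downsample(img, 4)                    # 1/4 resolution
    fused_half = b2 + upsample(b4, 2)          # fuse 1/4 into 1/2
    fused_full = b1 + upsample(fused_half, 2)  # fuse 1/2 into full
    return fused_full
```

On a constant all-ones 4x4 input, each fusion step adds one, so the output is a constant 3 map of the original shape.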
Superpixel-based Semantic Segmentation Trained by Statistical Process Control
Semantic segmentation, like other fields of computer vision, has seen a
remarkable performance advance through the use of deep convolutional neural networks.
However, considering that neighboring pixels are heavily dependent on each
other, both training and testing of these methods involve many redundant
operations. To resolve this problem, the proposed network is trained and tested
with only 0.37% of the total pixels via superpixel-based sampling, which
greatly reduces the complexity of the upsampling calculation. The hypercolumn
feature maps are constructed by a pyramid module in combination with the
convolution layers of
the base network. Since the proposed method uses a very small number of sampled
pixels, the end-to-end learning of the entire network is difficult with a
common learning rate for all the layers. In order to resolve this problem, the
learning rate after sampling is controlled by statistical process control (SPC)
of gradients in each layer. The proposed method performs better than, or on a
par with, conventional methods that use far more samples on the Pascal Context
and SUN-RGBD datasets.
Comment: Accepted in British Machine Vision Conference (BMVC), 201
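The SPC-based control can be sketched as monitoring each layer's gradient magnitude against control limits (mean ± k·std of its recent history) and damping the learning rate when the gradient falls outside them. The window contents, limit width `k`, and damping factor below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def spc_learning_rate(base_lr, grad_history, new_grad_norm, k=3.0, damp=0.1):
    """Return a layer's learning rate for the next step.

    If the new gradient norm lies outside the mean +/- k*std control
    limits computed from its recent history, the layer is treated as
    'out of control' and its learning rate is damped; otherwise the
    base learning rate is kept.
    """
    mu = np.mean(grad_history)
    sigma = np.std(grad_history)
    lower, upper = mu - k * sigma, mu + k * sigma
    if not (lower <= new_grad_norm <= upper):
        return base_lr * damp
    return base_lr
```

A gradient norm near the history's mean keeps the base rate; an outlier (for example, a spike caused by the sparse sampling) triggers the damped rate for that layer only.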
Task Decomposition and Synchronization for Semantic Biomedical Image Segmentation
Semantic segmentation is essentially important to biomedical image analysis.
Many recent works mainly focus on integrating the Fully Convolutional Network
(FCN) architecture with sophisticated convolution implementation and deep
supervision. In this paper, we propose to decompose the single segmentation
task into three subsequent sub-tasks, including (1) pixel-wise image
segmentation, (2) prediction of the class labels of the objects within the
image, and (3) classification of the scene the image belongs to. While these
three sub-tasks are trained to optimize their individual loss functions at
different perceptual levels, we propose to let them interact through a
task-task context ensemble. Moreover, we propose a novel sync-regularization to penalize
the deviation between the outputs of the pixel-wise segmentation and the class
prediction tasks. These effective regularizations help FCN utilize context
information comprehensively and attain accurate semantic segmentation, even
though the number of the images for training may be limited in many biomedical
applications. We have successfully applied our framework to three diverse 2D/3D
medical image datasets, including Robotic Scene Segmentation Challenge 18
(ROBOT18), Brain Tumor Segmentation Challenge 18 (BRATS18), and Retinal Fundus
Glaucoma Challenge (REFUGE18). We have achieved top-tier performance in all
three challenges.
Comment: IEEE Transactions on Medical Imaging
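The sync-regularization idea can be sketched as an extra penalty on the disagreement between the class presence implied by the pixel-wise segmentation output and the image-level class prediction. The pooling choice (max over pixels), the weight `lam`, and the toy numbers below are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def total_loss(seg_probs, cls_probs, seg_loss, cls_loss, scene_loss, lam=0.1):
    """Combine the three sub-task losses with a sync penalty.

    seg_probs: (C, H, W) per-pixel class probabilities
    cls_probs: (C,) image-level class probabilities
    The class presence implied by the segmentation is taken as the max
    over pixels; the sync term penalises its squared deviation from the
    image-level prediction.
    """
    implied = seg_probs.max(axis=(1, 2))           # (C,) presence from pixels
    sync = float(np.mean((implied - cls_probs) ** 2))
    return seg_loss + cls_loss + scene_loss + lam * sync

# toy example: two classes over a 2x2 image
seg_probs = np.zeros((2, 2, 2))
seg_probs[0, 0, 0] = 0.8        # class 0 strongly present at one pixel
seg_probs[1, 1, 1] = 0.6        # class 1 present at another
cls_probs = np.array([0.8, 0.6])  # image-level prediction that agrees
```

When the two outputs agree, the sync term vanishes and the total is just the sum of the three sub-task losses; any disagreement adds a penalty proportional to `lam`.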
MobileNetV2: Inverted Residuals and Linear Bottlenecks
In this paper we describe a new mobile architecture, MobileNetV2, that
improves the state of the art performance of mobile models on multiple tasks
and benchmarks as well as across a spectrum of different model sizes. We also
describe efficient ways of applying these mobile models to object detection in
a novel framework we call SSDLite. Additionally, we demonstrate how to build
mobile semantic segmentation models through a reduced form of DeepLabv3 which
we call Mobile DeepLabv3.
The MobileNetV2 architecture is based on an inverted residual structure where
the input and output of the residual block are thin bottleneck layers, as
opposed to traditional residual models, which use expanded representations at
the input. MobileNetV2 uses lightweight depthwise convolutions to filter
features in
the intermediate expansion layer. Additionally, we find that it is important to
remove non-linearities in the narrow layers in order to maintain
representational power. We demonstrate that this improves performance and
provide an intuition that led to this design. Finally, our approach allows
decoupling of the input/output domains from the expressiveness of the
transformation, which provides a convenient framework for further analysis. We
measure our performance on ImageNet classification, COCO object detection, and
VOC image segmentation. We evaluate the trade-offs between accuracy and the
number of operations measured by multiply-adds (MAdd), as well as the number
of parameters.
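The multiply-add cost traded against accuracy can be made concrete by counting MAdds for one stride-1 inverted residual block: a 1x1 expansion, a 3x3 depthwise convolution, and a 1x1 linear projection. The spatial size, channel counts, and expansion ratio below are illustrative, not configurations from the paper:

```python
def inverted_residual_madds(h, w, c_in, c_out, expansion=6, k=3):
    """Multiply-adds for one stride-1 inverted residual block:
    1x1 expand -> kxk depthwise -> 1x1 linear project."""
    c_mid = c_in * expansion
    expand  = h * w * c_in * c_mid       # 1x1 pointwise expansion
    dwise   = h * w * c_mid * k * k      # depthwise: one kxk filter per channel
    project = h * w * c_mid * c_out      # 1x1 pointwise (linear) projection
    return expand + dwise + project

def standard_conv_madds(h, w, c_in, c_out, k=3):
    """Multiply-adds for an ordinary kxk convolution at the same spatial size."""
    return h * w * c_in * c_out * k * k
```

For a 14x14 map with 32 bottleneck channels and expansion 6, the block costs about 2.7M MAdds, whereas a standard 3x3 convolution applied directly to the expanded 192-channel representation would cost about 65M, which illustrates why filtering the wide intermediate tensor with depthwise convolutions is the cheap part of the design.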