18,228 research outputs found
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
In this work we address the task of semantic image segmentation with Deep
Learning and make three main contributions that are experimentally shown to
have substantial practical merit. First, we highlight convolution with
upsampled filters, or 'atrous convolution', as a powerful tool in dense
prediction tasks. Atrous convolution allows us to explicitly control the
resolution at which feature responses are computed within Deep Convolutional
Neural Networks. It also allows us to effectively enlarge the field of view of
filters to incorporate larger context without increasing the number of
parameters or the amount of computation. Second, we propose atrous spatial
pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP
probes an incoming convolutional feature layer with filters at multiple
sampling rates and effective fields-of-view, thus capturing objects as well as
image context at multiple scales. Third, we improve the localization of object
boundaries by combining methods from DCNNs and probabilistic graphical models.
The commonly deployed combination of max-pooling and downsampling in DCNNs
achieves invariance but has a toll on localization accuracy. We overcome this
by combining the responses at the final DCNN layer with a fully connected
Conditional Random Field (CRF), which is shown both qualitatively and
quantitatively to improve localization performance. Our proposed "DeepLab"
system sets a new state of the art on the PASCAL VOC-2012 semantic image
segmentation task, reaching 79.7% mIOU on the test set, and advances the
results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and
Cityscapes. All of our code is made publicly available online.
Comment: Accepted by TPAM
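The atrous convolution described in the DeepLab abstract can be illustrated with a minimal 1-D sketch. This is not the authors' code: the function names `atrous_conv1d` and `field_of_view`, the toy signal, and the kernel are all assumptions chosen for illustration. It shows that raising the sampling rate widens the field of view while the number of weights, and the multiply-adds per output, stay the same.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1-D atrous (dilated) convolution, written as cross-correlation.

    rate=1 is an ordinary convolution; rate=r leaves r-1 skipped samples
    between consecutive kernel taps. (Hypothetical helper for illustration.)
    """
    span = (len(kernel) - 1) * rate + 1  # effective field of view
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Same number of multiply-adds per output regardless of rate.
            acc += w * signal[start + k * rate]
        out.append(acc)
    return out


def field_of_view(kernel_size, rate):
    """Number of input samples covered by one filter application."""
    return (kernel_size - 1) * rate + 1


signal = [float(i) for i in range(10)]  # toy ramp 0, 1, ..., 9
kernel = [1.0, 0.0, -1.0]               # three weights at every rate

dense = atrous_conv1d(signal, kernel, rate=1)   # covers 3 samples per output
atrous = atrous_conv1d(signal, kernel, rate=2)  # covers 5 samples per output
```

With the same three weights, the rate-2 filter covers five input samples instead of three. An ASPP-style module would, under the same assumptions, apply several such filters with different rates to the same input and combine their responses.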
Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation
We introduce a new loss function for the weakly-supervised training of
semantic image segmentation models based on three guiding principles: to seed
with weak localization cues, to expand objects based on the information about
which classes can occur in an image, and to constrain the segmentations to
coincide with object boundaries. We show experimentally that training a deep
convolutional neural network using the proposed loss function leads to
substantially better segmentations than previous state-of-the-art methods on
the challenging PASCAL VOC 2012 dataset. We furthermore give insight into the
working mechanism of our method by a detailed experimental study that
illustrates how the segmentation quality is affected by each term of the
proposed loss function as well as their combinations.
Comment: ECCV 201
Recurrent Scene Parsing with Perspective Understanding in the Loop
Objects may appear at arbitrary scales in perspective images of a scene,
posing a challenge for recognition systems that process images at a fixed
resolution. We propose a depth-aware gating module that adaptively selects the
pooling field size in a convolutional network architecture according to the
object scale (inversely proportional to the depth) so that small details are
preserved for distant objects while larger receptive fields are used for those
nearby. The depth gating signal is provided by stereo disparity or estimated
directly from monocular input. We integrate this depth-aware gating into a
recurrent convolutional neural network to perform semantic segmentation. Our
recurrent module iteratively refines the segmentation results, leveraging the
depth and semantic predictions from the previous iterations.
Through extensive experiments on four popular large-scale RGB-D datasets, we
demonstrate this approach achieves competitive semantic segmentation
performance with a model which is substantially more compact. We carry out
extensive analysis of this architecture including variants that operate on
monocular RGB but use depth as side-information during training, unsupervised
gating as a generic attentional mechanism, and multi-resolution gating. We find
that gated pooling for joint semantic segmentation and depth yields
state-of-the-art results for quantitative monocular depth estimation.
Superpixel-based Semantic Segmentation Trained by Statistical Process Control
Semantic segmentation, like other fields of computer vision, has seen a
remarkable advance in performance through the use of deep convolutional neural networks.
However, because neighboring pixels are heavily dependent on each
other, both training and testing of these methods involve many redundant
operations. To resolve this problem, the proposed network is trained and tested
with only 0.37% of the total pixels, selected by superpixel-based sampling, which
largely reduces the complexity of the upsampling calculation. The hypercolumn feature maps
are constructed by a pyramid module in combination with the convolution layers of
the base network. Since the proposed method uses a very small number of sampled
pixels, the end-to-end learning of the entire network is difficult with a
common learning rate for all the layers. In order to resolve this problem, the
learning rate after sampling is controlled by statistical process control (SPC)
of gradients in each layer. The proposed method performs as well as or better
than conventional methods that use many more samples on the Pascal Context and
SUN-RGBD datasets.
Comment: Accepted in British Machine Vision Conference (BMVC), 201
DEEP FULLY RESIDUAL CONVOLUTIONAL NEURAL NETWORK FOR SEMANTIC IMAGE SEGMENTATION
Department of Computer Science and Engineering
The goal of semantic image segmentation is to partition the pixels of an image into semantically meaningful parts and to classify those parts according to a predefined label set. Although object recognition
models have recently achieved remarkable performance and even surpass humans' ability to recognize
objects, semantic segmentation models still lag behind. One of the reasons that semantic
segmentation is a relatively hard problem is that it requires image understanding at the pixel level while considering global
context, as opposed to object recognition. Another challenge is transferring the knowledge of an object
recognition model to the task of semantic segmentation. In this thesis, we delineate some of the
main challenges we faced in approaching semantic image segmentation with machine learning algorithms.
Our main focus was how to use deep learning algorithms for this task, since they require the
least amount of feature engineering and have been shown to scale to large
datasets while exhibiting remarkable performance. More precisely, we worked on a variation of convolutional
neural networks (CNN) suitable for the semantic segmentation task. We proposed a model called deep
fully residual convolutional networks (DFRCN) to tackle this problem. Utilizing residual learning makes
training of deep models feasible, which ultimately leads to a rich and powerful visual representation.
Our model also benefits from skip-connections, which ease the propagation of information from the
encoder module to the decoder module. This enables our model to have fewer parameters in the
decoder module while also achieving better performance. We also evaluated the effective variation
of the proposed model on a semantic segmentation benchmark.
We first thoroughly review current high-performance models and the problems one might
face when trying to replicate them, which mainly arise from a lack of sufficient provided
information. Then, we describe our own novel method, which we call the deep fully residual convolutional
network (DFRCN). We show that our method exhibits state-of-the-art performance on a challenging
benchmark for aerial image segmentation.
A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images
Semantic segmentation is the pixel-wise labelling of an image. Since the
problem is defined at the pixel level, determining image class labels alone is
not sufficient; the labels must also be localised at the original image pixel
resolution. Boosted by the extraordinary ability of convolutional neural
networks (CNN) to create semantic, high-level and hierarchical image
features, a large number of deep learning-based 2D semantic segmentation
approaches have been proposed within the last decade. In this survey, we mainly
focus on the recent scientific developments in semantic segmentation,
specifically on deep learning-based methods using 2D images. We start with an
analysis of the public image sets and leaderboards for 2D semantic
segmentation, with an overview of the techniques employed in performance
evaluation. In examining the evolution of the field, we chronologically
categorise the approaches into three main periods, namely the pre- and early deep
learning era, the fully convolutional era, and the post-FCN era. We technically
analyse the solutions put forward in terms of solving the fundamental problems
of the field, such as fine-grained localisation and scale invariance. Before
drawing our conclusions, we present a table of methods from all mentioned eras,
with a brief summary of each approach that explains their contribution to the
field. We conclude the survey by discussing the current challenges of the field
and to what extent they have been solved.
Comment: Updated with new studie
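Performance evaluation in semantic segmentation is commonly reported as mean intersection-over-union (mIoU), the metric quoted in the DeepLab entry above. The following is a minimal sketch of that metric on flat label lists; the function name `mean_iou` and the toy labels are assumptions for illustration. Benchmark toolkits typically accumulate counts over a whole dataset rather than a single sample, so this is not any benchmark's official scoring code.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred and target are equal-length sequences of integer class labels,
    one per pixel. (Hypothetical helper; single-sample simplification.)
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes that appear in neither labelling
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5.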
Object detection via a multi-region & semantic segmentation-aware CNN model
We propose an object detection system that relies on a multi-region deep
convolutional neural network (CNN) that also encodes semantic
segmentation-aware features. The resulting CNN-based representation aims at
capturing a diverse set of discriminative appearance factors and exhibits
localization sensitivity that is essential for accurate object localization. We
exploit the above properties of our recognition module by integrating it into an
iterative localization mechanism that alternates between scoring a box proposal
and refining its location with a deep CNN regression model. Thanks to the
efficient use of our modules, we detect objects with very high localization
accuracy. On the detection challenges of PASCAL VOC2007 and PASCAL VOC2012 we
achieve mAP of 78.2% and 73.9% respectively, surpassing any other published
work by a significant margin.
Comment: Extended technical report -- short version to appear at ICCV 201