Evolution of Image Segmentation using Deep Convolutional Neural Network: A Survey
From autonomous driving to medical diagnosis, the need for image
segmentation is everywhere. Segmentation of an image is one of
the indispensable tasks in computer vision. This task is comparatively
more complicated than other vision tasks as it needs low-level spatial information.
Basically, image segmentation can be of two types: semantic segmentation and
instance segmentation. The combined version of these two basic tasks is known
as panoptic segmentation. In the recent era, the success of deep convolutional
neural networks (CNN) has greatly influenced the field of segmentation and given
us various successful models to date. In this survey, we take a
glance at the evolution of both semantic and instance segmentation work based
on CNN. We also specify comparative architectural details of some
state-of-the-art models and discuss their training details to present a lucid
understanding of hyper-parameter tuning of those models. We also draw a
comparison among the performance of those models on different datasets. Lastly,
we give a glimpse of some state-of-the-art panoptic segmentation models.
Comment: 38 pages, 29 figures, 8 table
Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation
Image segmentation refers to the process of dividing an image into
nonoverlapping meaningful regions according to human perception, which has
been a classic topic since the early days of computer vision. A lot of
research has been conducted and has resulted in many applications. However,
while many segmentation algorithms exist, only a few sparse and
outdated summaries are available, and an overview of the recent achievements and
issues is lacking. We aim to provide a comprehensive review of the recent
progress in this field. Covering 180 publications, we give an overview of broad
areas of segmentation topics including not only the classic bottom-up
approaches, but also the recent development in superpixel, interactive methods,
object proposals, semantic image parsing and image cosegmentation. In addition,
we also review the existing influential datasets and evaluation metrics.
Finally, we suggest some design flavors and research directions for future
research in image segmentation.
Comment: submitted to Elsevier Journal of Visual Communications and Image
Representatio
PointIT: A Fast Tracking Framework Based on 3D Instance Segmentation
Most popular tracking frameworks currently focus on 2D image sequences. They
seldom track 3D objects in point clouds. In this paper, we propose PointIT,
a fast, simple tracking method based on 3D on-road instance segmentation.
First, we transform 3D LiDAR data into a spherical image of size
64 x 512 x 4 and feed it into an instance segmentation model to get the predicted
instance mask for each class. Then we use MobileNet as our primary encoder
instead of the original ResNet to reduce the computational complexity. Finally,
we extend the Sort algorithm with this instance framework to realize tracking
in the 3D LiDAR point cloud data. The model is trained on the spherical image
dataset with the corresponding instance label masks, which are provided by the KITTI
3D Object Track dataset. According to the experimental results, our network
achieves an Average Precision (AP) of 0.617, and the performance of the
multi-tracking task has also been improved.
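To make the projection step in the abstract above concrete, here is a minimal sketch of mapping a LiDAR point cloud onto a 64 x 512 x 4 spherical image. It is not the authors' implementation: the function name `spherical_projection`, the choice of channels (x, y, z, range), the vertical field of view (+3 to -25 degrees) and the 90-degree forward horizontal field of view are all assumptions made for illustration.

```python
import numpy as np


def spherical_projection(points, height=64, width=512,
                         fov_up_deg=3.0, fov_down_deg=-25.0, h_fov_deg=90.0):
    """Project (N, 3) x/y/z points onto a height x width x 4 grid.

    Channels are (x, y, z, range); this channel layout is an assumption.
    """
    points = np.asarray(points, dtype=np.float64)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.sqrt(x ** 2 + y ** 2 + z ** 2)

    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    half_h = np.radians(h_fov_deg) / 2.0

    # Angles of each point; guard against division by zero at the origin.
    safe_rng = np.where(rng > 0, rng, 1.0)
    azimuth = np.arctan2(y, x)                       # 0 = straight ahead
    elevation = np.arcsin(np.clip(z / safe_rng, -1.0, 1.0))

    # Keep only points inside the assumed field of view.
    inside = ((rng > 0) & (np.abs(azimuth) <= half_h)
              & (elevation <= fov_up) & (elevation >= fov_down))
    feats = np.stack([x, y, z, rng], axis=1)[inside]
    azimuth, elevation = azimuth[inside], elevation[inside]

    # Normalise angles to [0, 1] and convert to pixel coordinates.
    u = (half_h - azimuth) / (2.0 * half_h)          # left edge -> column 0
    v = (fov_up - elevation) / (fov_up - fov_down)   # top edge  -> row 0
    col = np.minimum((u * width).astype(np.int64), width - 1)
    row = np.minimum((v * height).astype(np.int64), height - 1)

    # When several points fall into one cell, keep the nearest one.
    order = np.argsort(feats[:, 3], kind="stable")
    flat = (row * width + col)[order]
    _, first = np.unique(flat, return_index=True)
    keep = order[first]

    image = np.zeros((height, width, 4), dtype=np.float32)
    image[row[keep], col[keep]] = feats[keep]
    return image
```

Cells that receive no point stay at zero, so a downstream model would typically need a validity mask as well; that is left out of this sketch.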
Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
Deep Convolutional Neural Networks (DCNNs) have recently shown state of the
art performance in high level vision tasks, such as image classification and
object detection. This work brings together methods from DCNNs and
probabilistic graphical models for addressing the task of pixel-level
classification (also called "semantic image segmentation"). We show that
responses at the final layer of DCNNs are not sufficiently localized for
accurate object segmentation. This is due to the very invariance properties
that make DCNNs good for high level tasks. We overcome this poor localization
property of deep networks by combining the responses at the final DCNN layer
with a fully connected Conditional Random Field (CRF). Qualitatively, our
"DeepLab" system is able to localize segment boundaries at a level of accuracy
which is beyond previous methods. Quantitatively, our method sets the new
state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching
71.6% IOU accuracy in the test set. We show how these results can be obtained
efficiently: Careful network re-purposing and a novel application of the 'hole'
algorithm from the wavelet community allow dense computation of neural net
responses at 8 frames per second on a modern GPU.
Comment: 14 pages. Updated related wor
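The IOU figure quoted in the abstract above is an intersection-over-union score averaged over classes. As a rough illustration of how such a score can be computed, the sketch below derives mean IoU from a confusion matrix for a single pair of label maps. The function name `mean_iou` is hypothetical, and the benchmark itself accumulates statistics over the whole test set rather than per image, so this is only an approximation of the official protocol.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two integer label maps."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(num_classes * target + pred,
                       minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Average only over classes that occur in prediction or ground truth.
    present = union > 0
    return float(np.mean(intersection[present] / union[present]))
```

For a 2 x 2 prediction `[[0, 0], [1, 1]]` against ground truth `[[0, 1], [1, 1]]`, class 0 scores 1/2 and class 1 scores 2/3, giving a mean of 7/12.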
Context Tricks for Cheap Semantic Segmentation
Accurate semantic labeling of image pixels is difficult because intra-class
variability is often greater than inter-class variability. In turn, fast
semantic segmentation is hard because accurate models are usually too
complicated to also run quickly at test-time. Our experience with building and
running semantic segmentation systems has also shown a reasonably obvious
bottleneck on model complexity, imposed by small training datasets. We
therefore propose two simple complementary strategies that leverage context to
give better semantic segmentation, while scaling up or down to train on
different-sized datasets.
As easy modifications for existing semantic segmentation algorithms, we
introduce Decorrelated Semantic Texton Forests, and the Context Sensitive Image
Level Prior. The proposed modifications are tested using a Semantic Texton
Forest (STF) system, and the modifications are validated on two standard
benchmark datasets, MSRC-21 and PascalVOC-2010. In Python based comparisons,
our system is insignificantly slower than STF at test-time, yet produces
superior semantic segmentations overall, with just push-button training.
Comment: Supplementary material can be found at
http://www0.cs.ucl.ac.uk/staff/T.Intharah/research.htm
Segmentation of Objects by Hashing
We propose a novel approach to address the problem of Simultaneous Detection
and Segmentation introduced in [Hariharan et al 2014]. Using the hierarchical
structures first presented in [Arbeláez et al 2011] we use an efficient and
accurate procedure that exploits the feature information of the hierarchy using
Locality Sensitive Hashing. We build on recent work that utilizes convolutional
neural networks to detect bounding boxes in an image [Ren et al 2015] and then
use the top similar hierarchical region that best fits each bounding box after
hashing; we call this approach C&Z Segmentation. We then refine our final
segmentation results by automatic hierarchical pruning. C&Z Segmentation
introduces a train-free alternative to Hypercolumns [Hariharan et al 2015]. We
conduct extensive experiments on PASCAL VOC 2012 segmentation dataset, showing
that C&Z gives competitive state-of-the-art segmentations of objects.
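The abstract above does not say which Locality Sensitive Hashing family is used. As a generic illustration of the idea only, the sketch below uses sign random projections (random hyperplanes), a common LSH scheme for cosine similarity: feature vectors pointing in similar directions tend to receive the same code. The function name `lsh_signature`, the 16-bit code length and the 8-dimensional features are assumptions, not details from the paper.

```python
import numpy as np


def lsh_signature(features, planes):
    """Hash each row of `features` to an integer code.

    Each bit records on which side of a random hyperplane the vector lies.
    """
    bits = (np.asarray(features) @ planes.T) > 0            # (n, num_bits)
    weights = 2 ** np.arange(planes.shape[0], dtype=np.int64)
    return bits.astype(np.int64) @ weights


# Hypothetical usage: 16 random hyperplanes in an 8-dimensional feature space.
rng = np.random.default_rng(0)
planes = rng.standard_normal((16, 8))
vec = rng.standard_normal(8)
codes = lsh_signature(np.stack([vec, 2.0 * vec, -vec]), planes)
```

Here `codes[0]` and `codes[1]` agree because positive scaling does not change which side of a hyperplane a vector falls on, while the negated vector lands in a different bucket. Candidate regions could then be looked up by code instead of being compared exhaustively.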
Unsupervised learning of foreground object detection
Unsupervised learning poses one of the most difficult challenges in computer
vision today. The task has an immense practical value with many applications in
artificial intelligence and emerging technologies, as large quantities of
unlabeled videos can be collected at relatively low cost. In this paper, we
address the unsupervised learning problem in the context of detecting the main
foreground objects in single images. We train a student deep network to predict
the output of a teacher pathway that performs unsupervised object discovery in
videos or large image collections. Our approach is different from published
methods on unsupervised object discovery. We move the unsupervised learning
phase to training time, then at test time we apply the standard
feed-forward processing along the student pathway. This strategy has the
benefit of allowing increased generalization possibilities during training,
while remaining fast at testing. Our unsupervised learning algorithm can run
over several generations of student-teacher training. Thus, a group of student
networks trained in the first generation collectively create the teacher at the
next generation. In experiments our method achieves top results on three
current datasets for object discovery in video, unsupervised image segmentation
and saliency detection. At test time the proposed system is fast, being one to
two orders of magnitude faster than published unsupervised methods.
Comment: International Journal of Computer Vision (IJCV), 201
Instance-aware Semantic Segmentation via Multi-task Network Cascades
Semantic segmentation research has recently witnessed rapid progress, but
many leading methods are unable to identify object instances. In this paper, we
present Multi-task Network Cascades for instance-aware semantic segmentation.
Our model consists of three networks, respectively differentiating instances,
estimating masks, and categorizing objects. These networks form a cascaded
structure, and are designed to share their convolutional features. We develop
an algorithm for the nontrivial end-to-end training of this causal, cascaded
structure. Our solution is a clean, single-step training framework and can be
generalized to cascades that have more stages. We demonstrate state-of-the-art
instance-aware semantic segmentation accuracy on PASCAL VOC. Meanwhile, our
method takes only 360ms testing an image using VGG-16, which is two orders of
magnitude faster than previous systems for this challenging problem. As a
by-product, our method also achieves compelling object detection results which
surpass the competitive Fast/Faster R-CNN systems.
The method described in this paper is the foundation of our submissions to
the MS COCO 2015 segmentation competition, where we won 1st place.
Comment: Tech report. 1st-place winner of MS COCO 2015 segmentation
competitio
Optimal Multi-Object Segmentation with Novel Gradient Vector Flow Based Shape Priors
Shape priors have been widely utilized in medical image segmentation to
improve segmentation accuracy and robustness. A major way to encode such a
prior shape model is to use a mesh representation, which is prone to causing
self-intersection or mesh folding. Those problems require complex and expensive
algorithms to mitigate. In this paper, we propose a novel shape prior directly
embedded in the voxel grid space, based on gradient vector flows of a
pre-segmentation. The flexible and powerful prior shape representation is ready
to be extended to simultaneously segment multiple interacting objects with
a minimum separation distance constraint. The problem is formulated as a Markov
random field problem whose exact solution can be efficiently computed with a
single minimum s-t cut in an appropriately constructed graph. The proposed
algorithm is validated on two multi-object segmentation applications: the brain
tissue segmentation in MRI images, and the bladder/prostate segmentation in CT
images. Both sets of experiments show superior or competitive performance of
the proposed method to other state-of-the-art methods.
Comment: Paper in revie
Fast User-Guided Video Object Segmentation by Interaction-and-Propagation Networks
We present a deep learning method for interactive video object
segmentation. Our method is built upon two core operations, interaction and
propagation, and each operation is conducted by Convolutional Neural Networks.
The two networks are connected both internally and externally so that the
networks are trained jointly and interact with each other to solve the complex
video object segmentation problem. We propose a new multi-round training scheme
for interactive video object segmentation so that the networks can learn
how to understand the user's intention and update incorrect estimations during
training. At test time, our method produces high-quality results and
also runs fast enough to work with users interactively. We evaluated the
proposed method quantitatively on the interactive track benchmark at the DAVIS
Challenge 2018. We outperformed other competing methods by a significant margin
in both speed and accuracy. We also demonstrated that our method works
well with real user interactions.
Comment: CVPR 201