Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation
Image segmentation refers to the process of dividing an image into
nonoverlapping, meaningful regions according to human perception, and it has
been a classic topic since the early days of computer vision. A great deal of
research has been conducted and has resulted in many applications. However,
while many segmentation algorithms exist, only a few sparse and outdated
surveys are available, and an overview of recent achievements and issues is
lacking. We aim to provide a comprehensive review of the recent
progress in this field. Covering 180 publications, we give an overview of broad
areas of segmentation topics including not only the classic bottom-up
approaches, but also the recent development in superpixel, interactive methods,
object proposals, semantic image parsing and image cosegmentation. In addition,
we review the existing influential datasets and evaluation metrics.
Finally, we suggest some design flavors and research directions for future
research in image segmentation.
Comment: submitted to Elsevier Journal of Visual Communication and Image Representation
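Since the survey also reviews evaluation metrics, a quick illustration may help:
the most common region-overlap scores are intersection-over-union (IoU) and the
Dice coefficient. The sketch below is illustrative and not taken from the
survey; the function name iou_and_dice is ours.

```python
import numpy as np

def iou_and_dice(pred, gt):
    """Region-overlap metrics for two binary masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    iou = inter / union if union else 1.0    # empty vs. empty counts as perfect
    dice = 2 * inter / total if total else 1.0
    return iou, dice

# Two overlapping square masks as a toy example.
a = np.zeros((10, 10), bool); a[2:8, 2:8] = True
b = np.zeros((10, 10), bool); b[4:10, 4:10] = True
print(iou_and_dice(a, b))  # IoU = 16/56, Dice = 32/72
```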
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
Few prior works study deep learning on point sets. PointNet by Qi et al. is a
pioneer in this direction. However, by design PointNet does not capture local
structures induced by the metric space the points live in, limiting its ability
to recognize fine-grained patterns and its generalizability to complex scenes. In this
work, we introduce a hierarchical neural network that applies PointNet
recursively on a nested partitioning of the input point set. By exploiting
metric space distances, our network is able to learn local features with
increasing contextual scales. We further observe that point sets are usually
sampled with varying densities, which greatly degrades the performance of
networks trained on uniform densities; we therefore propose novel set learning
layers to adaptively combine features from multiple scales. Experiments show
that our network, called PointNet++, is able to learn deep point set features
efficiently and robustly. In particular, it obtains results significantly
better than the state of the art on challenging benchmarks of 3D point clouds.
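To make the hierarchical idea concrete, here is a minimal NumPy sketch of one
"set abstraction" step in the PointNet++ spirit: farthest point sampling picks
centroids, a ball query groups neighbors within a metric radius, and a shared
point-wise network (not shown) would then summarize each local region. The
function names and parameter values are illustrative, not the authors'
implementation.

```python
import numpy as np

def farthest_point_sampling(points, n_centroids):
    """Greedy FPS: pick mutually distant points from an (N, 3) array."""
    n = points.shape[0]
    chosen = [0]
    dist = np.full(n, np.inf)
    for _ in range(n_centroids - 1):
        d = np.linalg.norm(points - points[chosen[-1]], axis=1)
        dist = np.minimum(dist, d)          # distance to nearest chosen point
        chosen.append(int(dist.argmax()))   # farthest remaining point
    return np.array(chosen)

def ball_query(points, centroids, radius, k):
    """For each centroid, gather up to k neighbor indices within `radius`."""
    groups = []
    for c in centroids:
        d = np.linalg.norm(points - points[c], axis=1)
        idx = np.where(d < radius)[0][:k]
        pad = np.full(k - len(idx), c)      # pad small groups with the centroid
        groups.append(np.concatenate([idx, pad]))
    return np.stack(groups)

# One set-abstraction level: sample centroids, group local neighborhoods,
# then a shared PointNet would summarize each centered local region.
pts = np.random.rand(1024, 3)
cents = farthest_point_sampling(pts, 128)
neigh = ball_query(pts, cents, radius=0.2, k=32)
local_regions = pts[neigh] - pts[cents][:, None, :]  # centered local coords
print(local_regions.shape)  # (128, 32, 3)
```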
Mapping Auto-context Decision Forests to Deep ConvNets for Semantic Segmentation
We consider the task of pixel-wise semantic segmentation given a small set of
labeled training images. Two of the most popular techniques for addressing
this task are Decision Forests (DF) and Neural Networks (NN). In this work, we
explore the relationship between two special forms of these techniques: stacked
DFs (namely Auto-context) and deep Convolutional Neural Networks (ConvNet). Our
main contribution is to show that Auto-context can be mapped to a deep ConvNet
with novel architecture, and thereby trained end-to-end. This mapping can be
used as an initialization of a deep ConvNet, enabling training even in the face
of very limited amounts of training data. We also demonstrate an approximate
mapping back from the refined ConvNet to a second stacked DF, with improved
performance over the original. We experimentally verify that these mappings
outperform stacked DFs for two different applications in computer vision and
biology: Kinect-based body part labeling from depth images, and somite
segmentation in microscopy images of developing zebrafish. Finally, we revisit
the core mapping from a Decision Tree (DT) to a NN, and show that it is also
possible to map a fuzzy DT, with sigmoidal split decisions, to a NN. This
addresses multiple limitations of the previous mapping, and yields new insights
into the popular Rectified Linear Unit (ReLU), and more recently proposed
concatenated ReLU (CReLU), activation functions.
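As a rough illustration of the fuzzy-decision-tree idea, the sketch below
evaluates a depth-2 tree with sigmoidal split decisions: each leaf's weight is
a product of sigmoid gate probabilities, which is exactly the kind of structure
a sigmoid neural network can reproduce. This is our own simplified
construction, not the paper's exact mapping.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_tree_predict(x, splits, leaves, temp=1.0):
    """
    Depth-2 fuzzy decision tree. splits: three (w, b) hyperplanes for the
    root and its two children; leaves: four leaf values. Routing weights
    are products of sigmoid gates, i.e. an NN-expressible computation.
    """
    p_root = sigmoid((x @ splits[0][0] + splits[0][1]) / temp)
    p_l = sigmoid((x @ splits[1][0] + splits[1][1]) / temp)
    p_r = sigmoid((x @ splits[2][0] + splits[2][1]) / temp)
    # Path probabilities to the four leaves (LL, LR, RL, RR).
    probs = np.stack([(1 - p_root) * (1 - p_l),
                      (1 - p_root) * p_l,
                      p_root * (1 - p_r),
                      p_root * p_r], axis=-1)
    return probs @ leaves

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                       # five 8-d inputs
splits = [(rng.normal(size=8), 0.0) for _ in range(3)]
leaves = np.array([0.0, 1.0, 1.0, 0.0])
print(soft_tree_predict(x, splits, leaves))       # five soft predictions
```

Lowering `temp` sharpens the sigmoids toward hard splits, recovering an
ordinary decision tree as a limiting case.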
DeepIGeoS: A Deep Interactive Geodesic Framework for Medical Image Segmentation
Accurate medical image segmentation is essential for diagnosis, surgical
planning and many other applications. Convolutional Neural Networks (CNNs) have
become the state-of-the-art automatic segmentation methods. However, fully
automatic results may still need to be refined to become accurate and robust
enough for clinical use. We propose a deep learning-based interactive
segmentation method to improve the results obtained by an automatic CNN and to
reduce user interactions during refinement for higher accuracy. We use one CNN
to obtain an initial automatic segmentation, on which user interactions are
added to indicate mis-segmentations. Another CNN takes as input the user
interactions with the initial segmentation and gives a refined result. We
propose to combine user interactions with CNNs through geodesic distance
transforms, and propose a resolution-preserving network that gives a better
dense prediction. In addition, we integrate user interactions as hard
constraints into a back-propagatable Conditional Random Field. We validated the
proposed framework in the context of 2D placenta segmentation from fetal MRI
and 3D brain tumor segmentation from FLAIR images. Experimental results show
our method achieves a large improvement over automatic CNNs, and obtains
comparable or even higher accuracy with fewer user interventions and less time
than traditional interactive methods.
Comment: 14 pages, 15 figures
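The core ingredient, turning user clicks into a geodesic distance map that is
fed to the CNN as an extra channel, can be sketched with a simple raster-scan
approximation. The paper's exact transform and integration differ; `lam` and
the neighborhood choice here are illustrative.

```python
import numpy as np

def geodesic_distance(image, seeds, lam=1.0, n_iters=2):
    """
    Approximate geodesic distance from seed pixels via raster scans.
    Stepping cost = spatial step + lam * |intensity difference|, so the
    distance is cheap along homogeneous regions and expensive across edges.
    image: (H, W) float; seeds: (H, W) bool mask of user clicks/scribbles.
    """
    H, W = image.shape
    dist = np.where(seeds, 0.0, np.inf)
    offsets = [(-1, 0), (0, -1), (-1, -1), (-1, 1)]
    for _ in range(n_iters):
        for sweep in (1, -1):  # forward, then backward raster pass
            ys = range(H) if sweep == 1 else range(H - 1, -1, -1)
            for y in ys:
                xs = range(W) if sweep == 1 else range(W - 1, -1, -1)
                for x in xs:
                    for dy, dx in offsets:
                        ny, nx = y + sweep * dy, x + sweep * dx
                        if 0 <= ny < H and 0 <= nx < W:
                            step = np.hypot(dy, dx) + lam * abs(image[y, x] - image[ny, nx])
                            dist[y, x] = min(dist[y, x], dist[ny, nx] + step)
    return dist

img = np.random.rand(64, 64)
clicks = np.zeros_like(img, dtype=bool)
clicks[32, 32] = True                       # one simulated user click
dmap = geodesic_distance(img, clicks)       # extra CNN input channel
```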
Multi-Kernel Diffusion CNNs for Graph-Based Learning on Point Clouds
Graph convolutional networks are a promising new learning approach for data
on irregular domains. They are well suited to overcome certain limitations of
conventional grid-based architectures and enable efficient handling of point
clouds or related graph data representations, e.g. superpixel graphs. Learning
feature extractors and classifiers on 3D point clouds is still an
underdeveloped area, and existing methods are often restricted to identical
graph topologies. In this work, we derive a new architectural design that
combines rotationally and topologically invariant graph diffusion operators and
node-wise feature learning through 1x1 convolutions. By combining multiple
isotropic diffusion operations based on the Laplace-Beltrami operator, we can
learn an optimal linear combination of diffusion kernels for effective feature
propagation across nodes on an irregular graph. We validate our approach on
learning point descriptors as well as semantic classification on real 3D point
clouds of human poses, and demonstrate an improvement from 85% to 95% in Dice
overlap with our multi-kernel approach.
Comment: accepted for the ECCV 2018 Workshop Geometry Meets Deep Learning
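A minimal sketch of the multi-kernel idea, using a plain normalized graph
Laplacian as a stand-in for the Laplace-Beltrami operator: several heat
kernels exp(-tL) diffuse node features at different scales, a weighting (fixed
here, learned in the paper) combines them, and a node-wise linear map plays
the role of the 1x1 convolution. All names and values are illustrative.

```python
import numpy as np
from scipy.linalg import expm

def multi_kernel_diffusion(adj, feats, times=(0.5, 1.0, 2.0), weights=None):
    """Diffuse node features with several heat kernels exp(-t*L), then mix."""
    deg = adj.sum(axis=1)
    deg_safe = np.where(deg > 0, deg, 1.0)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(deg_safe), 0.0)
    # Symmetric normalized graph Laplacian (stand-in for Laplace-Beltrami).
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    kernels = [expm(-t * lap) for t in times]   # one isotropic kernel per scale
    if weights is None:
        weights = np.full(len(times), 1.0 / len(times))  # learned in practice
    diffused = sum(w * (K @ feats) for w, K in zip(weights, kernels))
    # Node-wise linear map: the graph analogue of a 1x1 convolution.
    rng = np.random.default_rng(0)
    mix = rng.normal(size=(feats.shape[1], feats.shape[1]))
    return diffused @ mix

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # tiny path graph
X = np.eye(3)                                           # one-hot node features
print(multi_kernel_diffusion(A, X).shape)               # (3, 3)
```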
BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation
Recent leading approaches to semantic segmentation rely on deep convolutional
networks trained with human-annotated, pixel-level segmentation masks. Such
pixel-accurate supervision demands expensive labeling effort and limits the
performance of deep networks that usually benefit from more training data. In
this paper, we propose a method that achieves competitive accuracy but only
requires easily obtained bounding box annotations. The basic idea is to iterate
between automatically generating region proposals and training convolutional
networks. These two steps gradually recover segmentation masks for improving
the networks, and vice versa. Our method, called BoxSup, produces competitive
results supervised by boxes only, on par with strong baselines fully supervised
by masks under the same setting. By leveraging a large amount of bounding
boxes, BoxSup further unleashes the power of deep convolutional networks and
yields state-of-the-art results on PASCAL VOC 2012 and PASCAL-CONTEXT.
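The iteration can be sketched as follows: given the current network, each
annotated box is assigned the candidate segment proposal that best balances
overlap with the box and agreement with the network's foreground
probabilities, and that proposal becomes the pseudo mask for the next training
round. The scoring below is our simplification of the paper's criterion;
`alpha` and the function name are illustrative.

```python
import numpy as np

def update_mask_for_box(box_mask, proposals, net_prob, alpha=0.5):
    """Pick the proposal that best balances box overlap and network agreement."""
    def iou(a, b):
        return np.logical_and(a, b).sum() / max(np.logical_or(a, b).sum(), 1)

    def score(p):
        agreement = net_prob[p].mean() if p.any() else 0.0
        return alpha * iou(p, box_mask) + (1 - alpha) * agreement

    return max(proposals, key=score)  # becomes the next round's pseudo mask

# Toy demo: shifted copies of a square compete for one box annotation.
H = W = 8
box = np.zeros((H, W), bool); box[2:6, 2:6] = True
candidates = [np.roll(box, s, axis=1) for s in (0, 1, 3)]
net_prob = box.astype(float)        # stand-in for the current CNN output
best = update_mask_for_box(box, candidates, net_prob)
print(best.sum())  # 16: the unshifted candidate wins
```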
SceneFlowFields++: Multi-frame Matching, Visibility Prediction, and Robust Interpolation for Scene Flow Estimation
State-of-the-art scene flow algorithms pursue the conflicting targets of
accuracy, run time, and robustness. With the successful concept of pixel-wise
matching and sparse-to-dense interpolation, we push the limits of scene flow
estimation. Avoiding strong assumptions on the domain or the problem yields a
more robust algorithm. This algorithm is fast because we avoid explicit
regularization during matching, which allows an efficient computation. Using
image information from multiple time steps and explicit visibility prediction
based on previous results, we achieve competitive performance on different
data sets. Our contributions and results are evaluated in comparative
experiments. Overall, we present an accurate scene flow algorithm that is
faster and more generic than any individual benchmark leader.
Comment: arXiv admin note: text overlap with arXiv:1710.1009
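To illustrate the sparse-to-dense step in isolation: sparse matches are
interpolated into a dense flow field, here with plain inverse-distance
weighting over the k nearest seeds as a stand-in for the paper's edge-aware
robust interpolation. The function name and parameters are ours.

```python
import numpy as np

def sparse_to_dense(shape, seeds, flows, k=4, eps=1e-6):
    """
    Fill a dense flow field from sparse matches by inverse-distance
    weighting of the k nearest seeds.
    seeds: (M, 2) pixel coords; flows: (M, C) flow vectors at those coords.
    """
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)   # (H*W, 2)
    d = np.linalg.norm(pix[:, None, :] - seeds[None, :, :], axis=2)  # (H*W, M)
    nn = np.argsort(d, axis=1)[:, :k]                 # k nearest seed indices
    w = 1.0 / (np.take_along_axis(d, nn, axis=1) + eps)
    w /= w.sum(axis=1, keepdims=True)                 # normalized weights
    dense = (w[..., None] * flows[nn]).sum(axis=1)
    return dense.reshape(H, W, -1)

seeds = np.array([[5.0, 5.0], [20.0, 30.0]])
flows = np.array([[1.0, 0.0], [0.0, 2.0]])
print(sparse_to_dense((32, 40), seeds, flows).shape)  # (32, 40, 2)
```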
How deep learning works --The geometry of deep learning
Why and how deep learning works well on different tasks remains a mystery
from a theoretical perspective. In this paper we draw a geometric picture of
deep learning systems by finding analogies with two existing geometric
structures: the geometry of quantum computation and the geometry of
diffeomorphic template matching. In this framework, we describe the geometric
structures of different deep learning systems, including convolutional neural
networks, residual networks, recursive neural networks, recurrent neural
networks, and the equilibrium propagation framework. We can also analyze the
relationship between the geometric structures of different networks and their
performance at an algorithmic level, so that the geometric framework may
guide the design of the structures and algorithms of deep learning systems.
Comment: 16 pages, 13 figures
Spatial Aggregation of Holistically-Nested Networks for Automated Pancreas Segmentation
Accurate automatic organ segmentation is an important yet challenging problem
for medical image analysis. The pancreas is an abdominal organ with very high
anatomical variability. This inhibits traditional segmentation methods from
achieving high accuracies, especially compared to other organs such as the
liver, heart or kidneys. In this paper, we present a holistic learning approach
that integrates semantic mid-level cues of deeply-learned organ interior and
boundary maps via robust spatial aggregation using random forests. Our method
generates boundary preserving pixel-wise class labels for pancreas
segmentation. Quantitative evaluation is performed on CT scans of 82 patients
in 4-fold cross-validation. We achieve a Dice Similarity Coefficient of
78.01% ± 8.2% (mean ± std. dev.) in testing, which significantly outperforms
the previous state-of-the-art result of 71.8% ± 10.7% under the same
evaluation criterion.
Comment: This article will be presented at MICCAI (Medical Image Computing and Computer-Assisted Interventions), Athens, Greece, 2016
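A minimal sketch of the aggregation step under stated assumptions: per-pixel
features built from the deeply-learned interior and boundary probability maps
(random stand-ins below) are fed to a random forest that produces the final
pancreas label. Feature choices and hyperparameters are illustrative, not the
paper's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Random stand-ins for the deeply-learned probability maps and the image.
H, W = 64, 64
rng = np.random.default_rng(0)
interior = rng.random((H, W))   # HED-style organ-interior map (stand-in)
boundary = rng.random((H, W))   # HED-style organ-boundary map (stand-in)
ct = rng.random((H, W))         # CT intensity (stand-in)

# Per-pixel feature vectors aggregated by a random forest into final labels.
X = np.stack([interior, boundary, ct], axis=-1).reshape(-1, 3)
y = (interior > 0.5).ravel().astype(int)   # stand-in supervision for the demo

rf = RandomForestClassifier(n_estimators=50, max_depth=8, random_state=0).fit(X, y)
pancreas_prob = rf.predict_proba(X)[:, 1].reshape(H, W)
print(pancreas_prob.shape)  # (64, 64)
```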
A Bottom-up Approach for Pancreas Segmentation using Cascaded Superpixels and (Deep) Image Patch Labeling
Robust automated organ segmentation is a prerequisite for computer-aided
diagnosis (CAD), quantitative imaging analysis and surgical assistance. For
high-variability organs such as the pancreas, previous approaches report
undesirably low accuracies. We present a bottom-up approach for pancreas
segmentation in abdominal CT scans that is based on hierarchical information
propagation: classifying image patches at different resolutions and
cascading superpixels. There are four stages: 1) decomposing CT slice images into
a set of disjoint boundary-preserving superpixels; 2) computing pancreas class
probability maps via dense patch labeling; 3) classifying superpixels by
pooling both intensity and probability features to form empirical statistics in
cascaded random forest frameworks; and 4) simple connectivity based
post-processing. The dense image patch labeling is conducted by: an efficient
random forest classifier on image histogram, location and texture features; and
a more expensive (but with better specificity) deep convolutional neural network
classification on larger image windows (with more spatial contexts). Evaluation
of the approach is performed on a database of 80 manually segmented CT volumes
in six-fold cross-validation (CV). Our results are comparable to, or better
than, state-of-the-art methods (evaluated by "leave-one-patient-out"), with
Dice 70.7% and Jaccard 57.9%. Computational efficiency is drastically
improved, to roughly 6-8 minutes per case, compared with ~10 hours for other
methods. Finally, we implement a multi-atlas label
fusion (MALF) approach for pancreas segmentation using the same datasets. Under
six-fold CV, our bottom-up segmentation method significantly outperforms its
MALF counterpart: (70.7 +/- 13.0%) versus (52.5 +/- 20.8%) in Dice. Deep CNN
patch labeling confidences offer more numerical stability, reflected by smaller
standard deviations.
Comment: 14 pages, 14 figures, 2 tables
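Stages 1-3 can be sketched in a few lines: SLIC superpixels decompose the
slice, intensity and patch-probability statistics are pooled per superpixel,
and a random forest classifies each superpixel. The probability map, labels,
and hyperparameters below are stand-ins for the paper's cascaded setup.

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
img = rng.random((128, 128))      # stand-in CT slice
prob = rng.random((128, 128))     # stand-in dense patch-label probability map

# Stage 1: disjoint boundary-preserving superpixels.
sp = slic(img, n_segments=200, channel_axis=None)

# Stage 3: pool intensity and probability statistics per superpixel.
feats, labels = [], []
for s in np.unique(sp):
    m = sp == s
    feats.append([img[m].mean(), img[m].std(),
                  prob[m].mean(), np.percentile(prob[m], 90)])
    labels.append(int(prob[m].mean() > 0.5))   # stand-in supervision

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(feats, labels)
sp_pred = rf.predict(feats)   # per-superpixel pancreas / background decision
print(sp_pred.shape)
```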