Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation
Image segmentation refers to the process of dividing an image into
nonoverlapping, meaningful regions according to human perception, and it has
been a classic topic since the early days of computer vision. A great deal of
research has been conducted and has resulted in many applications. However,
while many segmentation algorithms exist, only a few sparse and outdated
surveys are available, and an overview of recent achievements and issues is
lacking. We aim to provide a comprehensive review of the recent
progress in this field. Covering 180 publications, we give an overview of broad
areas of segmentation topics including not only the classic bottom-up
approaches, but also the recent development in superpixel, interactive methods,
object proposals, semantic image parsing and image cosegmentation. In addition,
we review the existing influential datasets and evaluation metrics.
Finally, we suggest some design flavors and research directions for future
research in image segmentation.
Comment: submitted to Elsevier Journal of Visual Communication and Image Representation
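Since the survey also reviews evaluation metrics, a quick illustration may help:
the most common region-overlap scores are intersection-over-union (IoU) and the
Dice coefficient. The sketch below is illustrative and not taken from the
survey; the function name iou_and_dice is ours.

```python
import numpy as np

def iou_and_dice(pred, gt):
    """Region-overlap metrics for two binary masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    iou = inter / union if union else 1.0    # empty vs. empty counts as perfect
    dice = 2 * inter / total if total else 1.0
    return iou, dice

# Two overlapping square masks as a toy example.
a = np.zeros((10, 10), bool); a[2:8, 2:8] = True
b = np.zeros((10, 10), bool); b[4:10, 4:10] = True
print(iou_and_dice(a, b))  # IoU = 16/56, Dice = 32/72
```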
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
Few prior works study deep learning on point sets. PointNet by Qi et al. is a
pioneer in this direction. However, by design PointNet does not capture local
structures induced by the metric space the points live in, limiting its ability
to recognize fine-grained patterns and its generalizability to complex scenes. In this
work, we introduce a hierarchical neural network that applies PointNet
recursively on a nested partitioning of the input point set. By exploiting
metric space distances, our network is able to learn local features with
increasing contextual scales. We further observe that point sets are usually
sampled with varying densities, which greatly degrades the performance of
networks trained on uniform densities; we therefore propose novel set learning
layers to adaptively combine features from multiple scales. Experiments show
that our network, called PointNet++, is able to learn deep point set features
efficiently and robustly. In particular, it obtains results significantly
better than the state of the art on challenging benchmarks of 3D point clouds.
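To make the hierarchical idea concrete, here is a minimal NumPy sketch of one
"set abstraction" step in the PointNet++ spirit: farthest point sampling picks
centroids, a ball query groups neighbors within a metric radius, and a shared
point-wise network (not shown) would then summarize each local region. The
function names and parameter values are illustrative, not the authors'
implementation.

```python
import numpy as np

def farthest_point_sampling(points, n_centroids):
    """Greedy FPS: pick mutually distant points from an (N, 3) array."""
    n = points.shape[0]
    chosen = [0]
    dist = np.full(n, np.inf)
    for _ in range(n_centroids - 1):
        d = np.linalg.norm(points - points[chosen[-1]], axis=1)
        dist = np.minimum(dist, d)          # distance to nearest chosen point
        chosen.append(int(dist.argmax()))   # farthest remaining point
    return np.array(chosen)

def ball_query(points, centroids, radius, k):
    """For each centroid, gather up to k neighbor indices within `radius`."""
    groups = []
    for c in centroids:
        d = np.linalg.norm(points - points[c], axis=1)
        idx = np.where(d < radius)[0][:k]
        pad = np.full(k - len(idx), c)      # pad small groups with the centroid
        groups.append(np.concatenate([idx, pad]))
    return np.stack(groups)

# One set-abstraction level: sample centroids, group local neighborhoods,
# then a shared PointNet would summarize each centered local region.
pts = np.random.rand(1024, 3)
cents = farthest_point_sampling(pts, 128)
neigh = ball_query(pts, cents, radius=0.2, k=32)
local_regions = pts[neigh] - pts[cents][:, None, :]  # centered local coords
print(local_regions.shape)  # (128, 32, 3)
```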
Mapping Auto-context Decision Forests to Deep ConvNets for Semantic Segmentation
We consider the task of pixel-wise semantic segmentation given a small set of
labeled training images. Two of the most popular techniques for addressing
this task are Decision Forests (DF) and Neural Networks (NN). In this work, we
explore the relationship between two special forms of these techniques: stacked
DFs (namely Auto-context) and deep Convolutional Neural Networks (ConvNet). Our
main contribution is to show that Auto-context can be mapped to a deep ConvNet
with novel architecture, and thereby trained end-to-end. This mapping can be
used as an initialization of a deep ConvNet, enabling training even in the face
of very limited amounts of training data. We also demonstrate an approximate
mapping back from the refined ConvNet to a second stacked DF, with improved
performance over the original. We experimentally verify that these mappings
outperform stacked DFs for two different applications in computer vision and
biology: Kinect-based body part labeling from depth images, and somite
segmentation in microscopy images of developing zebrafish. Finally, we revisit
the core mapping from a Decision Tree (DT) to a NN, and show that it is also
possible to map a fuzzy DT, with sigmoidal split decisions, to a NN. This
addresses multiple limitations of the previous mapping, and yields new insights
into the popular Rectified Linear Unit (ReLU), and more recently proposed
concatenated ReLU (CReLU), activation functions.
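As a rough illustration of the fuzzy-decision-tree idea, the sketch below
evaluates a depth-2 tree with sigmoidal split decisions: each leaf's weight is
a product of sigmoid gate probabilities, which is exactly the kind of structure
a sigmoid neural network can reproduce. This is our own simplified
construction, not the paper's exact mapping.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_tree_predict(x, splits, leaves, temp=1.0):
    """
    Depth-2 fuzzy decision tree. splits: three (w, b) hyperplanes for the
    root and its two children; leaves: four leaf values. Routing weights
    are products of sigmoid gates, i.e. an NN-expressible computation.
    """
    p_root = sigmoid((x @ splits[0][0] + splits[0][1]) / temp)
    p_l = sigmoid((x @ splits[1][0] + splits[1][1]) / temp)
    p_r = sigmoid((x @ splits[2][0] + splits[2][1]) / temp)
    # Path probabilities to the four leaves (LL, LR, RL, RR).
    probs = np.stack([(1 - p_root) * (1 - p_l),
                      (1 - p_root) * p_l,
                      p_root * (1 - p_r),
                      p_root * p_r], axis=-1)
    return probs @ leaves

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                       # five 8-d inputs
splits = [(rng.normal(size=8), 0.0) for _ in range(3)]
leaves = np.array([0.0, 1.0, 1.0, 0.0])
print(soft_tree_predict(x, splits, leaves))       # five soft predictions
```

Lowering `temp` sharpens the sigmoids toward hard splits, recovering an
ordinary decision tree as a limiting case.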
DeepIGeoS: A Deep Interactive Geodesic Framework for Medical Image Segmentation
Accurate medical image segmentation is essential for diagnosis, surgical
planning and many other applications. Convolutional Neural Networks (CNNs) have
become the state-of-the-art automatic segmentation methods. However, fully
automatic results may still need to be refined to become accurate and robust
enough for clinical use. We propose a deep learning-based interactive
segmentation method to improve the results obtained by an automatic CNN and to
reduce user interactions during refinement for higher accuracy. We use one CNN
to obtain an initial automatic segmentation, on which user interactions are
added to indicate mis-segmentations. Another CNN takes as input the user
interactions with the initial segmentation and gives a refined result. We
propose to combine user interactions with CNNs through geodesic distance
transforms, and propose a resolution-preserving network that gives a better
dense prediction. In addition, we integrate user interactions as hard
constraints into a back-propagatable Conditional Random Field. We validated the
proposed framework in the context of 2D placenta segmentation from fetal MRI
and 3D brain tumor segmentation from FLAIR images. Experimental results show
our method achieves a large improvement over automatic CNNs, and obtains
comparable or even higher accuracy with fewer user interventions and less time
than traditional interactive methods.
Comment: 14 pages, 15 figures
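The core ingredient, turning user clicks into a geodesic distance map that is
fed to the CNN as an extra channel, can be sketched with a simple raster-scan
approximation. The paper's exact transform and integration differ; `lam` and
the neighborhood choice here are illustrative.

```python
import numpy as np

def geodesic_distance(image, seeds, lam=1.0, n_iters=2):
    """
    Approximate geodesic distance from seed pixels via raster scans.
    Stepping cost = spatial step + lam * |intensity difference|, so the
    distance is cheap along homogeneous regions and expensive across edges.
    image: (H, W) float; seeds: (H, W) bool mask of user clicks/scribbles.
    """
    H, W = image.shape
    dist = np.where(seeds, 0.0, np.inf)
    offsets = [(-1, 0), (0, -1), (-1, -1), (-1, 1)]
    for _ in range(n_iters):
        for sweep in (1, -1):  # forward, then backward raster pass
            ys = range(H) if sweep == 1 else range(H - 1, -1, -1)
            for y in ys:
                xs = range(W) if sweep == 1 else range(W - 1, -1, -1)
                for x in xs:
                    for dy, dx in offsets:
                        ny, nx = y + sweep * dy, x + sweep * dx
                        if 0 <= ny < H and 0 <= nx < W:
                            step = np.hypot(dy, dx) + lam * abs(image[y, x] - image[ny, nx])
                            dist[y, x] = min(dist[y, x], dist[ny, nx] + step)
    return dist

img = np.random.rand(64, 64)
clicks = np.zeros_like(img, dtype=bool)
clicks[32, 32] = True                       # one simulated user click
dmap = geodesic_distance(img, clicks)       # extra CNN input channel
```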
Multi-Kernel Diffusion CNNs for Graph-Based Learning on Point Clouds
Graph convolutional networks are a promising new learning approach for data
on irregular domains. They are well suited to overcome certain limitations of
conventional grid-based architectures and enable efficient handling of point
clouds or related graph data representations, e.g. superpixel graphs. Learning
feature extractors and classifiers on 3D point clouds is still an
underdeveloped area, and existing methods are often restricted to identical
graph topologies. In this work, we derive a new architectural design that
combines rotationally and topologically invariant graph diffusion operators and
node-wise feature learning through 1x1 convolutions. By combining multiple
isotropic diffusion operations based on the Laplace-Beltrami operator, we can
learn an optimal linear combination of diffusion kernels for effective feature
propagation across nodes on an irregular graph. We validate our approach on
learning point descriptors as well as semantic classification on real 3D point
clouds of human poses, and demonstrate an improvement from 85% to 95% in Dice
overlap with our multi-kernel approach.
Comment: accepted for the ECCV 2018 Workshop Geometry Meets Deep Learning
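A minimal sketch of the multi-kernel idea, using a plain normalized graph
Laplacian as a stand-in for the Laplace-Beltrami operator: several heat
kernels exp(-tL) diffuse node features at different scales, a weighting (fixed
here, learned in the paper) combines them, and a node-wise linear map plays
the role of the 1x1 convolution. All names and values are illustrative.

```python
import numpy as np
from scipy.linalg import expm

def multi_kernel_diffusion(adj, feats, times=(0.5, 1.0, 2.0), weights=None):
    """Diffuse node features with several heat kernels exp(-t*L), then mix."""
    deg = adj.sum(axis=1)
    deg_safe = np.where(deg > 0, deg, 1.0)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(deg_safe), 0.0)
    # Symmetric normalized graph Laplacian (stand-in for Laplace-Beltrami).
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    kernels = [expm(-t * lap) for t in times]   # one isotropic kernel per scale
    if weights is None:
        weights = np.full(len(times), 1.0 / len(times))  # learned in practice
    diffused = sum(w * (K @ feats) for w, K in zip(weights, kernels))
    # Node-wise linear map: the graph analogue of a 1x1 convolution.
    rng = np.random.default_rng(0)
    mix = rng.normal(size=(feats.shape[1], feats.shape[1]))
    return diffused @ mix

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # tiny path graph
X = np.eye(3)                                           # one-hot node features
print(multi_kernel_diffusion(A, X).shape)               # (3, 3)
```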
BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation
Recent leading approaches to semantic segmentation rely on deep convolutional
networks trained with human-annotated, pixel-level segmentation masks. Such
pixel-accurate supervision demands expensive labeling effort and limits the
performance of deep networks that usually benefit from more training data. In
this paper, we propose a method that achieves competitive accuracy but only
requires easily obtained bounding box annotations. The basic idea is to iterate
between automatically generating region proposals and training convolutional
networks. These two steps gradually recover segmentation masks for improving
the networks, and vice versa. Our method, called BoxSup, produces competitive
results supervised by boxes only, on par with strong baselines fully supervised
by masks under the same setting. By leveraging a large amount of bounding
boxes, BoxSup further unleashes the power of deep convolutional networks and
yields state-of-the-art results on PASCAL VOC 2012 and PASCAL-CONTEXT.
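The iteration can be sketched as follows: given the current network, each
annotated box is assigned the candidate segment proposal that best balances
overlap with the box and agreement with the network's foreground
probabilities, and that proposal becomes the pseudo mask for the next training
round. The scoring below is our simplification of the paper's criterion;
`alpha` and the function name are illustrative.

```python
import numpy as np

def update_mask_for_box(box_mask, proposals, net_prob, alpha=0.5):
    """Pick the proposal that best balances box overlap and network agreement."""
    def iou(a, b):
        return np.logical_and(a, b).sum() / max(np.logical_or(a, b).sum(), 1)

    def score(p):
        agreement = net_prob[p].mean() if p.any() else 0.0
        return alpha * iou(p, box_mask) + (1 - alpha) * agreement

    return max(proposals, key=score)  # becomes the next round's pseudo mask

# Toy demo: shifted copies of a square compete for one box annotation.
H = W = 8
box = np.zeros((H, W), bool); box[2:6, 2:6] = True
candidates = [np.roll(box, s, axis=1) for s in (0, 1, 3)]
net_prob = box.astype(float)        # stand-in for the current CNN output
best = update_mask_for_box(box, candidates, net_prob)
print(best.sum())  # 16: the unshifted candidate wins
```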
SceneFlowFields++: Multi-frame Matching, Visibility Prediction, and Robust Interpolation for Scene Flow Estimation
State-of-the-art scene flow algorithms pursue the conflicting targets of
accuracy, run time, and robustness. With the successful concept of pixel-wise
matching and sparse-to-dense interpolation, we push the limits of scene flow
estimation. Avoiding strong assumptions on the domain or the problem yields a
more robust algorithm. This algorithm is fast because we avoid explicit
regularization during matching, which allows an efficient computation. Using
image information from multiple time steps and explicit visibility prediction
based on previous results, we achieve competitive performance on different
data sets. Our contributions and results are evaluated in comparative
experiments. Overall, we present an accurate scene flow algorithm that is
faster and more generic than any individual benchmark leader.
Comment: arXiv admin note: text overlap with arXiv:1710.1009
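To illustrate the sparse-to-dense step in isolation: sparse matches are
interpolated into a dense flow field, here with plain inverse-distance
weighting over the k nearest seeds as a stand-in for the paper's edge-aware
robust interpolation. The function name and parameters are ours.

```python
import numpy as np

def sparse_to_dense(shape, seeds, flows, k=4, eps=1e-6):
    """
    Fill a dense flow field from sparse matches by inverse-distance
    weighting of the k nearest seeds.
    seeds: (M, 2) pixel coords; flows: (M, C) flow vectors at those coords.
    """
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)   # (H*W, 2)
    d = np.linalg.norm(pix[:, None, :] - seeds[None, :, :], axis=2)  # (H*W, M)
    nn = np.argsort(d, axis=1)[:, :k]                 # k nearest seed indices
    w = 1.0 / (np.take_along_axis(d, nn, axis=1) + eps)
    w /= w.sum(axis=1, keepdims=True)                 # normalized weights
    dense = (w[..., None] * flows[nn]).sum(axis=1)
    return dense.reshape(H, W, -1)

seeds = np.array([[5.0, 5.0], [20.0, 30.0]])
flows = np.array([[1.0, 0.0], [0.0, 2.0]])
print(sparse_to_dense((32, 40), seeds, flows).shape)  # (32, 40, 2)
```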
How deep learning works --The geometry of deep learning
Why and how deep learning works well on different tasks remains a mystery
from a theoretical perspective. In this paper we draw a geometric picture of
deep learning systems by finding analogies with two existing geometric
structures: the geometry of quantum computation and the geometry of
diffeomorphic template matching. In this framework, we describe the geometric
structures of different deep learning systems, including convolutional neural
networks, residual networks, recursive neural networks, recurrent neural
networks, and the equilibrium propagation framework. We can also analyze the
relationship between the geometric structures of different networks and their
performance at an algorithmic level, so that the geometric framework may
guide the design of the structures and algorithms of deep learning systems.
Comment: 16 pages, 13 figures
Spatial Aggregation of Holistically-Nested Networks for Automated Pancreas Segmentation
Accurate automatic organ segmentation is an important yet challenging problem
for medical image analysis. The pancreas is an abdominal organ with very high
anatomical variability. This inhibits traditional segmentation methods from
achieving high accuracies, especially compared to other organs such as the
liver, heart or kidneys. In this paper, we present a holistic learning approach
that integrates semantic mid-level cues of deeply-learned organ interior and
boundary maps via robust spatial aggregation using random forests. Our method
generates boundary preserving pixel-wise class labels for pancreas
segmentation. Quantitative evaluation is performed on CT scans of 82 patients
in 4-fold cross-validation. We achieve a Dice Similarity Coefficient of
78.01% ± 8.2% (mean ± std. dev.) in testing, which significantly outperforms
the previous state-of-the-art result of 71.8% ± 10.7% under the same
evaluation criterion.
Comment: This article will be presented at MICCAI (Medical Image Computing and Computer-Assisted Interventions), Athens, Greece, 2016
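A minimal sketch of the aggregation step under stated assumptions: per-pixel
features built from the deeply-learned interior and boundary probability maps
(random stand-ins below) are fed to a random forest that produces the final
pancreas label. Feature choices and hyperparameters are illustrative, not the
paper's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Random stand-ins for the deeply-learned probability maps and the image.
H, W = 64, 64
rng = np.random.default_rng(0)
interior = rng.random((H, W))   # HED-style organ-interior map (stand-in)
boundary = rng.random((H, W))   # HED-style organ-boundary map (stand-in)
ct = rng.random((H, W))         # CT intensity (stand-in)

# Per-pixel feature vectors aggregated by a random forest into final labels.
X = np.stack([interior, boundary, ct], axis=-1).reshape(-1, 3)
y = (interior > 0.5).ravel().astype(int)   # stand-in supervision for the demo

rf = RandomForestClassifier(n_estimators=50, max_depth=8, random_state=0).fit(X, y)
pancreas_prob = rf.predict_proba(X)[:, 1].reshape(H, W)
print(pancreas_prob.shape)  # (64, 64)
```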
A Bottom-up Approach for Pancreas Segmentation using Cascaded Superpixels and (Deep) Image Patch Labeling
Robust automated organ segmentation is a prerequisite for computer-aided
diagnosis (CAD), quantitative imaging analysis and surgical assistance. For
high-variability organs such as the pancreas, previous approaches report
undesirably low accuracies. We present a bottom-up approach for pancreas
segmentation in abdominal CT scans that is based on hierarchical information
propagation: classifying image patches at different resolutions and
cascading superpixels. There are four stages: 1) decomposing CT slice images into
a set of disjoint boundary-preserving superpixels; 2) computing pancreas class
probability maps via dense patch labeling; 3) classifying superpixels by
pooling both intensity and probability features to form empirical statistics in
cascaded random forest frameworks; and 4) simple connectivity based
post-processing. The dense image patch labeling is conducted by: an efficient
random forest classifier on image histogram, location and texture features; and
a more expensive (but with better specificity) deep convolutional neural network
classification on larger image windows (with more spatial contexts). Evaluation
of the approach is performed on a database of 80 manually segmented CT volumes
in six-fold cross-validation (CV). Our results are comparable to, or better
than, state-of-the-art methods (evaluated by "leave-one-patient-out"), with
Dice 70.7% and Jaccard 57.9%. Computational efficiency is drastically
improved, to roughly 6-8 minutes per case, compared with ~10 hours for other
methods. Finally, we implement a multi-atlas label
fusion (MALF) approach for pancreas segmentation using the same datasets. Under
six-fold CV, our bottom-up segmentation method significantly outperforms its
MALF counterpart: (70.7 +/- 13.0%) versus (52.5 +/- 20.8%) in Dice. Deep CNN
patch labeling confidences offer more numerical stability, reflected by smaller
standard deviations.
Comment: 14 pages, 14 figures, 2 tables
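Stages 1-3 can be sketched in a few lines: SLIC superpixels decompose the
slice, intensity and patch-probability statistics are pooled per superpixel,
and a random forest classifies each superpixel. The probability map, labels,
and hyperparameters below are stand-ins for the paper's cascaded setup.

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
img = rng.random((128, 128))      # stand-in CT slice
prob = rng.random((128, 128))     # stand-in dense patch-label probability map

# Stage 1: disjoint boundary-preserving superpixels.
sp = slic(img, n_segments=200, channel_axis=None)

# Stage 3: pool intensity and probability statistics per superpixel.
feats, labels = [], []
for s in np.unique(sp):
    m = sp == s
    feats.append([img[m].mean(), img[m].std(),
                  prob[m].mean(), np.percentile(prob[m], 90)])
    labels.append(int(prob[m].mean() > 0.5))   # stand-in supervision

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(feats, labels)
sp_pred = rf.predict(feats)   # per-superpixel pancreas / background decision
print(sp_pred.shape)
```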