Doctor of Philosophy dissertation
Scene labeling is the problem of assigning an object label to each pixel of a given image. It is the primary step towards image understanding and unifies object recognition and image segmentation in a single framework. A perfect scene labeling framework detects and densely labels every region and every object that exists in an image. This task is of substantial importance for a wide range of applications in computer vision. Contextual information plays an important role in scene labeling frameworks: a contextual model exploits the relationships among the objects in a scene to facilitate object detection and image segmentation. Using contextual information effectively is one of the main questions that any scene labeling framework must answer. In this dissertation, we develop two scene labeling frameworks that rely heavily on contextual information to improve performance over state-of-the-art methods. The first, called the multiclass multiscale contextual model (MCMS), uses contextual information from multiple objects and at different scales to learn discriminative models in a supervised setting. The MCMS model incorporates cross-object and inter-object information into one probabilistic framework, and is thus able to capture geometrical relationships and dependencies among multiple objects in addition to local information from each single object present in an image. The second, called the contextual hierarchical model (CHM), learns contextual information in a hierarchy for scene labeling. At each level of the hierarchy, a classifier is trained on downsampled input images and the outputs of previous levels. The CHM then incorporates the resulting multiresolution contextual information into a classifier to segment the input image at the original resolution. This training strategy allows optimization of a joint posterior probability at multiple resolutions through the hierarchy. We demonstrate the performance of the CHM on challenging tasks such as outdoor scene labeling, edge detection in natural images, and membrane detection in electron microscopy images. We also introduce two novel classification methods. WNS-AdaBoost speeds up the training of AdaBoost by providing a compact representation of the training set. The disjunctive normal random forest (DNRF) is an ensemble method that learns complex decision boundaries and achieves low generalization error by optimizing a single objective function for each weak classifier in the ensemble. Finally, we introduce a segmentation framework that exploits both shape information and regional statistics to segment irregularly shaped intracellular structures, such as mitochondria, in electron microscopy images.
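The hierarchical idea behind the CHM lends itself to a compact illustration. The following is a minimal sketch, not the dissertation's implementation: each level trains a classifier on a downsampled image together with the previous level's probability map as context. The raw-intensity feature, the random forest, and the binary labels are illustrative assumptions.

# Hedged sketch of level-wise contextual training; assumes 2-D grayscale
# images and binary {0,1} label maps.
import numpy as np
from skimage.transform import resize
from sklearn.ensemble import RandomForestClassifier

def train_context_hierarchy(image, labels, num_levels=3):
    """Train one classifier per level on (downsampled image, prior context)."""
    classifiers, context = [], None
    for level in range(num_levels):
        s = 2 ** level
        img = resize(image, (image.shape[0] // s, image.shape[1] // s))
        lab = resize(labels, img.shape, order=0, preserve_range=True).astype(int)
        feats = img.reshape(-1, 1)  # toy per-pixel feature: raw intensity
        if context is not None:
            # Resample the previous level's probability map to this resolution.
            feats = np.hstack([feats, resize(context, img.shape).reshape(-1, 1)])
        clf = RandomForestClassifier(n_estimators=20).fit(feats, lab.ravel())
        context = clf.predict_proba(feats)[:, 1].reshape(img.shape)
        classifiers.append(clf)
    return classifiers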
Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers
Scene parsing, or semantic segmentation, consists in labeling each pixel in
an image with the category of the object it belongs to. It is a challenging
task that involves the simultaneous detection, segmentation and recognition of
all the objects in the image.
The scene parsing method proposed here starts by computing a tree of segments
from a graph of pixel dissimilarities. Simultaneously, a set of dense feature
vectors is computed which encodes regions of multiple sizes centered on each
pixel. The feature extractor is a multiscale convolutional network trained from
raw pixels. The feature vectors associated with the segments covered by each
node in the tree are aggregated and fed to a classifier which produces an
estimate of the distribution of object categories contained in the segment. A
subset of tree nodes that cover the image is then selected so as to maximize
the average "purity" of the class distributions, hence maximizing the overall
likelihood that each segment will contain a single object. The convolutional
network feature extractor is trained end-to-end from raw pixels, alleviating
the need for engineered features. After training, the system is parameter free.
The system yields record accuracies on the Stanford Background Dataset (8
classes), the Sift Flow Dataset (33 classes) and the Barcelona Dataset (170
classes) while being an order of magnitude faster than competing approaches,
producing a 320 × 240 image labeling in less than 1 second.
Comment: 9 pages, 4 figures. Published in the 29th International Conference on Machine Learning (ICML 2012), June 2012, Edinburgh, United Kingdom.
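The purity-maximizing cover has a clean recursive structure. The sketch below is a hedged reconstruction rather than the authors' exact procedure: for each tree node, either keep the node or take the best covers of its children, maximizing a size-weighted purity score; taking purity to be the maximum class probability is one plausible choice.

# Hedged sketch of selecting a tree cover by purity; node fields are my own.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SegmentNode:
    size: int                        # pixels covered by this segment
    class_dist: List[float]          # classifier's class distribution
    children: List["SegmentNode"] = field(default_factory=list)

def best_cover(node):
    """Return (score, nodes): the cover of this subtree maximizing purity."""
    keep = (node.size * max(node.class_dist), [node])
    if not node.children:
        return keep
    score, nodes = 0.0, []
    for child in node.children:
        s, ns = best_cover(child)
        score, nodes = score + s, nodes + ns
    # Keep this node, or replace it with the best covers of its children.
    return max(keep, (score, nodes), key=lambda pair: pair[0])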
Probabilistic Modeling of Inter- and Intra-observer Variability in Medical Image Segmentation
Medical image segmentation is a challenging task, particularly due to inter-
and intra-observer variability, even between medical experts. In this paper, we
propose a novel model, called Probabilistic Inter-Observer and iNtra-Observer
variation NetwOrk (Pionono). It captures the labeling behavior of each rater
with a multidimensional probability distribution and integrates this
information with the feature maps of the image to produce probabilistic
segmentation predictions. The model is optimized by variational inference and
can be trained end-to-end. It outperforms state-of-the-art models such as
STAPLE, Probabilistic U-Net, and models based on confusion matrices.
Additionally, Pionono predicts multiple coherent segmentation maps that mimic
the raters' expert opinions, providing additional valuable information for
the diagnostic process. Experiments on real-world cancer segmentation datasets
demonstrate the high accuracy and efficiency of Pionono, making it a powerful
tool for medical image analysis.
Comment: 13 pages, 5 figures.
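One way to picture the per-rater modeling is as a latent vector per rater, drawn from a learned Gaussian and fused with the feature maps before prediction. The PyTorch module below is a minimal sketch of that idea, my own reconstruction rather than the released Pionono code; all names and dimensions are illustrative.

# Hedged sketch: a rater-conditioned segmentation head.
import torch
import torch.nn as nn

class RaterConditionedHead(nn.Module):
    def __init__(self, num_raters, latent_dim, feat_ch, num_classes):
        super().__init__()
        # One learned Gaussian (mean, log-variance) per rater.
        self.mu = nn.Parameter(torch.zeros(num_raters, latent_dim))
        self.logvar = nn.Parameter(torch.zeros(num_raters, latent_dim))
        self.head = nn.Conv2d(feat_ch + latent_dim, num_classes, kernel_size=1)

    def forward(self, feats, rater_id):
        # Reparameterized sample from the rater's latent distribution.
        std = torch.exp(0.5 * self.logvar[rater_id])
        z = self.mu[rater_id] + std * torch.randn_like(std)
        b, _, h, w = feats.shape
        # Broadcast the sample spatially and fuse it with the feature map.
        z_map = z.view(1, -1, 1, 1).expand(b, -1, h, w)
        return self.head(torch.cat([feats, z_map], dim=1))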
Active Learning for Semantic Segmentation with Multi-class Label Query
This paper proposes a new active learning method for semantic segmentation.
The core of our method lies in a new annotation query design. It samples
informative local image regions (e.g., superpixels) and, for each such
region, asks an oracle for a multi-hot vector indicating all classes existing
in the region. This multi-class labeling strategy is substantially more
efficient than existing ones like segmentation, polygon, and even dominant
class labeling in terms of annotation time per click. However, it introduces
the class ambiguity issue in training since it assigns partial labels (i.e., a
set of candidate classes) to individual pixels. We thus propose a new algorithm
for learning semantic segmentation while disambiguating the partial labels in
two stages. In the first stage, it trains a segmentation model directly with
the partial labels through two new loss functions motivated by partial label
learning and multiple instance learning. In the second stage, it disambiguates
the partial labels by generating pixel-wise pseudo labels, which are used for
supervised learning of the model. Equipped with a new acquisition function
dedicated to multi-class labeling, our method outperformed previous work on
Cityscapes and PASCAL VOC 2012 while incurring a lower annotation cost.
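The first-stage objective can be illustrated with a standard partial-label loss that maximizes the probability mass a pixel assigns to its candidate class set. The snippet below is one natural reading of that idea, not the paper's exact loss functions.

# Hedged sketch of a partial-label training objective.
import torch
import torch.nn.functional as F

def partial_label_loss(logits, candidate_mask):
    """logits: (N, C) per-pixel scores; candidate_mask: (N, C) multi-hot floats."""
    probs = F.softmax(logits, dim=1)
    # Total probability assigned to the candidate classes of each pixel.
    mass = (probs * candidate_mask).sum(dim=1).clamp_min(1e-8)
    return -torch.log(mass).mean()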
Deep GrabCut for Object Selection
Most previous bounding-box-based segmentation methods assume the bounding box
tightly covers the object of interest. In practice, however, a rectangle input
is often too large or too small. In this paper, we propose a novel
segmentation approach that uses a rectangle as a soft constraint by
transforming it into a Euclidean distance map. A convolutional encoder-decoder
network is trained end-to-end by concatenating images with these distance maps
as inputs and predicting the object masks as outputs. Our approach produces
accurate segmentation results given sloppy rectangles and generalizes to
both interactive segmentation and instance segmentation. We show that our network
extends to curve-based input without retraining. We further apply our network
to instance-level semantic segmentation and resolve any overlap using a
conditional random field. Experiments on benchmark datasets demonstrate the
effectiveness of the proposed approaches.
Comment: BMVC 2017.
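The rectangle-to-distance-map encoding is straightforward to sketch with SciPy's Euclidean distance transform. The snippet below is my reading of that encoding; the paper's exact transform and normalization may differ.

# Hedged sketch: encode a rectangle as a normalized distance map.
import numpy as np
from scipy.ndimage import distance_transform_edt

def rect_to_distance_map(h, w, box):
    """box = (y0, x0, y1, x1) in pixel coordinates."""
    outside = np.ones((h, w), dtype=bool)
    y0, x0, y1, x1 = box
    outside[y0:y1, x0:x1] = False             # zeros inside the rectangle
    dist = distance_transform_edt(outside)    # distance to the nearest zero
    return dist / max(dist.max(), 1e-8)       # normalize for the network

# Usage: stack as a fourth channel of an RGB image of shape (h, w, 3).
# net_input = np.dstack([image, rect_to_distance_map(h, w, box)])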
Visual Chunking: A List Prediction Framework for Region-Based Object Detection
We consider detecting objects in an image by iteratively selecting from a set
of arbitrarily shaped candidate regions. Our generic approach, which we term
visual chunking, reasons about the locations of multiple object instances in an
image while expressively describing object boundaries. We design an
optimization criterion for measuring the performance of a list of such
detections as a natural extension to a common per-instance metric. We present
an efficient algorithm with provable performance for building a high-quality
list of detections from any candidate set of region-based proposals. We also
develop a simple class-specific algorithm that generates candidate regions
in near-linear time in the number of low-level superpixels and
outperforms other region-generating methods. In order to make predictions on
novel images at testing time without access to ground truth, we develop
learning approaches to emulate these algorithms' behaviors. We demonstrate that
our new approach outperforms sophisticated baselines on benchmark datasets.
Comment: to appear at ICRA 2015.
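The list-building step has the flavor of greedy maximization of a coverage-style objective, the standard recipe with provable guarantees when the gain is submodular. The sketch below is my own simplification, not the paper's algorithm: repeatedly add the candidate region with the largest marginal gain.

# Hedged sketch of greedy detection-list construction.
def build_detection_list(candidates, marginal_gain, list_size):
    """candidates: region proposals; marginal_gain(chosen, c) -> float."""
    chosen, remaining = [], list(candidates)
    while remaining and len(chosen) < list_size:
        best = max(remaining, key=lambda c: marginal_gain(chosen, c))
        chosen.append(best)
        remaining.remove(best)
    return chosen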