9,095 research outputs found
DeepCut: Object Segmentation from Bounding Box Annotations using Convolutional Neural Networks
In this paper, we propose DeepCut, a method to obtain pixelwise object
segmentations given an image dataset labelled with bounding box annotations. It
extends the approach of the well-known GrabCut method to include machine
learning by training a neural network classifier from bounding box annotations.
We formulate the problem as an energy minimisation problem over a
densely-connected conditional random field and iteratively update the training
targets to obtain pixelwise object segmentations. Additionally, we propose
variants of the DeepCut method and compare those to a naive approach to CNN
training under weak supervision. We test its applicability to solve brain and
lung segmentation problems on a challenging fetal magnetic resonance dataset
and obtain encouraging results in terms of accuracy
Deep GrabCut for Object Selection
Most previous bounding-box-based segmentation methods assume the bounding box
tightly covers the object of interest. However it is common that a rectangle
input could be too large or too small. In this paper, we propose a novel
segmentation approach that uses a rectangle as a soft constraint by
transforming it into an Euclidean distance map. A convolutional encoder-decoder
network is trained end-to-end by concatenating images with these distance maps
as inputs and predicting the object masks as outputs. Our approach gets
accurate segmentation results given sloppy rectangles while being general for
both interactive segmentation and instance segmentation. We show our network
extends to curve-based input without retraining. We further apply our network
to instance-level semantic segmentation and resolve any overlap using a
conditional random field. Experiments on benchmark datasets demonstrate the
effectiveness of the proposed approaches.Comment: BMVC 201
ImageSpirit: Verbal Guided Image Parsing
Humans describe images in terms of nouns and adjectives while algorithms
operate on images represented as sets of pixels. Bridging this gap between how
humans would like to access images versus their typical representation is the
goal of image parsing, which involves assigning object and attribute labels to
pixel. In this paper we propose treating nouns as object labels and adjectives
as visual attribute labels. This allows us to formulate the image parsing
problem as one of jointly estimating per-pixel object and attribute labels from
a set of training images. We propose an efficient (interactive time) solution.
Using the extracted labels as handles, our system empowers a user to verbally
refine the results. This enables hands-free parsing of an image into pixel-wise
object/attribute labels that correspond to human semantics. Verbally selecting
objects of interests enables a novel and natural interaction modality that can
possibly be used to interact with new generation devices (e.g. smart phones,
Google Glass, living room devices). We demonstrate our system on a large number
of real-world images with varying complexity. To help understand the tradeoffs
compared to traditional mouse based interactions, results are reported for both
a large scale quantitative evaluation and a user study.Comment: http://mmcheng.net/imagespirit
A Projected Gradient Descent Method for CRF Inference allowing End-To-End Training of Arbitrary Pairwise Potentials
Are we using the right potential functions in the Conditional Random Field
models that are popular in the Vision community? Semantic segmentation and
other pixel-level labelling tasks have made significant progress recently due
to the deep learning paradigm. However, most state-of-the-art structured
prediction methods also include a random field model with a hand-crafted
Gaussian potential to model spatial priors, label consistencies and
feature-based image conditioning.
In this paper, we challenge this view by developing a new inference and
learning framework which can learn pairwise CRF potentials restricted only by
their dependence on the image pixel values and the size of the support. Both
standard spatial and high-dimensional bilateral kernels are considered. Our
framework is based on the observation that CRF inference can be achieved via
projected gradient descent and consequently, can easily be integrated in deep
neural networks to allow for end-to-end training. It is empirically
demonstrated that such learned potentials can improve segmentation accuracy and
that certain label class interactions are indeed better modelled by a
non-Gaussian potential. In addition, we compare our inference method to the
commonly used mean-field algorithm. Our framework is evaluated on several
public benchmarks for semantic segmentation with improved performance compared
to previous state-of-the-art CNN+CRF models.Comment: Presented at EMMCVPR 2017 conferenc
- …