Interactive Binary Image Segmentation with Edge Preservation
Binary image segmentation plays an important role in computer vision and has
been widely used in many applications such as image and video editing, object
extraction, and photo composition. In this paper, we propose a novel
interactive binary image segmentation method based on the Markov Random Field
(MRF) framework and the fast bilateral solver (FBS) technique. Specifically, we
employ the geodesic distance component to build the unary term. To ensure both
computation efficiency and effective responsiveness for interactive
segmentation, superpixels are used in computing geodesic distances instead of
pixels. Furthermore, we take a bilateral affinity approach for the pairwise
term in order to preserve edge information and denoise. Through the alternating
direction strategy, the MRF energy minimization problem is divided into two
subproblems, which then can be easily solved by steepest gradient descent (SGD)
and FBS respectively. Experimental results on the VGG interactive image
segmentation dataset show that the proposed algorithm outperforms several
state-of-the-art ones, and in particular, it can achieve satisfactory
edge-smooth segmentation results even when the foreground and background color
appearances are quite indistinctive.
Deep Interactive Object Selection
Interactive object selection is a very important research problem and has
many applications. Previous algorithms require substantial user interactions to
estimate the foreground and background distributions. In this paper, we present
a novel deep learning based algorithm which has a much better understanding of
objectness and thus can reduce user interactions to just a few clicks. Our
algorithm transforms user provided positive and negative clicks into two
Euclidean distance maps which are then concatenated with the RGB channels of
images to compose (image, user interactions) pairs. We generate many such
pairs by combining several random sampling strategies to model user click
patterns and use them to fine tune deep Fully Convolutional Networks (FCNs).
Finally, the output probability maps of our FCN-8s model are integrated with
graph cut optimization to refine the boundary segments. Our model is trained on
the PASCAL segmentation dataset and evaluated on other datasets with different
object classes. Experimental results on both seen and unseen objects clearly
demonstrate that our algorithm has a good generalization ability and is
superior to all existing interactive object selection approaches.
Comment: Computer Vision and Pattern Recognitio
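The click encoding described in this abstract can be illustrated with a minimal sketch. It assumes clicks are given as (row, col) pairs and that distances are truncated at a cap of 255; the function names, the cap, and the brute-force distance computation are illustrative choices, not details taken from the paper.

```python
import numpy as np


def click_distance_map(shape, clicks, cap=255.0):
    """Euclidean distance from each pixel to its nearest click, truncated at `cap`.

    `cap` is an assumed truncation value; with no clicks the map is filled with `cap`.
    """
    h, w = shape
    if len(clicks) == 0:
        return np.full((h, w), cap, dtype=np.float32)
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.asarray(clicks, dtype=np.float32)  # shape (n, 2), rows are (row, col)
    # Broadcast to (h, w, n): distance from every pixel to every click.
    d = np.sqrt((ys[..., None] - pts[:, 0]) ** 2 + (xs[..., None] - pts[:, 1]) ** 2)
    return np.minimum(d.min(axis=-1), cap).astype(np.float32)


def build_network_input(image, pos_clicks, neg_clicks):
    """Stack RGB with the positive and negative click maps into an (H, W, 5) array."""
    pos = click_distance_map(image.shape[:2], pos_clicks)
    neg = click_distance_map(image.shape[:2], neg_clicks)
    return np.dstack([image.astype(np.float32), pos, neg])
```

For an H x W RGB image, `build_network_input(image, [(10, 12)], [(40, 3)])` yields a five-channel array that a segmentation network could take as input.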
Neutro-Connectedness Cut
Interactive image segmentation is a challenging task that has received increasing
attention recently; however, two major drawbacks exist in interactive
segmentation approaches. First, the segmentation performance of ROI-based
methods is sensitive to the initial ROI: different ROIs may produce very
different results. Second, most seed-based methods need intense
interactions, and are not applicable in many cases. In this work, we generalize
the Neutro-Connectedness (NC) to be independent of top-down priors of objects
and to model image topology with indeterminacy measurement on image regions,
propose a novel method for determining object and background regions, which is
applied to exclude isolated background regions and enforce label consistency,
and put forward a hybrid interactive segmentation method, Neutro-Connectedness
Cut (NC-Cut), which can overcome the above two problems by utilizing both
pixel-wise appearance information and region-based NC properties. We evaluate
the proposed NC-Cut by employing two image datasets (265 images), and
demonstrate that the proposed approach outperforms state-of-the-art interactive
image segmentation methods (Grabcut, MILCut, One-Cut, MGC_max^sum and pPBC).
Comment: 15 pages, 14 figures, 4 tables, journa
Dominant Sets for "Constrained" Image Segmentation
Image segmentation has come a long way since the early days of computer
vision, and still remains a challenging task. Modern variations of the
classical (purely bottom-up) approach involve, e.g., some form of user
assistance (interactive segmentation) or ask for the simultaneous segmentation
of two or more images (co-segmentation). At an abstract level, all these
variants can be thought of as "constrained" versions of the original
formulation, whereby the segmentation process is guided by some external source
of information. In this paper, we propose a new approach to tackle this kind of
problem in a unified way. Our work is based on some properties of a family of
quadratic optimization problems related to dominant sets, a well-known
graph-theoretic notion of a cluster which generalizes the concept of a maximal
clique to edge-weighted graphs. In particular, we show that by properly
controlling a regularization parameter which determines the structure and the
scale of the underlying problem, we are in a position to extract groups of
dominant-set clusters that are constrained to contain predefined elements. We
focus on interactive segmentation and co-segmentation (in
both the unsupervised and the interactive versions). The proposed algorithm can
deal naturally with several types of constraints and input modalities, including
scribbles, sloppy contours, and bounding boxes, and is able to robustly handle
noisy annotations on the part of the user. Experiments on standard benchmark
datasets show the effectiveness of our approach as compared to state-of-the-art
algorithms on a variety of natural images under several input conditions and
constraints.
Comment: arXiv admin note: text overlap with arXiv:1608.0064
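As background for the dominant-set notion mentioned in this abstract, here is a minimal sketch of the standard, unconstrained replicator-dynamics iteration commonly used to extract a dominant set from a symmetric, non-negative affinity matrix with zero diagonal. It does not implement the constrained formulation or the regularization parameter the abstract describes; the function name and stopping criteria are illustrative assumptions.

```python
import numpy as np


def dominant_set(A, iters=1000, tol=1e-8):
    """Run discrete replicator dynamics x_i <- x_i * (A x)_i / (x^T A x).

    A is assumed symmetric, non-negative, with zero diagonal. The returned
    vector lies on the simplex; its support (entries clearly above zero)
    indicates the extracted cluster.
    """
    n = A.shape[0]
    x = np.full(n, 1.0 / n)  # start from the barycenter of the simplex
    for _ in range(iters):
        Ax = A @ x
        denom = x @ Ax
        if denom <= 0:  # degenerate case: no affinity mass left
            break
        x_new = x * Ax / denom
        converged = np.abs(x_new - x).sum() < tol
        x = x_new
        if converged:
            break
    return x
```

On a small graph in which three nodes are tightly linked and a fourth is only weakly attached, the weakly attached node's weight decays toward zero and the remaining weight is shared by the tight group.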
DeepIGeoS: A Deep Interactive Geodesic Framework for Medical Image Segmentation
Accurate medical image segmentation is essential for diagnosis, surgical
planning and many other applications. Convolutional Neural Networks (CNNs) have
become the state-of-the-art automatic segmentation methods. However, fully
automatic results may still need to be refined to become accurate and robust
enough for clinical use. We propose a deep learning-based interactive
segmentation method to improve the results obtained by an automatic CNN and to
reduce user interactions during refinement for higher accuracy. We use one CNN
to obtain an initial automatic segmentation, on which user interactions are
added to indicate mis-segmentations. Another CNN takes as input the user
interactions with the initial segmentation and gives a refined result. We
propose to combine user interactions with CNNs through geodesic distance
transforms, and propose a resolution-preserving network that gives a better
dense prediction. In addition, we integrate user interactions as hard
constraints into a back-propagatable Conditional Random Field. We validated the
proposed framework in the context of 2D placenta segmentation from fetal MRI
and 3D brain tumor segmentation from FLAIR images. Experimental results show
our method achieves a large improvement from automatic CNNs, and obtains
comparable and even higher accuracy with fewer user interventions and less time
compared with traditional interactive methods.
Comment: 14 pages, 15 figure
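To make the idea of a geodesic distance transform concrete, the following sketch computes one on a 2D grayscale image with Dijkstra's algorithm over 4-connected pixels. The step cost (an optional spatial term plus the absolute intensity difference), the `spatial_weight` parameter, and the function name are assumptions for illustration; the paper's own transform and its 3D handling may differ.

```python
import heapq

import numpy as np


def geodesic_distance(image, seeds, spatial_weight=0.0):
    """Geodesic distance from every pixel to the nearest seed on a 2D image.

    Moving between 4-connected neighbours costs
    spatial_weight + |I(p) - I(q)| (an assumed, simple cost).
    """
    h, w = image.shape
    dist = np.full((h, w), np.inf)
    heap = []
    for r, c in seeds:
        dist[r, c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                # Cast to float so unsigned integer images do not wrap around.
                step = spatial_weight + abs(float(image[nr, nc]) - float(image[r, c]))
                nd = d + step
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist
```

With `spatial_weight=0.0`, pixels in the same flat region as a seed stay at distance zero, while crossing an intensity edge adds the size of the jump. This is why such maps respond to image content rather than to pixel offsets alone.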
Selective Video Object Cutout
Conventional video segmentation approaches rely heavily on appearance models.
Such methods often use appearance descriptors that have limited discriminative
power under complex scenarios. To improve the segmentation performance, this
paper presents a pyramid histogram based confidence map that incorporates
structure information into appearance statistics. It also combines geodesic
distance based dynamic models. Then, it employs an efficient measure of
uncertainty propagation using local classifiers to determine the image regions
where the object labels might be ambiguous. The final foreground cutout is
obtained by refining on the uncertain regions. Additionally, to reduce manual
labeling, our method determines the frames to be labeled by the human operator
in a principled manner, which further boosts the segmentation performance and
minimizes the labeling effort. Our extensive experimental analyses on two big
benchmarks demonstrate that our solution achieves superior performance,
favorable computational efficiency, and reduced manual labeling in comparison
to the state-of-the-art.
Comment: W. Wang, J. Shen, and F. Porikli. "Selective video object cutout."
IEEE Transactions on Image Processing 26.12 (2017): 5645-565
An interactive image segmentation method in hand gesture recognition
To improve the recognition rate of hand gestures, a new interactive image segmentation method for hand gesture recognition is presented, and popular methods, e.g., graph cut, random walker, and interactive image segmentation using geodesic star convexity, are studied in this article. A Gaussian Mixture Model is employed for image modelling, and its parameters are learned by iterating the Expectation-Maximization algorithm. We apply a Gibbs random field to the image segmentation and minimize the Gibbs energy using the min-cut theorem to find the optimal segmentation. The segmentation result of our method is tested on an image dataset and compared with other methods by estimating the region accuracy and boundary accuracy. Finally, five kinds of hand gestures in different backgrounds are tested on our experimental platform using the sparse representation algorithm, showing that the segmentation of hand gesture images helps to improve the recognition accuracy.
Habitat: A Platform for Embodied AI Research
We present Habitat, a platform for research in embodied artificial
intelligence (AI). Habitat enables training embodied agents (virtual robots) in
highly efficient photorealistic 3D simulation. Specifically, Habitat consists
of: (i) Habitat-Sim: a flexible, high-performance 3D simulator with
configurable agents, sensors, and generic 3D dataset handling. Habitat-Sim is
fast -- when rendering a scene from Matterport3D, it achieves several thousand
frames per second (fps) running single-threaded, and can reach over 10,000 fps
multi-process on a single GPU. (ii) Habitat-API: a modular high-level library
for end-to-end development of embodied AI algorithms -- defining tasks (e.g.,
navigation, instruction following, question answering), configuring, training,
and benchmarking embodied agents.
These large-scale engineering contributions enable us to answer scientific
questions requiring experiments that were till now impracticable or 'merely'
impractical. Specifically, in the context of point-goal navigation: (1) we
revisit the comparison between learning and SLAM approaches from two recent
works and find evidence for the opposite conclusion -- that learning
outperforms SLAM if scaled to an order of magnitude more experience than
previous investigations, and (2) we conduct the first cross-dataset
generalization experiments {train, test} x {Matterport3D, Gibson} for multiple
sensors {blind, RGB, RGBD, D} and find that only agents with depth (D) sensors
generalize across datasets. We hope that our open-source platform and these
findings will advance research in embodied AI.
Comment: ICCV 201
f-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation
Deep neural networks have become a mainstream approach to interactive
segmentation. As we show in our experiments, while for some images a trained
network provides an accurate segmentation result with just a few clicks, for some
unknown objects it cannot achieve a satisfactory result even with a large amount
of user input. The recently proposed backpropagating refinement scheme (BRS)
introduces an optimization problem for interactive segmentation that results in
significantly better performance for the hard cases. At the same time, BRS
requires running forward and backward passes through a deep network several
times, which leads to a significantly increased computational budget per click compared
to other methods. We propose f-BRS (feature backpropagating refinement scheme)
that solves an optimization problem with respect to auxiliary variables instead
of the network inputs, and requires running forward and backward passes just for
a small part of a network. Experiments on the GrabCut, Berkeley, DAVIS and SBD
datasets set a new state of the art at an order of magnitude lower time per click
compared to the original BRS. The code and trained models are available at
https://github.com/saic-vul/fbrs_interactive_segmentation
AMAT: Medial Axis Transform for Natural Images
We introduce Appearance-MAT (AMAT), a generalization of the medial axis
transform for natural images, that is framed as a weighted geometric set cover
problem. We make the following contributions: i) we extend previous medial
point detection methods for color images, by associating each medial point with
a local scale; ii) inspired by the invertibility property of the binary MAT, we
also associate each medial point with a local encoding that allows us to invert
the AMAT, reconstructing the input image; iii) we describe a clustering scheme
that takes advantage of the additional scale and appearance information to
group individual points into medial branches, providing a shape decomposition
of the underlying image regions. In our experiments, we show state-of-the-art
performance in medial point detection on Berkeley Medial AXes (BMAX500), a new
dataset of medial axes based on the BSDS500 database, and good generalization
on the SK506 and WH-SYMMAX datasets. We also measure the quality of
reconstructed images from BMAX500, obtained by inverting their computed AMAT.
Our approach delivers significantly better reconstruction quality with respect
to three baselines, using just 10% of the image pixels. Our code and
annotations are available at https://github.com/tsogkas/amat
Comment: 10 pages (including references), 5 figures, accepted at ICCV 201