Interactive Full Image Segmentation by Considering All Regions Jointly
We address interactive full image annotation, where the goal is to accurately
segment all object and stuff regions in an image. We propose an interactive,
scribble-based annotation framework which operates on the whole image to
produce segmentations for all regions. This enables sharing scribble
corrections across regions, and allows the annotator to focus on the largest
errors made by the machine across the whole image. To realize this, we adapt
Mask-RCNN into a fast interactive segmentation framework and introduce an
instance-aware loss measured at the pixel-level in the full image canvas, which
lets predictions for nearby regions properly compete for space. Finally, we
compare to interactive single object segmentation on the COCO panoptic dataset.
We demonstrate that our interactive full image segmentation approach leads to a
5% IoU gain, reaching 90% IoU at a budget of four extreme clicks and four
corrective scribbles per region. Comment: Accepted to CVPR 2019.
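To make the "compete for space" idea concrete, here is a minimal PyTorch sketch (our illustration, not the authors' code) of an instance-aware, pixel-level loss on the full image canvas: per-region logit maps are stacked, and a softmax across regions forces predictions for nearby regions to compete for each pixel. All names and shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def full_canvas_instance_loss(region_logits, instance_ids):
    """region_logits: (R, H, W), one logit map per region pasted onto the
    full canvas; instance_ids: (H, W) int64 ground truth assigning every
    pixel to one of the R regions."""
    # cross_entropy applies a softmax over the region axis: each pixel
    # distributes probability mass across all regions, so predictions for
    # nearby regions compete for space.
    return F.cross_entropy(region_logits.unsqueeze(0), instance_ids.unsqueeze(0))

# Toy usage: 3 regions on a 4x4 canvas.
logits = torch.randn(3, 4, 4, requires_grad=True)
target = torch.randint(0, 3, (4, 4))
full_canvas_instance_loss(logits, target).backward()
```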
Interactive segmentation of medical images through fully convolutional neural networks
Image segmentation plays an essential role in medicine for both diagnostic
and interventional tasks. Segmentation approaches are either manual,
semi-automated or fully-automated. Manual segmentation offers full control over
the quality of the results, but is tedious, time-consuming and prone to
operator bias. Fully automated methods require no human effort, but often
deliver sub-optimal results without providing users with the means to make
corrections. Semi-automated approaches keep users in control of the results by
providing means for interaction, but the main challenge is to offer a good
trade-off between precision and required interaction. In this paper we present
a deep learning (DL) based semi-automated segmentation approach that aims to be
a "smart" interactive tool for region of interest delineation in medical
images. We demonstrate its use for segmenting multiple organs on computed
tomography (CT) of the abdomen. Our approach solves some of the most pressing
clinical challenges: (i) it requires only one to a few user clicks to deliver
excellent 2D segmentations in a fast and reliable fashion; (ii) it can
generalize to previously unseen structures and "corner cases"; (iii) it
delivers results that can be corrected quickly in a smart and intuitive way, up
to an arbitrary degree of precision chosen by the user; and (iv) it ensures high
accuracy. We present our approach and compare it to other techniques and
previous work to show the advantages brought by our method.
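The abstract does not spell out how clicks enter the network; a common design, sketched below in Python as an assumption rather than the paper's exact method, encodes each click as a Gaussian guidance map and stacks it with the image as extra input channels.

```python
import numpy as np

def click_guidance_map(shape, clicks, sigma=10.0):
    """shape: (H, W); clicks: list of (row, col) user clicks. Returns a map
    peaking at 1.0 on each click and decaying with a Gaussian profile."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    g = np.zeros(shape, dtype=np.float32)
    for r, c in clicks:
        g = np.maximum(g, np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2)))
    return g

ct_slice = np.random.rand(512, 512).astype(np.float32)   # stand-in CT slice
fg = click_guidance_map(ct_slice.shape, [(200, 240)])    # region-of-interest click
bg = click_guidance_map(ct_slice.shape, [(50, 60)])      # corrective click
net_input = np.stack([ct_slice, fg, bg])                 # (3, H, W) network input
```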
NuClick: A Deep Learning Framework for Interactive Segmentation of Microscopy Images
Object segmentation is an important step in the workflow of computational
pathology. Deep learning based models generally require large amounts of labeled
data for precise and reliable prediction. However, collecting labeled data is
expensive because it often requires expert knowledge, particularly in the
medical imaging domain, where labels are the result of a time-consuming analysis made by
one or more human experts. As nuclei, cells and glands are fundamental objects
for downstream analysis in computational pathology/cytology, in this paper we
propose a simple CNN-based approach to speed up collecting annotations for
these objects that requires minimal interaction from the annotator. We show
that for nuclei and cells in histology and cytology images, one click inside
each object is enough for NuClick to yield a precise annotation. For
multicellular structures such as glands, we propose a novel approach that provides
NuClick with a squiggle as a guiding signal, enabling it to segment the
glandular boundaries. These supervisory signals are fed to the network as
auxiliary inputs along with RGB channels. With detailed experiments, we show
that NuClick is adaptable to the object scale, robust against variations in the
user input, adaptable to new domains, and delivers reliable annotations. An
instance segmentation model trained on masks generated by NuClick achieved the
first rank in the LYON19 challenge. As exemplar outputs of our framework, we are
releasing two datasets: 1) a dataset of lymphocyte annotations within IHC
images, and 2) a dataset of segmented white blood cells (WBCs) in blood smear images.
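As a rough illustration of how such supervisory signals can be fed to the network as auxiliary inputs alongside the RGB channels, the following sketch rasterizes a squiggle into a binary guidance map; the helper and its parameters are our simplification, not the NuClick implementation.

```python
import numpy as np

def rasterize_squiggle(shape, points, samples=200):
    """Draw a polyline through user-supplied (row, col) points."""
    canvas = np.zeros(shape, dtype=np.float32)
    pts = np.asarray(points, dtype=np.float32)
    for a, b in zip(pts[:-1], pts[1:]):
        # Densely sample each segment and mark the pixels it crosses.
        for t in np.linspace(0.0, 1.0, samples):
            r, c = (1 - t) * a + t * b
            canvas[int(round(r)), int(round(c))] = 1.0
    return canvas

rgb = np.random.rand(3, 256, 256).astype(np.float32)          # stand-in patch
squiggle = rasterize_squiggle((256, 256), [(40, 30), (90, 120), (140, 180)])
net_input = np.concatenate([rgb, squiggle[None]], axis=0)      # (4, H, W)
```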
Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
We present the Modular interactive VOS (MiVOS) framework, which decouples
interaction-to-mask and mask propagation, allowing for higher generalizability
and better performance. Trained separately, the interaction module converts
user interactions to an object mask, which is then temporally propagated by our
propagation module using a novel top-k filtering strategy in reading the
space-time memory. To effectively take the user's intent into account, a novel
difference-aware module is proposed to learn how to properly fuse the masks
before and after each interaction, which are aligned with the target frames by
employing the space-time memory. We evaluate our method both qualitatively and
quantitatively with different forms of user interactions (e.g., scribbles,
clicks) on DAVIS to show that our method outperforms current state-of-the-art
algorithms while requiring fewer frame interactions, with the additional
advantage in generalizing to different types of user interactions. We
contribute a large-scale synthetic VOS dataset with pixel-accurate segmentation
of 4.8M frames to accompany our source code to facilitate future research. Comment: Accepted to CVPR 2021. Project page:
https://hkchengrex.github.io/MiVOS
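A minimal sketch of top-k filtered space-time memory reading, in the spirit of the propagation module described above (the shapes and the plain dot-product affinity are our assumptions, not the released code): each query location attends only to its k most similar memory slots, and the softmax is taken over those survivors alone.

```python
import torch

def topk_memory_readout(query_keys, memory_keys, memory_values, k=20):
    """query_keys: (Cq, N); memory_keys: (Cq, M); memory_values: (Cv, M)."""
    affinity = memory_keys.t() @ query_keys            # (M, N) dot-product similarity
    topk_vals, topk_idx = affinity.topk(k, dim=0)      # keep k best memory slots per query
    weights = torch.softmax(topk_vals, dim=0)          # softmax over the survivors only
    gathered = memory_values[:, topk_idx.reshape(-1)]  # (Cv, k*N)
    gathered = gathered.reshape(memory_values.shape[0], k, -1)
    return (gathered * weights.unsqueeze(0)).sum(dim=1)  # (Cv, N) read-out features

# Toy shapes: 64-dim keys, 512 memory slots, 1024 query pixels.
out = topk_memory_readout(torch.randn(64, 1024), torch.randn(64, 512),
                          torch.randn(128, 512))
```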
Efficient Full Image Interactive Segmentation by Leveraging Within-image Appearance Similarity
We propose a new approach to interactive full-image semantic segmentation
which enables quickly collecting training data for new datasets with previously
unseen semantic classes (A demo is available at https://youtu.be/yUk8D5gEX-o).
We leverage a key observation: propagation from labeled to unlabeled pixels
does not necessarily require class-specific knowledge, but can be done purely
based on appearance similarity within an image. We build on this observation
and propose an approach capable of jointly propagating pixel labels from
multiple classes without having explicit class-specific appearance models. To
enable long-range propagation, our approach first globally measures appearance
similarity between labeled and unlabeled pixels across the entire image. Then
it locally integrates the per-pixel measurements, which improves accuracy at
boundaries and removes noisy label switches in homogeneous regions. We also
design an efficient manual annotation interface that extends the traditional
polygon drawing tools with a suite of additional convenient features (and add
automatic propagation to it). Experiments with human annotators on the COCO
Panoptic Challenge dataset show that the combination of our better manual
interface and our novel automatic propagation mechanism leads to reducing
annotation time by more than a factor of 2 compared to polygon drawing. We also
test our method on the ADE-20k and Fashionista datasets without making any
dataset-specific adaptation or retraining our model, demonstrating that it can
generalize to new datasets and visual classes.
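To illustrate the core observation, here is a deliberately simple, class-agnostic propagation sketch: every unlabeled pixel takes the label of its most similar labeled pixel in a joint color+position feature space. This toy version has only the global step and none of the paper's local integration; all weights and names are our assumptions.

```python
import numpy as np

def propagate_labels(image, labeled_rc, labels, color_weight=1.0, pos_weight=0.01):
    """image: (H, W, 3) float RGB; labeled_rc: (L, 2) pixel coords with known
    labels; labels: (L,) class ids. Returns an (H, W) label map."""
    h, w, _ = image.shape
    rows, cols = np.mgrid[0:h, 0:w]
    # Per-pixel features: weighted color plus weighted position.
    feats = np.concatenate([color_weight * image.reshape(-1, 3),
                            pos_weight * np.stack([rows, cols], -1).reshape(-1, 2)], axis=1)
    seeds = feats[labeled_rc[:, 0] * w + labeled_rc[:, 1]]        # (L, 5) seed features
    d2 = ((feats[:, None, :] - seeds[None, :, :]) ** 2).sum(-1)   # (H*W, L) distances
    return labels[d2.argmin(axis=1)].reshape(h, w)

img = np.random.rand(64, 64, 3)
out = propagate_labels(img, np.array([[10, 10], [50, 50]]), np.array([0, 1]))
```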
f-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation
Deep neural networks have become a mainstream approach to interactive
segmentation. As we show in our experiments, while for some images a trained
network provides an accurate segmentation result with just a few clicks, for some
unknown objects it cannot achieve a satisfactory result even with a large amount
of user input. The recently proposed backpropagating refinement (BRS) scheme
introduces an optimization problem for interactive segmentation that results in
significantly better performance for the hard cases. At the same time, BRS
requires running forward and backward passes through a deep network several
times, which leads to a significantly increased computational budget per click compared
to other methods. We propose f-BRS (feature backpropagating refinement scheme)
that solves an optimization problem with respect to auxiliary variables instead
of the network inputs, and requires running the forward and backward passes for
only a small part of the network. Experiments on the GrabCut, Berkeley, DAVIS and SBD
datasets set a new state of the art with an order of magnitude lower time per click
compared to the original BRS. The code and trained models are available at
https://github.com/saic-vul/fbrs_interactive_segmentation
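The idea of optimizing auxiliary variables instead of network inputs can be sketched as follows: a channel-wise scale and bias are inserted at an intermediate feature map and optimized with L-BFGS so the logits agree with the clicks, meaning backpropagation runs only through the small head above the insertion point. The stand-in head, the click coordinates, and the hinge-style loss below are illustrative assumptions, not the paper's exact formulation.

```python
import torch

backbone_features = torch.randn(1, 64, 32, 32)        # frozen, computed once
head = torch.nn.Conv2d(64, 1, 1)                      # stand-in segmentation head

scale = torch.ones(1, 64, 1, 1, requires_grad=True)   # auxiliary variables
bias = torch.zeros(1, 64, 1, 1, requires_grad=True)
opt = torch.optim.LBFGS([scale, bias], max_iter=20)

# Click constraints: logits should be positive at the foreground click,
# negative at the background click (coordinates are illustrative).
fg, bg = (10, 12), (25, 5)

def closure():
    opt.zero_grad()
    logits = head(backbone_features * scale + bias)[0, 0]
    loss = torch.relu(1 - logits[fg]) ** 2 + torch.relu(1 + logits[bg]) ** 2
    loss.backward()   # gradients flow only through the head, not the backbone
    return loss

opt.step(closure)
```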
Block Annotation: Better Image Annotation for Semantic Segmentation with Sub-Image Decomposition
Image datasets with high-quality pixel-level annotations are valuable for
semantic segmentation: labelling every pixel in an image ensures that rare
classes and small objects are annotated. However, full-image annotations are
expensive, with experts spending up to 90 minutes per image. We propose block
sub-image annotation as a replacement for full-image annotation. Despite the
attention cost of frequent task switching, we find that block annotations can
be crowdsourced at higher quality compared to full-image annotation with equal
monetary cost using existing annotation tools developed for full-image
annotation. Surprisingly, we find that annotating 50% of pixels with blocks allows
semantic segmentation to achieve performance equivalent to annotating 100% of
pixels. Furthermore, annotating as little as 12% of pixels yields performance as
high as 98% of that achieved with dense annotation. In weakly-supervised
settings, block annotation outperforms existing methods by 3-4% (absolute)
given equivalent annotation time. To recover the necessary global structure for
applications such as characterizing spatial context and affordance
relationships, we propose an effective method to inpaint block-annotated images
with high-quality labels without additional human effort. As such, fewer
annotations can also be used for these applications compared to full-image
annotation. Comment: ICCV 2019; http://www.cs.cornell.edu/~hubert/block_annotation
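A small sketch of what block sub-image annotation might look like operationally: split the image into a grid of blocks and sample a fraction of them for labelling. The block size and the uniform sampling strategy are our assumptions, not the paper's protocol.

```python
import random

def sample_blocks(height, width, block=100, fraction=0.12, seed=0):
    """Return (top, left, bottom, right) boxes covering ~fraction of pixels."""
    boxes = [(r, c, min(r + block, height), min(c + block, width))
             for r in range(0, height, block)
             for c in range(0, width, block)]
    random.Random(seed).shuffle(boxes)
    return boxes[:max(1, round(fraction * len(boxes)))]

# E.g. annotate ~12% of a 1000x1500 image in 100x100 blocks.
for box in sample_blocks(1000, 1500, fraction=0.12):
    pass  # send each block to the annotation tool
```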
Reviving Iterative Training with Mask Guidance for Interactive Segmentation
Recent works on click-based interactive segmentation have demonstrated
state-of-the-art results by using various inference-time optimization schemes.
These methods are considerably more computationally expensive compared to
feedforward approaches, as they require performing backward passes through a
network during inference and are hard to deploy on mobile frameworks that
usually support only forward passes. In this paper, we extensively evaluate
various design choices for interactive segmentation and discover that new
state-of-the-art results can be obtained without any additional optimization
schemes. Thus, we propose a simple feedforward model for click-based
interactive segmentation that employs the segmentation masks from previous
steps. This allows the model not only to segment an entirely new object, but also
to start from an external mask and correct it. When analyzing the performance of models
trained on different datasets, we observe that the choice of a training dataset
greatly impacts the quality of interactive segmentation. We find that the
models trained on a combination of COCO and LVIS with diverse and high-quality
annotations show performance superior to all existing models. The code and
trained models are available at
https://github.com/saic-vul/ritm_interactive_segmentation
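The feedforward iterative scheme can be sketched as a simple loop in which the mask from the previous step is fed back as an extra input channel together with the accumulated click maps; the model interface and the click simulator below are illustrative assumptions, not the released RITM code. Initializing the loop from an external mask instead of zeros gives the mask-correction mode mentioned above.

```python
import torch

def interactive_loop(model, image, clicks_fn, steps=5):
    """image: (1, 3, H, W). clicks_fn(mask) simulates the annotator: it
    returns (1, 2, H, W) positive/negative click maps given the current
    prediction. `model` takes (1, 6, H, W): image + clicks + previous mask."""
    prev_mask = torch.zeros(1, 1, *image.shape[-2:])
    for _ in range(steps):
        clicks = clicks_fn(prev_mask)
        logits = model(torch.cat([image, clicks, prev_mask], dim=1))
        prev_mask = torch.sigmoid(logits)   # feed the prediction back next round
    return prev_mask

# Toy run with a stand-in model and a dummy click simulator.
model = torch.nn.Conv2d(6, 1, 3, padding=1)
image = torch.randn(1, 3, 64, 64)
mask = interactive_loop(model, image, lambda m: torch.zeros(1, 2, 64, 64))
```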
A Review of methods for Textureless Object Recognition
Textureless object recognition has become a significant task in Computer
Vision with the advent of Robotics and its applications in the manufacturing
sector. Achieving good performance has been very challenging because textureless
objects lack discriminative features and have difficult reflectance properties.
Hence, the approaches used for textured objects cannot be applied to textureless
objects. A lot of work has been done in the last 20 years, especially in the last
5 years after T-LESS and other textureless datasets were introduced. In our
research, we plan to combine image processing techniques (for feature
enhancement) along with deep learning techniques (for object recognition). Here
we present an overview of the various existing work in the field of textureless
object recognition, which can be broadly classified into View-based,
Feature-based and Shape-based. We have also added a review of a few of the
research papers submitted at the International Conference on Smart Multimedia,
2018. Index terms: Computer Vision, Textureless object detection, Textureless
object recognition, Feature-based, Edge detection, Deep Learning. Comment: 25 pages
Probabilistic Attention for Interactive Segmentation
We provide a probabilistic interpretation of attention and show that the
standard dot-product attention in transformers is a special case of Maximum A
Posteriori (MAP) inference. The proposed approach suggests the use of
Expectation Maximization algorithms for online adaptation of key and value
model parameters. This approach is useful for cases in which external agents,
e.g., annotators, provide inference-time information about the correct values
of some tokens, e.g., the semantic category of some pixels, and we need this
new information to propagate to other tokens in a principled manner. We
illustrate the approach on an interactive semantic segmentation task in which
annotators and models collaborate online to improve annotation efficiency.
Using standard benchmarks, we observe that key adaptation boosts model
performance (measured in mIoU) in the low-feedback regime, and value propagation
improves model responsiveness in the high-feedback regime. A PyTorch layer
implementation of our probabilistic attention model will be made publicly
available here: https://github.com/apple/ml-probabilistic-attention. Comment: Updated with link to GitHub, 17 pages, 8 figures.
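A toy numeric sketch of the EM reading of attention: the softmax responsibilities play the role of an E-step, and an annotator-provided correction for one token drives an M-step-style update of the value parameters, which then propagates to all queries. The learning rate and the update rule are our simplifications, not the released layer.

```python
import torch

def attention(Q, K, V):
    # E-step: softmax responsibilities of each key/value slot per query.
    resp = torch.softmax(Q @ K.t() / K.shape[1] ** 0.5, dim=1)  # (N, M)
    return resp @ V, resp

Q, K = torch.randn(8, 16), torch.randn(4, 16)
V = torch.randn(4, 32)
out, resp = attention(Q, K, V)

# An annotator fixes the correct output for token 0; the M-step nudges each
# value vector in proportion to its responsibility for that token.
corrected = torch.randn(32)
lr = 0.5
V = V + lr * resp[0].unsqueeze(1) * (corrected - out[0]).unsqueeze(0)

out_after, _ = attention(Q, K, V)  # the correction now propagates to all queries
```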