SLiMe: Segment Like Me
Significant strides have been made using large vision-language models, like
Stable Diffusion (SD), for a variety of downstream tasks, including image
editing, image correspondence, and 3D shape generation. Inspired by these
advancements, we propose SLiMe, which leverages these extensive vision-language
models to segment images at any desired granularity using as few as one
annotated sample. SLiMe frames this problem as an optimization task.
Specifically, given a single training image and its segmentation mask, we first
extract attention maps, including our novel "weighted accumulated
self-attention map", from the SD prior. Then, using the extracted attention
maps, the text embeddings of Stable Diffusion are optimized such that each of
them learns about a single segmented region from the training image. These
learned embeddings then highlight the segmented region in the attention maps,
which in turn are used to derive the segmentation map. This enables SLiMe to
segment any real-world image during inference at the granularity of the
segmented region in the training image, using just one example. Moreover,
leveraging additional training data when available (i.e., the few-shot
setting) further improves the performance of SLiMe. We carried out an
extensive set of experiments examining various design factors and showed that
SLiMe outperforms other existing one-shot and few-shot segmentation methods.
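The embedding-optimization idea above can be illustrated in a deliberately simplified form. This is a minimal sketch under stated assumptions, not SLiMe's implementation: it assumes per-pixel feature vectors are already given (the hypothetical `toy_image` helper stands in for Stable Diffusion's UNet queries), uses a single cross-attention map whose softmax runs over K learnable embeddings, and trains with a plain per-pixel cross-entropy against the mask. The weighted accumulated self-attention map and the diffusion model itself are omitted, and all function names are illustrative.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_maps(features, embeddings):
    """Per-pixel cross-attention: softmax over the K embeddings of scaled dot products."""
    scale = 1.0 / math.sqrt(len(features[0]))
    return [
        softmax([scale * sum(f * e for f, e in zip(feat, emb)) for emb in embeddings])
        for feat in features
    ]


def optimize_embeddings(features, mask, num_regions, steps=200, lr=1.0, seed=0):
    """Gradient descent on one embedding per region so that its attention map
    matches that region of the mask (per-pixel cross-entropy loss)."""
    rng = random.Random(seed)
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    n = len(features)
    embeddings = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_regions)]
    for _ in range(steps):
        probs = attention_maps(features, embeddings)
        grads = [[0.0] * dim for _ in range(num_regions)]
        for feat, p, label in zip(features, probs, mask):
            for k in range(num_regions):
                # dLoss/dlogit_k = p_k - 1[label == k], chained through the scaled dot product
                err = (p[k] - (1.0 if label == k else 0.0)) * scale / n
                for j in range(dim):
                    grads[k][j] += err * feat[j]
        for k in range(num_regions):
            for j in range(dim):
                embeddings[k][j] -= lr * grads[k][j]
    return embeddings


def segment(features, embeddings):
    """Assign each pixel to the embedding with the highest attention there."""
    return [max(range(len(p)), key=p.__getitem__) for p in attention_maps(features, embeddings)]


def toy_image(rng, labels, dim=4, noise=0.1):
    """Hypothetical stand-in for UNet pixel features: a one-hot prototype per
    region plus bounded uniform noise."""
    return [
        [(1.0 if j == label else 0.0) + rng.uniform(-noise, noise) for j in range(dim)]
        for label in labels
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    train_mask = [0] * 20 + [1] * 20  # one annotated "image" with two regions
    emb = optimize_embeddings(toy_image(rng, train_mask), train_mask, num_regions=2)
    new_mask = [1, 0, 0, 1, 1, 0]
    print(segment(toy_image(rng, new_mask), emb), new_mask)
```

Because each pixel's attention is a softmax over the learnable embeddings, this toy reduces to multinomial logistic regression on frozen features, which offers one way to see why a single annotated image can be enough when the underlying features are already expressive.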
Deep Semantic Segmentation of Natural and Medical Images: A Review
The semantic image segmentation task consists of classifying each pixel of an
image into an instance, where each instance corresponds to a class. This task
is part of scene understanding, that is, better explaining the global context
of an image. In the medical image analysis domain, image segmentation
can be used for image-guided interventions, radiotherapy, or improved
radiological diagnostics. In this review, we categorize the leading deep
learning-based medical and non-medical image segmentation solutions into six
main groups (deep architectural, data synthesis-based, loss function-based,
sequenced models, weakly supervised, and multi-task methods) and provide a
comprehensive review of the contributions in each group. Further, for each
group, we analyze its variants, discuss the limitations of current approaches,
and present potential future research directions for semantic image
segmentation.
Comment: 45 pages, 16 figures. Accepted for publication in Springer Artificial
Intelligence Review
MaskTune: Mitigating Spurious Correlations by Forcing to Explore
A fundamental challenge of over-parameterized deep learning models is
learning meaningful data representations that yield good performance on a
downstream task without over-fitting spurious input features. This work
proposes MaskTune, a masking strategy that prevents over-reliance on spurious
(or a limited number of) features. MaskTune forces the trained model to explore
new features during a single epoch of fine-tuning by masking previously discovered
features. MaskTune, unlike earlier approaches for mitigating shortcut learning,
does not require any supervision, such as annotating spurious features or
labels for subgroup samples in a dataset. Our empirical results on biased
MNIST, CelebA, Waterbirds, and ImageNet-9L datasets show that MaskTune is
effective on tasks that often suffer from the existence of spurious
correlations. Finally, we show that MaskTune outperforms or matches the
competing methods when applied to the selective classification (classification
with rejection option) task. Code for MaskTune is available at
https://github.com/aliasgharkhani/Masktune.
Comment: Accepted to NeurIPS 202
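The mask-then-fine-tune loop can be illustrated on a toy linear model. This is a minimal sketch under stated assumptions, not the released code at the URL above: the saliency is gradient-times-input |w_j x_j| of a logistic-regression model (a stand-in for the saliency maps a deep network would use), the threshold is the per-sample mean saliency (an arbitrary choice), and the two-feature data set, with a large-magnitude "spurious" feature and a small "core" feature, is hypothetical. `saliency`, `mask_discovered_features`, and `masktune` are illustrative names.

```python
import math


def predict(w, x):
    """Probability of class 1 under a bias-free logistic-regression model."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def sgd_epoch(w, data, lr=0.5):
    """One pass of per-sample SGD on the logistic loss; returns new weights."""
    w = list(w)
    for x, y in data:
        err = y - predict(w, x)
        for j in range(len(w)):
            w[j] += lr * err * x[j]
    return w


def saliency(w, x):
    """Gradient-times-input saliency |w_j * x_j| (assumed stand-in for a saliency map)."""
    return [abs(wj * xj) for wj, xj in zip(w, x)]


def mask_discovered_features(w, x):
    """Zero every feature whose saliency exceeds this sample's mean saliency (assumed threshold)."""
    s = saliency(w, x)
    threshold = sum(s) / len(s)
    return [0.0 if sj > threshold else xj for sj, xj in zip(s, x)]


def masktune(w, data, lr=0.5):
    """Mask what the trained model already relies on, then fine-tune for a single epoch."""
    masked = [(mask_discovered_features(w, x), y) for x, y in data]
    return sgd_epoch(w, masked, lr)


if __name__ == "__main__":
    # Hypothetical data: feature 0 is a large "spurious" cue, feature 1 a small "core" cue.
    data = [([3.0, 1.0], 1), ([-3.0, -1.0], 0)] * 5
    w_erm = [0.0, 0.0]
    for _ in range(20):  # ordinary training (ERM) before MaskTune
        w_erm = sgd_epoch(w_erm, data)
    w_mt = masktune(w_erm, data)
    print("ERM weights:     ", w_erm)
    print("MaskTune weights:", w_mt)
```

Masked features contribute zero gradient, so the single fine-tuning epoch can only update weights on features the model had not already relied on; in this toy, the core-feature weight grows while the spurious-feature weight stays fixed.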