A Multiscale Framework for Challenging Discrete Optimization
Current state-of-the-art discrete optimization methods lag behind when
it comes to challenging contrast-enhancing discrete energies (i.e., favoring
different labels for neighboring variables). This work suggests a multiscale
approach for these challenging problems. Deriving an algebraic representation
allows us to coarsen any pair-wise energy using any interpolation in a
principled algebraic manner. Furthermore, we propose an energy-aware
interpolation operator that efficiently exposes the multiscale landscape of the
energy, yielding an effective coarse-to-fine optimization scheme. Results on
challenging contrast-enhancing energies show significant improvement over
state-of-the-art methods.
Comment: 5 pages, 1 figure. To appear in NIPS Workshop on Optimization for
Machine Learning (December 2012). Camera-ready version; fixed typos,
acknowledgements added.
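As a rough illustration only (not the paper's code): if the pairwise couplings are collected into a matrix W and the unary costs into U, any interpolation operator P coarsens the energy through a Galerkin-style triple product. All names and the NumPy formulation below are assumptions.

```python
import numpy as np

def coarsen_energy(W, U, P):
    """Algebraically coarsen a pairwise energy with an interpolation P.

    W: (n, n) pairwise couplings between the n fine variables
    U: (n, L) unary costs, one column per label
    P: (n, m) interpolation mapping m coarse variables to n fine ones
    """
    W_coarse = P.T @ W @ P   # Galerkin-style triple product
    U_coarse = P.T @ U       # pull unary costs to the coarse level
    return W_coarse, U_coarse

# Toy usage: aggregate 4 fine variables into 2 coarse ones.
P = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
W = np.random.rand(4, 4)
U = np.random.rand(4, 3)   # 3 labels
W_c, U_c = coarsen_energy(W, U, P)
```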
DeepCut: Unsupervised Segmentation using Graph Neural Networks Clustering
Image segmentation is a fundamental task in computer vision. Data annotation
for training supervised methods can be labor-intensive, motivating unsupervised
methods. Current approaches often rely on extracting deep features from
pre-trained networks to construct a graph, and classical clustering methods
like k-means and normalized-cuts are then applied as a post-processing step.
However, this approach reduces the high-dimensional information encoded in the
features to pair-wise scalar affinities. To address this limitation, this study
introduces a lightweight Graph Neural Network (GNN) to replace classical
clustering methods while optimizing for the same clustering objective function.
Unlike existing methods, our GNN takes both the pair-wise affinities between
local image features and the raw features as input. This direct connection
between the raw features and the clustering objective enables us to implicitly
perform classification of the clusters across different graphs, resulting in
part semantic segmentation without the need for additional post-processing
steps. We demonstrate how classical clustering objectives can be formulated as
self-supervised loss functions for training an image segmentation GNN.
Furthermore, we employ the Correlation-Clustering (CC) objective to perform
clustering without defining the number of clusters, allowing for k-less
clustering. We apply the proposed method for object localization, segmentation,
and semantic part segmentation tasks, surpassing state-of-the-art performance
on multiple benchmarks.
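As a hedged sketch of the idea (not the authors' implementation), a relaxed normalized-cut objective can be written as a differentiable self-supervised loss over the GNN's soft cluster assignments; the names and exact formulation below are illustrative.

```python
import torch

def ncut_loss(S, A, eps=1e-8):
    """Relaxed normalized-cut as a self-supervised clustering loss.

    S: (n, k) soft cluster assignments (e.g., softmax of GNN outputs)
    A: (n, n) pairwise affinity matrix built from deep features
    """
    d = A.sum(dim=1)                                # node degrees
    assoc = torch.einsum('nk,nm,mk->k', S, A, S)    # within-cluster affinity
    mass = torch.einsum('nk,n,nk->k', S, d, S)      # cluster degree mass
    return -(assoc / (mass + eps)).sum()            # maximize normalized assoc.
```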
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
Large-scale text-to-image generative models have been a revolutionary
breakthrough in the evolution of generative AI, allowing us to synthesize
diverse images that convey highly complex visual concepts. However, a pivotal
challenge in leveraging such models for real-world content creation tasks is
providing users with control over the generated content. In this paper, we
present a new framework that takes text-to-image synthesis to the realm of
image-to-image translation -- given a guidance image and a target text prompt,
our method harnesses the power of a pre-trained text-to-image diffusion model
to generate a new image that complies with the target text, while preserving
the semantic layout of the source image. Specifically, we observe and
empirically demonstrate that fine-grained control over the generated structure
can be achieved by manipulating spatial features and their self-attention
inside the model. This results in a simple and effective approach, where
features extracted from the guidance image are directly injected into the
generation process of the target image, requiring no training or fine-tuning
and applicable to both real and generated guidance images. We demonstrate
high-quality results on versatile text-guided image translation tasks,
including translating sketches, rough drawings, and animations into realistic
images, changing the class and appearance of objects in a given image, and
modifying global qualities such as lighting and color.
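A minimal sketch of the feature-injection idea, assuming a PyTorch diffusion U-Net whose blocks are instrumented with forward hooks; the module selection and all names here are hypothetical, not the paper's code.

```python
import torch

feature_cache = {}  # features saved from the guidance image's denoising pass

def make_save_hook(name):
    def hook(module, inputs, output):
        feature_cache[name] = output.detach()  # record spatial features
    return hook

def make_inject_hook(name):
    def hook(module, inputs, output):
        # Returning a value from a forward hook replaces the block's output:
        # the target pass reuses the guidance features, preserving layout.
        return feature_cache[name]
    return hook

# Hypothetical usage on some chosen decoder block of a diffusion U-Net:
# unet.up_blocks[1].register_forward_hook(make_save_hook('up1'))    # guidance pass
# unet.up_blocks[1].register_forward_hook(make_inject_hook('up1'))  # target pass
```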
TokenFlow: Consistent Diffusion Features for Consistent Video Editing
The generative AI revolution has recently expanded to videos. Nevertheless,
current state-of-the-art video models are still lagging behind image models in
terms of visual quality and user control over the generated content. In this
work, we present a framework that harnesses the power of a text-to-image
diffusion model for the task of text-driven video editing. Specifically, given
a source video and a target text-prompt, our method generates a high-quality
video that adheres to the target text, while preserving the spatial layout and
motion of the input video. Our method is based on a key observation that
consistency in the edited video can be obtained by enforcing consistency in the
diffusion feature space. We achieve this by explicitly propagating diffusion
features based on inter-frame correspondences, readily available in the model.
Thus, our framework does not require any training or fine-tuning, and can work
in conjunction with any off-the-shelf text-to-image editing method. We
demonstrate state-of-the-art editing results on a variety of real-world videos.
Webpage: https://diffusion-tokenflow.github.io
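As an illustrative sketch (names assumed, not the released code), the propagation step can be viewed as copying diffusion features from jointly edited keyframes to every other frame via nearest-neighbor correspondences in feature space:

```python
import torch
import torch.nn.functional as F

def propagate_features(key_feats, frame_feats):
    """Propagate keyframe diffusion features to the current frame.

    key_feats:   (k, c) features from the jointly edited keyframes
    frame_feats: (n, c) features of the current frame's tokens
    """
    sim = F.normalize(frame_feats, dim=1) @ F.normalize(key_feats, dim=1).T
    nn_idx = sim.argmax(dim=1)   # inter-frame correspondences
    return key_feats[nn_idx]     # consistent features for this frame
```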
Boundary Driven Interactive Segmentation
This paper presents a novel approach and interface for interactive image segmentation. Our interface uses sparse and inaccurate boundary cues provided by the user to produce a multi-layer segmentation of the image. Using boundary cues allows our interface to rely on a single "boundary brush" to produce a multi-layer segmentation, making it appealing for devices with touch-screen user interfaces. Our method utilizes recent advances in clustering to automatically recover the underlying number of layers without explicitly requiring the user to specify this input.
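The paper's clustering machinery is not reproduced here; as a loose, hypothetical sketch of the interface idea, one can cut affinity-graph edges that cross the user's boundary strokes and read off the layers (and their count) from the resulting graph components:

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def layers_from_boundary_cues(img, boundary_mask, tau=0.1):
    """Toy k-less segmentation from sparse boundary cues.

    img:           (h, w) grayscale image with values in [0, 1]
    boundary_mask: (h, w) bool mask of pixels touched by the boundary brush
    """
    h, w = img.shape
    idx = np.arange(h * w).reshape(h, w)
    flat, marked = img.ravel(), boundary_mask.ravel()
    rows, cols, vals = [], [], []
    # 4-connected grid edges: horizontal and vertical neighbours.
    for a, b in [(idx[:, :-1].ravel(), idx[:, 1:].ravel()),
                 (idx[:-1, :].ravel(), idx[1:, :].ravel())]:
        aff = np.exp(-np.abs(flat[a] - flat[b]) / 0.1)
        aff[marked[a] | marked[b]] = 0.0   # cut edges crossing user strokes
        keep = aff > tau                   # drop weak edges entirely
        rows.append(a[keep])
        cols.append(b[keep])
        vals.append(aff[keep])
    A = coo_matrix((np.concatenate(vals),
                    (np.concatenate(rows), np.concatenate(cols))),
                   shape=(h * w, h * w))
    n_layers, labels = connected_components(A, directed=False)
    return n_layers, labels.reshape(h, w)  # layer count recovered automatically
```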