Sketch-Guided Text-to-Image Diffusion Models
Text-to-Image models have introduced a remarkable leap in the evolution of
machine learning, demonstrating high-quality synthesis of images from a given
text-prompt. However, these powerful pretrained models still lack control
handles that can guide spatial properties of the synthesized images. In this
work, we introduce a universal approach to guide a pretrained text-to-image
diffusion model, with a spatial map from another domain (e.g., sketch) during
inference time. Unlike previous works, our method does not require training a
dedicated model or a specialized encoder for the task. Our key idea is to train
a Latent Guidance Predictor (LGP) - a small, per-pixel, Multi-Layer Perceptron
(MLP) that maps latent features of noisy images to spatial maps, where the deep
features are extracted from the core Denoising Diffusion Probabilistic Model
(DDPM) network. The LGP is trained only on a few thousand images and
constitutes a differentiable guiding-map predictor, over which the loss is
computed and propagated back to push the intermediate images to agree with the
spatial map. The per-pixel training offers flexibility and locality, which
allow the technique to perform well on out-of-domain sketches, including
free-hand style drawings. We take a particular focus on the sketch-to-image
translation task, revealing a robust and expressive way to generate images that
follow the guidance of a sketch of arbitrary style or domain. Project page:
sketch-guided-diffusion.github.i
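The guidance mechanism described above can be illustrated with a minimal NumPy sketch. This is not the authors' code: a single linear per-pixel predictor stands in for the small MLP, and the feature dimensions, weights, and target map are made-up stand-ins for DDPM intermediate features and a sketch.

```python
import numpy as np

# Minimal sketch (not the paper's implementation): a per-pixel *linear*
# guidance predictor standing in for the LGP's small MLP. It maps each
# pixel's deep feature vector to a spatial-map value; the MSE loss against
# the target sketch is differentiated w.r.t. the features to obtain the
# guidance signal that pushes intermediate images toward the sketch.
rng = np.random.default_rng(0)
H, W, D = 8, 8, 16                      # spatial size and feature dim (illustrative)
Wmat = rng.normal(size=(D,)) / np.sqrt(D)  # shared per-pixel predictor weights

feats = rng.normal(size=(H, W, D))      # stand-in for noisy-image deep features
target = (rng.random((H, W)) > 0.5).astype(float)  # stand-in sketch map

pred = feats @ Wmat                     # (H, W) predicted spatial map
loss = np.mean((pred - target) ** 2)

# Analytic gradient of the MSE loss w.r.t. the features: because the
# predictor acts per pixel, the guidance is local to each pixel.
grad_feats = 2.0 / (H * W) * (pred - target)[..., None] * Wmat

print(grad_feats.shape)  # same shape as feats: one guidance vector per pixel
```

The per-pixel structure is what the abstract credits for out-of-domain robustness: each pixel's guidance depends only on that pixel's features, not on a global encoding of the sketch.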
P+: Extended Textual Conditioning in Text-to-Image Generation
We introduce an Extended Textual Conditioning space in text-to-image models,
referred to as P+. This space consists of multiple textual conditions,
derived from per-layer prompts, each corresponding to a layer of the denoising
U-net of the diffusion model.
We show that the extended space provides greater disentangling and control
over image synthesis. We further introduce Extended Textual Inversion (XTI),
where the images are inverted into P+, and represented by per-layer tokens.
We show that XTI is more expressive and precise, and converges faster than
the original Textual Inversion (TI) space. The extended inversion method does
not involve any noticeable trade-off between reconstruction and editability and
induces more regular inversions.
We conduct a series of extensive experiments to analyze and understand the
properties of the new space, and to showcase the effectiveness of our method
for personalizing text-to-image models. Furthermore, we utilize the unique
properties of this space to achieve previously unattainable results in
object-style mixing using text-to-image models. Project page:
https://prompt-plus.github.i
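The per-layer conditioning idea can be sketched in a few lines. All names and dimensions below are assumptions for illustration, not the paper's actual U-net layer names or embedding sizes.

```python
import numpy as np

# Illustrative sketch of extended textual conditioning: instead of one
# learned token shared by every cross-attention layer (standard Textual
# Inversion), XTI-style inversion learns an independent token per U-net
# layer. Layer names and the embedding size are made up for the sketch.
rng = np.random.default_rng(1)
EMB = 32
unet_layers = [f"down_{i}" for i in range(3)] + ["mid"] + [f"up_{i}" for i in range(3)]

# Standard Textual Inversion: a single learned token for every layer.
ti_token = rng.normal(size=(EMB,))
ti_conditions = {layer: ti_token for layer in unet_layers}

# Extended Textual Inversion: a distinct learned token per layer, which is
# where the extra capacity and disentanglement of the extended space come from.
xti_conditions = {layer: rng.normal(size=(EMB,)) for layer in unet_layers}

def condition_for(layer: str, conditions: dict) -> np.ndarray:
    """Return the textual condition injected at a given denoising layer."""
    return conditions[layer]

# Every layer sees the same vector under TI, a distinct one under XTI.
same = all(condition_for(l, ti_conditions) is ti_token for l in unet_layers)
print(same)  # True
```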
ConceptLab: Creative Concept Generation using VLM-Guided Diffusion Prior Constraints
Recent text-to-image generative models have enabled us to transform our words
into vibrant, captivating imagery. The surge of personalization techniques that
has followed has also allowed us to imagine unique concepts in new scenes.
However, an intriguing question remains: How can we generate a new, imaginary
concept that has never been seen before? In this paper, we present the task of
creative text-to-image generation, where we seek to generate new members of a
broad category (e.g., generating a pet that differs from all existing pets). We
leverage the under-studied Diffusion Prior models and show that the creative
generation problem can be formulated as an optimization process over the output
space of the diffusion prior, resulting in a set of "prior constraints". To
keep our generated concept from converging into existing members, we
incorporate a question-answering Vision-Language Model (VLM) that adaptively
adds new constraints to the optimization problem, encouraging the model to
discover increasingly more unique creations. Finally, we show that our prior
constraints can also serve as a strong mixing mechanism allowing us to create
hybrids between generated concepts, introducing even more flexibility into the
creative process. Project page: https://kfirgoldberg.github.io/ConceptLab
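The "prior constraints" objective can be sketched as a simple attract/repel loss over embeddings. The vectors, margin, and constraint form below are hypothetical stand-ins, not the paper's actual CLIP-space setup.

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical sketch of the prior-constraint idea: keep the learned
# concept embedding close to the broad category while pushing it away
# from known members. All embeddings here are random stand-ins.
rng = np.random.default_rng(2)
dim = 64
category = rng.normal(size=dim)                      # e.g. embedding of "a pet"
members = [rng.normal(size=dim) for _ in range(3)]   # e.g. "cat", "dog", "hamster"

def prior_constraint_loss(concept, category, members, margin=0.3):
    # Attract to the category; hinge-repel from each existing member.
    loss = 1.0 - cos(concept, category)
    for m in members:
        loss += max(0.0, cos(concept, m) - margin)
    return loss

concept = rng.normal(size=dim)
base = prior_constraint_loss(concept, category, members)

# A VLM in the loop would name what the concept still resembles and append
# that as a new repulsion constraint, tightening the problem adaptively:
new_member = rng.normal(size=dim)                    # e.g. embedding of "ferret"
tightened = prior_constraint_loss(concept, category, members + [new_member])
assert tightened >= base   # adding a hinge constraint never lowers the loss
```

The adaptive loop in the paper repeats this: optimize, ask the VLM what the result resembles, add that answer as a constraint, and continue.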
Bayesian Active Meta-Learning for Reliable and Efficient AI-Based Demodulation
Two of the main principles underlying the life cycle of an artificial
intelligence (AI) module in communication networks are adaptation and
monitoring. Adaptation refers to the need to adjust the operation of an AI
module depending on the current conditions; while monitoring requires measures
of the reliability of an AI module's decisions. Classical frequentist learning
methods for the design of AI modules fall short on both counts of adaptation
and monitoring, catering to one-off training and providing overconfident
decisions. This paper proposes a solution to address both challenges by
integrating meta-learning with Bayesian learning. As a specific use case, the
problems of demodulation and equalization over a fading channel based on the
availability of few pilots are studied. Meta-learning processes pilot
information from multiple frames in order to extract useful shared properties
of effective demodulators across frames. The resulting trained demodulators are
demonstrated, via experiments, to offer better calibrated soft decisions, at
the computational cost of running an ensemble of networks at run time. The
capacity to quantify uncertainty in the model parameter space is further
leveraged by extending Bayesian meta-learning to an active setting. In it, the
designer can select in a sequential fashion channel conditions under which to
generate data for meta-learning from a channel simulator. Bayesian active
meta-learning is seen in experiments to significantly reduce the number of
frames required to obtain an efficient adaptation procedure for new frames. To appear in IEEE Transactions on Signal Processing.
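The run-time cost and calibration benefit mentioned above can be illustrated with a toy ensemble. The logits below are synthetic stand-ins for networks sampled from a parameter posterior, not an actual demodulator.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy sketch of Bayesian model averaging at run time: the demodulator's
# soft decision is the mean of the softmax outputs of an ensemble of
# networks (posterior samples). Sizes and logits are synthetic.
rng = np.random.default_rng(3)
n_models, n_symbols = 8, 4
logits = rng.normal(scale=3.0, size=(n_models, n_symbols))  # one row per sample

per_model = softmax(logits)          # each network's soft decision
ensemble = per_model.mean(axis=0)    # Bayesian model average over the ensemble

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())

# Averaging spreads probability mass: by concavity of entropy, the
# ensemble's predictive entropy is at least the mean member entropy,
# i.e. the averaged decisions are softer (less overconfident).
mean_member_entropy = np.mean([entropy(p) for p in per_model])
print(entropy(ensemble) >= mean_member_entropy - 1e-9)  # True
```

This also makes the computational trade-off in the abstract concrete: every run-time decision requires `n_models` forward passes instead of one.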
Calibrating AI Models for Few-Shot Demodulation via Conformal Prediction
AI tools can be useful to address model deficits in the design of
communication systems. However, conventional learning-based AI algorithms yield
poorly calibrated decisions and are unable to quantify the uncertainty of their outputs.
While Bayesian learning can enhance calibration by capturing epistemic
uncertainty caused by limited data availability, formal calibration guarantees
only hold under strong assumptions about the ground-truth, unknown, data
generation mechanism. We propose to leverage the conformal prediction framework
to obtain data-driven set predictions whose calibration properties hold
irrespective of the data distribution. Specifically, we investigate the design
of baseband demodulators in the presence of hard-to-model nonlinearities such
as hardware imperfections, and propose set-based demodulators based on
conformal prediction. Numerical results confirm the theoretical validity of the
proposed demodulators, and bring insights into their average prediction set
size efficiency. Submitted for a conference publication.
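A minimal split conformal prediction wrapper shows the set-based construction. The softmax probabilities, symbol count, and miscoverage level are invented stand-ins for a demodulator's outputs; the distribution-free coverage property is what the wrapper itself provides.

```python
import numpy as np

# Sketch of a set-based demodulator via split conformal prediction.
# Calibration data: model probabilities on held-out pilots plus the
# true transmitted symbols. Everything below is synthetic.
rng = np.random.default_rng(4)
n_cal, n_symbols, alpha = 200, 4, 0.1   # calibration size, constellation, miscoverage

cal_probs = rng.dirichlet(np.ones(n_symbols), size=n_cal)  # stand-in soft outputs
cal_labels = rng.integers(0, n_symbols, size=n_cal)        # true symbols

# Nonconformity score: one minus the probability assigned to the true symbol.
scores = 1.0 - cal_probs[np.arange(n_cal), cal_labels]
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
qhat = np.sort(scores)[min(k, n_cal) - 1]                  # conformal quantile

def prediction_set(probs):
    """All symbols whose nonconformity score stays under the threshold;
    covers the true symbol with probability >= 1 - alpha on average,
    regardless of the data distribution."""
    return [s for s in range(n_symbols) if 1.0 - probs[s] <= qhat]

test_probs = rng.dirichlet(np.ones(n_symbols))
demod_set = prediction_set(test_probs)   # a subset of the constellation
```

Average set size is the efficiency metric the abstract refers to: a well-trained underlying model yields small sets, but coverage holds even for a poor one.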
Guaranteed Dynamic Scheduling of Ultra-Reliable Low-Latency Traffic via Conformal Prediction
The dynamic scheduling of ultra-reliable and low-latency traffic (URLLC) in
the uplink can significantly enhance the efficiency of coexisting services,
such as enhanced mobile broadband (eMBB) devices, by only allocating resources
when necessary. The main challenge is posed by the uncertainty in the process
of URLLC packet generation, which mandates the use of predictors for URLLC
traffic in the coming frames. In practice, such prediction may overestimate or
underestimate the amount of URLLC data to be generated, yielding either an
excessive or an insufficient amount of resources to be pre-emptively allocated
for URLLC packets. In this paper, we introduce a novel scheduler for URLLC
packets that provides formal guarantees on reliability and latency irrespective
of the quality of the URLLC traffic predictor. The proposed method leverages
recent advances in online conformal prediction (CP), and follows the principle
of dynamically adjusting the amount of allocated resources so as to meet
reliability and latency requirements set by the designer.
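The dynamic-adjustment principle can be illustrated with a toy online conformal update. The Poisson traffic model, predictor error, and step size are all invented for the sketch; the point is that the margin adapts so the long-run shortfall rate approaches the target, however bad the predictor is.

```python
import numpy as np

# Toy sketch of online conformal resource allocation: inflate the traffic
# predictor's estimate by a margin theta, and update theta with the
# standard online-CP recursion theta += gamma * (err - alpha).
rng = np.random.default_rng(5)
alpha, gamma = 0.1, 0.5   # target shortfall rate and step size (illustrative)
theta = 0.0               # adaptive allocation margin, in packets

shortfalls = 0
T = 2000
for t in range(T):
    true_demand = rng.poisson(5)                   # URLLC packets actually generated
    predicted = true_demand + rng.integers(-3, 4)  # imperfect traffic predictor
    allocated = max(0, predicted + theta)          # pre-emptively reserved resources

    err = 1.0 if true_demand > allocated else 0.0  # 1 = under-allocation event
    shortfalls += err
    theta += gamma * (err - alpha)                 # online conformal update

print(shortfalls / T)  # long-run shortfall rate, forced toward alpha = 0.1
```

Telescoping the update gives the guarantee directly: the realized shortfall rate deviates from `alpha` by at most `(theta_T - theta_0) / (gamma * T)`, which vanishes as long as the margin stays bounded.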
- …