Towards Real-time Text-driven Image Manipulation with Unconditional Diffusion Models
Recent advances in diffusion models enable many powerful instruments for
image editing. One of these instruments is text-driven image manipulations:
editing semantic attributes of an image according to the provided text
description.
Existing diffusion-based methods already achieve high-quality image
manipulations for a broad range of text prompts. However, in practice, these
methods incur high computational costs even on a high-end GPU. This greatly
limits potential real-world applications of diffusion-based image editing,
especially when running on user devices.
In this paper, we address the efficiency of recent text-driven editing
methods based on unconditional diffusion models and develop a novel algorithm
that learns image manipulations 4.5-10 times faster and applies them 8 times
faster. We carefully evaluate the visual quality and expressiveness of our
approach on multiple datasets using human annotators. Our experiments
demonstrate that our algorithm achieves the quality of much more expensive
methods. Finally, we show that our approach can adapt the pretrained model to
the user-specified image and text description on the fly in just 4 seconds. In
this setting, we notice that more compact unconditional diffusion models can be
considered a rational alternative to the popular text-conditional
counterparts.
TabDDPM: Modelling Tabular Data with Diffusion Models
Denoising diffusion probabilistic models are currently becoming the leading
paradigm of generative modeling for many important data modalities. Being the
most prevalent in the computer vision community, diffusion models have also
recently gained some attention in other domains, including speech, NLP, and
graph-like data. In this work, we investigate if the framework of diffusion
models can be advantageous for general tabular problems, where datapoints are
typically represented by vectors of heterogeneous features. The inherent
heterogeneity of tabular data makes it quite challenging for accurate modeling,
since the individual features can be of completely different nature, i.e., some
of them can be continuous and some of them can be discrete. To address such
data types, we introduce TabDDPM -- a diffusion model that can be universally
applied to any tabular dataset and handles any type of feature. We extensively
evaluate TabDDPM on a wide set of benchmarks and demonstrate its superiority
over existing GAN/VAE alternatives, which is consistent with the advantage of
diffusion models in other fields. Additionally, we show that TabDDPM is
suitable for privacy-oriented setups, where the original datapoints cannot be
publicly shared.
Comment: code https://github.com/rotot0/tab-ddp
Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
Knowledge distillation methods have recently been shown to be a promising
direction to speed up the synthesis of large-scale diffusion models by requiring
only a few inference steps. While several powerful distillation methods were
recently proposed, the overall quality of student samples is typically lower
compared to the teacher ones, which hinders their practical usage. In this
work, we investigate the relative quality of samples produced by the teacher
text-to-image diffusion model and its distilled student version. As our main
empirical finding, we discover that a noticeable portion of student samples
exhibit superior fidelity compared to the teacher ones, despite the
"approximate" nature of the student. Based on this finding, we propose an
adaptive collaboration between student and teacher diffusion models for
effective text-to-image synthesis. Specifically, the distilled model produces
the initial sample, and then an oracle decides whether it needs further
improvements with a slow teacher model. Extensive experiments demonstrate that
the designed pipeline surpasses state-of-the-art text-to-image alternatives for
various inference budgets in terms of human preference. Furthermore, the
proposed approach can be naturally used in popular applications such as
text-guided image editing and controllable generation.
Comment: CVPR2024 camera ready v
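
The adaptive collaboration described in this abstract (the student drafts, an oracle decides, and the teacher refines only when needed) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the scalar oracle score, the `threshold` parameter, and the toy stand-in models are all hypothetical, and the abstract does not specify how the oracle or the teacher refinement actually work.

```python
from typing import Callable, Tuple

def adaptive_generate(
    prompt: str,
    student: Callable[[str], str],
    teacher: Callable[[str, str], str],
    oracle: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> Tuple[str, bool]:
    """Return (sample, used_teacher) for one prompt."""
    # Step 1: the fast distilled student produces the initial sample.
    draft = student(prompt)
    # Step 2: the oracle scores the draft (assumed here: higher is better).
    score = oracle(prompt, draft)
    # Step 3: keep the draft if it is good enough, else fall back to the teacher.
    if score >= threshold:
        return draft, False
    return teacher(prompt, draft), True

# Toy stand-ins so the sketch runs without any real diffusion model.
def toy_student(prompt: str) -> str:
    return f"draft({prompt})"

def toy_teacher(prompt: str, draft: str) -> str:
    return f"refined({draft})"

def toy_oracle(prompt: str, sample: str) -> float:
    # Pretend the student does well only on prompts that mention "cat".
    return 0.9 if "cat" in prompt else 0.1

if __name__ == "__main__":
    for p in ["a cat", "a dog"]:
        print(p, "->", adaptive_generate(p, toy_student, toy_teacher, toy_oracle))
```

In this sketch the slow teacher runs only for prompts whose draft the oracle rejects, so the average cost depends on how often that happens.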
magdalendobson/big-ann-benchmarks: Final artifact release
Framework for evaluating ANNS algorithms on billion scale datasets