Denoising Diffusion Probabilistic Models for Robust Image Super-Resolution in the Wild
Diffusion models have shown promising results on single-image
super-resolution and other image-to-image translation tasks. Despite this
success, they have not outperformed state-of-the-art GAN models on the more
challenging blind super-resolution task, where the input images are out of
distribution, with unknown degradations. This paper introduces SR3+, a
diffusion-based model for blind super-resolution, establishing a new
state-of-the-art. To this end, we advocate self-supervised training with a
combination of composite, parameterized degradations, and noise-conditioning
augmentation during both training and testing. With
these innovations, a large-scale convolutional architecture, and large-scale
datasets, SR3+ greatly outperforms SR3. It outperforms Real-ESRGAN when trained
on the same data, with a DRealSR FID score of 36.82 vs. 37.22, which further
improves to an FID of 32.37 with larger models, and further still with larger
training sets.
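To make the second training-time idea concrete, here is a minimal sketch of noise-conditioning augmentation, assuming a standard eps-prediction diffusion setup; the function and parameter names are hypothetical, not the authors' code.

```python
# Minimal sketch of noise-conditioning augmentation (hypothetical names,
# not the authors' code). The low-resolution conditioning image is
# corrupted with Gaussian noise at a randomly sampled level, and that
# level is passed to the model as an extra input, so the model learns to
# cope with conditioning signals of varying quality.
import torch

def noise_condition(lr_image: torch.Tensor, max_level: float = 0.5):
    """Return a noised LR image and the per-example noise level."""
    batch = lr_image.shape[0]
    level = torch.rand(batch, device=lr_image.device) * max_level
    noisy = lr_image + level.view(-1, 1, 1, 1) * torch.randn_like(lr_image)
    return noisy, level  # `level` is fed to the model like a timestep
```

At test time the same corruption would be applied at a fixed level, matching the abstract's note that the augmentation is used during both training and testing.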
Re-Imagen: Retrieval-Augmented Text-to-Image Generator
Research on text-to-image generation has witnessed significant progress in
generating diverse and photo-realistic images, driven by diffusion and
auto-regressive models trained on large-scale image-text data. Though
state-of-the-art models can generate high-quality images of common entities,
they often have difficulty generating images of uncommon entities, such as
'Chortai (dog)' or 'Picarones (food)'. To tackle this issue, we present the
Retrieval-Augmented Text-to-Image Generator (Re-Imagen), a generative model
that uses retrieved information to produce high-fidelity and faithful images,
even for rare or unseen entities. Given a text prompt, Re-Imagen accesses an
external multi-modal knowledge base to retrieve relevant (image, text) pairs
and uses them as references to generate the image. With this retrieval step,
Re-Imagen is augmented with the knowledge of high-level semantics and low-level
visual details of the mentioned entities, and thus improves its accuracy in
generating the entities' visual appearances. We train Re-Imagen on a
constructed dataset containing (image, text, retrieval) triples to teach the
model to ground on both text prompt and retrieval. Furthermore, we develop a
new sampling strategy to interleave the classifier-free guidance for text and
retrieval conditions to balance the text and retrieval alignment. Re-Imagen
achieves significant gains in FID on COCO and WikiImage. To further
evaluate the capabilities of the model, we introduce EntityDrawBench, a new
benchmark that evaluates image generation for diverse entities, from frequent
to rare, across multiple object categories including dogs, foods, landmarks,
birds, and characters. Human evaluation on EntityDrawBench shows that Re-Imagen
can significantly improve the fidelity of generated images, especially on less
frequent entities.
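The interleaved guidance scheme can be illustrated with a short sketch. The sampler below alternates which condition is guided at each step; the alternation rule, model signature, and guidance weights are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of interleaving classifier-free guidance between the
# text and retrieval conditions during sampling.
import torch

def interleaved_cfg(model, x_t, t, text_emb, retrieval_emb,
                    w_text=7.5, w_ret=3.0):
    """Guide on the text condition at even steps, retrieval at odd steps."""
    eps_full = model(x_t, t, text_emb, retrieval_emb)  # both conditions on
    if t % 2 == 0:
        eps_drop = model(x_t, t, None, retrieval_emb)  # drop text
        return eps_drop + w_text * (eps_full - eps_drop)
    eps_drop = model(x_t, t, text_emb, None)           # drop retrieval
    return eps_drop + w_ret * (eps_full - eps_drop)

# Toy usage with a stand-in model that ignores its conditions:
dummy = lambda x, t, txt, ret: torch.zeros_like(x)
eps = interleaved_cfg(dummy, torch.randn(1, 3, 8, 8), t=10,
                      text_emb=None, retrieval_emb=None)
```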
Palette: Image-to-Image Diffusion Models
This paper develops a unified framework for image-to-image translation based
on conditional diffusion models and evaluates this framework on four
challenging image-to-image translation tasks, namely colorization, inpainting,
uncropping, and JPEG restoration. Our simple implementation of image-to-image
diffusion models outperforms strong GAN and regression baselines on all tasks,
without task-specific hyper-parameter tuning, architecture customization,
auxiliary losses, or other sophisticated new techniques. We uncover the
impact of an L2 vs. L1 loss in the denoising diffusion objective on sample
diversity, and demonstrate the importance of self-attention in the neural
architecture through empirical studies. Importantly, we advocate a unified
evaluation protocol based on ImageNet, with human evaluation and sample quality
scores (FID, Inception Score, Classification Accuracy of a pre-trained
ResNet-50, and Perceptual Distance against original images). We expect this
standardized evaluation protocol to play a role in advancing image-to-image
translation research. Finally, we show that a generalist, multi-task diffusion
model performs as well or better than task-specific specialist counterparts.
Check out https://diffusion-palette.github.io for an overview of the results.
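As an illustration of the L2 vs. L1 comparison the abstract mentions, a minimal eps-prediction training objective with a switchable loss might look like the sketch below; the names and the alpha-bar schedule are assumptions for illustration, not Palette's actual implementation.

```python
# Minimal sketch of the denoising objective with a switchable L1/L2 loss.
import torch
import torch.nn.functional as F

def denoising_loss(model, x0, src, use_l1=False, num_steps=1000):
    """Noise x0, predict the noise conditioned on the source image `src`."""
    t = torch.randint(0, num_steps, (x0.shape[0],), device=x0.device)
    # A simple linear alpha-bar schedule stands in for the real one.
    a_bar = (1.0 - t.float() / num_steps).clamp(min=1e-4).view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    eps_hat = model(x_t, t, src)  # image-to-image: condition on `src`
    return F.l1_loss(eps_hat, eps) if use_l1 else F.mse_loss(eps_hat, eps)
```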
Combating False Negatives in Adversarial Imitation Learning
In adversarial imitation learning, a discriminator is trained to
differentiate agent episodes from expert demonstrations representing the
desired behavior. However, as the trained policy learns to be more successful,
the negative examples (the ones produced by the agent) become increasingly
similar to expert ones. Despite the fact that the task is successfully
accomplished in some of the agent's trajectories, the discriminator is trained
to output low values for them. We hypothesize that this inconsistent training
signal for the discriminator can impede its learning, and consequently leads to
worse overall performance of the agent. We show experimental evidence for this
hypothesis and that the 'False Negatives' (i.e. successful agent episodes)
significantly hinder adversarial imitation learning, which is the first
contribution of this paper. Then, we propose a method to alleviate the impact
of false negatives and test it on the BabyAI environment. This method
consistently improves sample efficiency over the baselines by at least an order
of magnitude. This is an extended version of the student abstract published at
the 34th AAAI Conference on Artificial Intelligence.
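The inconsistency is easy to see in the standard adversarial-imitation discriminator update, sketched below with illustrative names: every agent transition is assigned label 0, including transitions from episodes that solved the task.

```python
# Sketch of the discriminator update that creates the false-negatives
# problem (hypothetical names, not the authors' code).
import torch
import torch.nn.functional as F

def discriminator_loss(disc, expert_batch, agent_batch):
    expert_logits = disc(expert_batch)  # target label: 1 (expert)
    agent_logits = disc(agent_batch)    # target label: 0 (agent), even
    # for successful agent episodes -- these are the "false negatives".
    return (F.binary_cross_entropy_with_logits(
                expert_logits, torch.ones_like(expert_logits))
            + F.binary_cross_entropy_with_logits(
                agent_logits, torch.zeros_like(agent_logits)))
```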
Combating false negatives in adversarial imitation learning
We define the False Negatives problem and show that it is a significant limitation in adversarial imitation learning. We propose a method that solves the problem by leveraging the nature of goal-conditioned tasks. The method, dubbed Fake Conditioning, is tested on instruction-following tasks in BabyAI environments, where it improves sample efficiency over the baselines by at least an order of magnitude.
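One plausible reading of how goal-conditioning could be leveraged, sketched below with hypothetical names (the paper's exact construction may differ): pairing a trajectory with a goal it was not collected under yields a negative by construction, so successful agent episodes need not be labeled negative.

```python
# A plausible sketch of constructing guaranteed negatives via mismatched
# ("fake") goals; illustrative only, not the authors' code.
import random

def fake_conditioned_negatives(trajectories, goals):
    """Pair each trajectory with a mismatched goal (assumes >= 2 distinct goals)."""
    negatives = []
    for traj, goal in zip(trajectories, goals):
        fake_goal = random.choice([g for g in goals if g != goal])
        negatives.append((traj, fake_goal))  # true negative by construction
    return negatives
```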