Denoising Diffusion Probabilistic Models for Robust Image Super-Resolution in the Wild
Diffusion models have shown promising results on single-image
super-resolution and other image-to-image translation tasks. Despite this
success, they have not outperformed state-of-the-art GAN models on the more
challenging blind super-resolution task, where the input images are out of
distribution, with unknown degradations. This paper introduces SR3+, a
diffusion-based model for blind super-resolution, establishing a new
state-of-the-art. To this end, we advocate self-supervised training with a
combination of composite, parameterized degradations, and noise-conditioning
augmentation during both training and testing. With
these innovations, a large-scale convolutional architecture, and large-scale
datasets, SR3+ greatly outperforms SR3. It outperforms Real-ESRGAN when trained
on the same data, with a DRealSR FID score of 36.82 vs. 37.22, which further
improves to an FID of 32.37 with larger models, and further still with larger
training sets.
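To make the second training-time idea concrete, here is a minimal sketch of noise-conditioning augmentation, assuming a standard eps-prediction diffusion setup; the function and parameter names are hypothetical, not the authors' code.

```python
# Minimal sketch of noise-conditioning augmentation (hypothetical names,
# not the authors' code). The low-resolution conditioning image is
# corrupted with Gaussian noise at a randomly sampled level, and that
# level is passed to the model as an extra input, so the model learns to
# cope with conditioning signals of varying quality.
import torch

def noise_condition(lr_image: torch.Tensor, max_level: float = 0.5):
    """Return a noised LR image and the per-example noise level."""
    batch = lr_image.shape[0]
    level = torch.rand(batch, device=lr_image.device) * max_level
    noisy = lr_image + level.view(-1, 1, 1, 1) * torch.randn_like(lr_image)
    return noisy, level  # `level` is fed to the model like a timestep
```

At test time the same corruption would be applied at a fixed level, matching the abstract's note that the augmentation is used during both training and testing.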
Re-Imagen: Retrieval-Augmented Text-to-Image Generator
Research on text-to-image generation has witnessed significant progress in
generating diverse and photo-realistic images, driven by diffusion and
auto-regressive models trained on large-scale image-text data. Though
state-of-the-art models can generate high-quality images of common entities,
they often have difficulty generating images of uncommon entities, such as
'Chortai (dog)' or 'Picarones (food)'. To tackle this issue, we present the
Retrieval-Augmented Text-to-Image Generator (Re-Imagen), a generative model
that uses retrieved information to produce high-fidelity and faithful images,
even for rare or unseen entities. Given a text prompt, Re-Imagen accesses an
external multi-modal knowledge base to retrieve relevant (image, text) pairs
and uses them as references to generate the image. With this retrieval step,
Re-Imagen is augmented with the knowledge of high-level semantics and low-level
visual details of the mentioned entities, and thus improves its accuracy in
generating the entities' visual appearances. We train Re-Imagen on a
constructed dataset containing (image, text, retrieval) triples to teach the
model to ground on both text prompt and retrieval. Furthermore, we develop a
new sampling strategy to interleave the classifier-free guidance for text and
retrieval conditions to balance the text and retrieval alignment. Re-Imagen
achieves significant gains in FID on COCO and WikiImage. To further
evaluate the capabilities of the model, we introduce EntityDrawBench, a new
benchmark that evaluates image generation for diverse entities, from frequent
to rare, across multiple object categories including dogs, foods, landmarks,
birds, and characters. Human evaluation on EntityDrawBench shows that Re-Imagen
can significantly improve the fidelity of generated images, especially on less
frequent entities.
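The interleaved guidance scheme can be illustrated with a short sketch. The sampler below alternates which condition is guided at each step; the alternation rule, model signature, and guidance weights are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of interleaving classifier-free guidance between the
# text and retrieval conditions during sampling.
import torch

def interleaved_cfg(model, x_t, t, text_emb, retrieval_emb,
                    w_text=7.5, w_ret=3.0):
    """Guide on the text condition at even steps, retrieval at odd steps."""
    eps_full = model(x_t, t, text_emb, retrieval_emb)  # both conditions on
    if t % 2 == 0:
        eps_drop = model(x_t, t, None, retrieval_emb)  # drop text
        return eps_drop + w_text * (eps_full - eps_drop)
    eps_drop = model(x_t, t, text_emb, None)           # drop retrieval
    return eps_drop + w_ret * (eps_full - eps_drop)

# Toy usage with a stand-in model that ignores its conditions:
dummy = lambda x, t, txt, ret: torch.zeros_like(x)
eps = interleaved_cfg(dummy, torch.randn(1, 3, 8, 8), t=10,
                      text_emb=None, retrieval_emb=None)
```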
Palette: Image-to-Image Diffusion Models
This paper develops a unified framework for image-to-image translation based
on conditional diffusion models and evaluates this framework on four
challenging image-to-image translation tasks, namely colorization, inpainting,
uncropping, and JPEG restoration. Our simple implementation of image-to-image
diffusion models outperforms strong GAN and regression baselines on all tasks,
without task-specific hyper-parameter tuning, architecture customization,
auxiliary losses, or other sophisticated new techniques. We uncover the
impact of an L2 vs. L1 loss in the denoising diffusion objective on sample
diversity, and demonstrate the importance of self-attention in the neural
architecture through empirical studies. Importantly, we advocate a unified
evaluation protocol based on ImageNet, with human evaluation and sample quality
scores (FID, Inception Score, Classification Accuracy of a pre-trained
ResNet-50, and Perceptual Distance against original images). We expect this
standardized evaluation protocol to play a role in advancing image-to-image
translation research. Finally, we show that a generalist, multi-task diffusion
model performs as well or better than task-specific specialist counterparts.
Check out https://diffusion-palette.github.io for an overview of the results.
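As an illustration of the L2 vs. L1 comparison the abstract mentions, a minimal eps-prediction training objective with a switchable loss might look like the sketch below; the names and the alpha-bar schedule are assumptions for illustration, not Palette's actual implementation.

```python
# Minimal sketch of the denoising objective with a switchable L1/L2 loss.
import torch
import torch.nn.functional as F

def denoising_loss(model, x0, src, use_l1=False, num_steps=1000):
    """Noise x0, predict the noise conditioned on the source image `src`."""
    t = torch.randint(0, num_steps, (x0.shape[0],), device=x0.device)
    # A simple linear alpha-bar schedule stands in for the real one.
    a_bar = (1.0 - t.float() / num_steps).clamp(min=1e-4).view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    eps_hat = model(x_t, t, src)  # image-to-image: condition on `src`
    return F.l1_loss(eps_hat, eps) if use_l1 else F.mse_loss(eps_hat, eps)
```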
Combating False Negatives in Adversarial Imitation Learning
In adversarial imitation learning, a discriminator is trained to
differentiate agent episodes from expert demonstrations representing the
desired behavior. However, as the trained policy learns to be more successful,
the negative examples (the ones produced by the agent) become increasingly
similar to expert ones. Despite the fact that the task is successfully
accomplished in some of the agent's trajectories, the discriminator is trained
to output low values for them. We hypothesize that this inconsistent training
signal for the discriminator can impede its learning, and consequently leads to
worse overall performance of the agent. We show experimental evidence for this
hypothesis and that the 'False Negatives' (i.e. successful agent episodes)
significantly hinder adversarial imitation learning, which is the first
contribution of this paper. Then, we propose a method to alleviate the impact
of false negatives and test it on the BabyAI environment. This method
consistently improves sample efficiency over the baselines by at least an order
of magnitude. This is an extended version of the student abstract published at
the 34th AAAI Conference on Artificial Intelligence.
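The inconsistency is easy to see in the standard adversarial-imitation discriminator update, sketched below with illustrative names: every agent transition is assigned label 0, including transitions from episodes that solved the task.

```python
# Sketch of the discriminator update that creates the false-negatives
# problem (hypothetical names, not the authors' code).
import torch
import torch.nn.functional as F

def discriminator_loss(disc, expert_batch, agent_batch):
    expert_logits = disc(expert_batch)  # target label: 1 (expert)
    agent_logits = disc(agent_batch)    # target label: 0 (agent), even
    # for successful agent episodes -- these are the "false negatives".
    return (F.binary_cross_entropy_with_logits(
                expert_logits, torch.ones_like(expert_logits))
            + F.binary_cross_entropy_with_logits(
                agent_logits, torch.zeros_like(agent_logits)))
```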
Combating false negatives in adversarial imitation learning
We define the False Negatives problem and show that it is a significant limitation in adversarial imitation learning. We propose a method that solves the problem by leveraging the nature of goal-conditioned tasks. The method, dubbed Fake Conditioning, is tested on instruction-following tasks in BabyAI environments, where it improves sample efficiency over the baselines by at least an order of magnitude.
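One plausible reading of how goal-conditioning could be leveraged, sketched below with hypothetical names (the paper's exact construction may differ): pairing a trajectory with a goal it was not collected under yields a negative by construction, so successful agent episodes need not be labeled negative.

```python
# A plausible sketch of constructing guaranteed negatives via mismatched
# ("fake") goals; illustrative only, not the authors' code.
import random

def fake_conditioned_negatives(trajectories, goals):
    """Pair each trajectory with a mismatched goal (assumes >= 2 distinct goals)."""
    negatives = []
    for traj, goal in zip(trajectories, goals):
        fake_goal = random.choice([g for g in goals if g != goal])
        negatives.append((traj, fake_goal))  # true negative by construction
    return negatives
```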