70 research outputs found

    One-Line-of-Code Data Mollification Improves Optimization of Likelihood-based Generative Models

    Full text link
    Generative Models (GMs) have attracted considerable attention due to their tremendous success in various domains, such as computer vision, where they are capable of generating impressive, realistic-looking images. Likelihood-based GMs are attractive due to the possibility of generating new data with a single model evaluation. However, they typically achieve lower sample quality compared to state-of-the-art score-based diffusion models (DMs). This paper provides a significant step in the direction of addressing this limitation. The idea is to borrow one of the strengths of score-based DMs, namely the ability to perform accurate density estimation in low-density regions and to address manifold overfitting, by means of data mollification. We connect data mollification through the addition of Gaussian noise to Gaussian homotopy, a well-known technique to improve optimization. Data mollification can be implemented by adding one line of code in the optimization loop, and we demonstrate that this provides a boost in generation quality of likelihood-based GMs without computational overhead. We report results on image data sets with popular likelihood-based GMs, including variants of variational autoencoders and normalizing flows, showing large improvements in FID score.
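
    A minimal sketch of how the "one line" of data mollification described above could look in a generic likelihood-based training loop, assuming an annealed Gaussian-noise schedule in the spirit of Gaussian homotopy; the names (sigma_schedule, model.log_prob, etc.) are placeholders, not the paper's code.

        # Hypothetical sketch: data mollification by annealed Gaussian noise
        # inside a likelihood-based training loop (placeholder names).
        import torch

        def sigma_schedule(step, total_steps, sigma_max=1.0):
            # Start with heavy smoothing and decay the noise level to zero
            # over training (one possible annealing choice).
            return sigma_max * max(0.0, 1.0 - step / total_steps)

        def train(model, optimizer, loader, total_steps):
            step = 0
            while step < total_steps:
                for x, _ in loader:
                    sigma = sigma_schedule(step, total_steps)
                    x = x + sigma * torch.randn_like(x)   # the added mollification line
                    loss = -model.log_prob(x).mean()      # negative log-likelihood
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                    step += 1
                    if step >= total_steps:
                        break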

    Variations and Relaxations of Normalizing Flows

    Full text link
    Normalizing Flows (NFs) describe a class of models that express a complex target distribution as the composition of a series of bijective transformations over a simpler base distribution. By limiting the space of candidate transformations to diffeomorphisms, NFs enjoy efficient, exact sampling and density evaluation, enabling them to flexibly behave as both discriminative and generative models. Their restriction to diffeomorphisms, however, enforces that input, output and all intermediary spaces share the same dimension, limiting their ability to effectively represent target distributions with complex topologies. Additionally, in cases where the prior and target distributions are not homeomorphic, Normalizing Flows can leak mass outside of the support of the target. This survey covers a selection of recent works that combine aspects of other generative model classes, such as VAEs and score-based diffusion, and in doing so loosen the strict bijectivity constraints of NFs to achieve a balance of expressivity, training speed, sample efficiency and likelihood tractability.
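
    For reference, the exact density evaluation that the bijectivity constraint buys comes from the standard change-of-variables formula (a textbook statement, not specific to any one work listed here), where f = f_K ∘ … ∘ f_1 maps data x to the base variable z:

        \log p_X(x) = \log p_Z\!\big(f(x)\big)
            + \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k(z_{k-1})}{\partial z_{k-1}} \right|,
        \qquad z_0 = x, \quad z_k = f_k(z_{k-1}), \quad f(x) = z_K.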

    Probabilistic Auto-Encoder

    Full text link
    We introduce the Probabilistic Auto-Encoder (PAE), a generative model with a lower-dimensional latent space that is based on an Auto-Encoder which is interpreted probabilistically after training using a Normalizing Flow. The PAE combines the advantages of an Auto-Encoder, i.e. it is fast and easy to train and achieves small reconstruction error, with the desired properties of a generative model, such as high sample quality and good performance in downstream tasks. Compared to a VAE and its common variants, the PAE trains faster, reaches lower reconstruction error and achieves state-of-the-art samples without parameter fine-tuning or annealing schemes. We demonstrate that the PAE is furthermore a powerful model for the downstream tasks of outlier detection and probabilistic image reconstruction: 1) Starting from the Laplace approximation to the marginal likelihood, we identify a PAE-based outlier detection metric which achieves state-of-the-art results in Out-of-Distribution detection, outperforming other likelihood-based estimators. 2) Using posterior analysis in the PAE latent space, we perform high-dimensional data inpainting and denoising with uncertainty quantification. Comment: 11 pages, 6 figures. Code available at https://github.com/VMBoehm/PAE. Updated version with additional references and appendix.
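
    A hedged sketch of the two-stage construction the abstract describes: train a deterministic autoencoder first, then fit a normalizing flow to its latent codes and sample by decoding draws from the flow. The interfaces (encoder, decoder, flow.log_prob, flow.sample) are placeholders, not the authors' API from the linked repository.

        # Hypothetical two-stage PAE-style sketch (placeholder interfaces).
        import torch

        def train_pae(encoder, decoder, flow, loader, ae_opt, flow_opt,
                      ae_epochs, flow_epochs):
            # Stage 1: train the autoencoder for reconstruction only.
            for _ in range(ae_epochs):
                for x, _ in loader:
                    x_hat = decoder(encoder(x))
                    loss = torch.mean((x - x_hat) ** 2)
                    ae_opt.zero_grad(); loss.backward(); ae_opt.step()

            # Stage 2: freeze the autoencoder, fit a flow to the latent codes.
            for _ in range(flow_epochs):
                for x, _ in loader:
                    with torch.no_grad():
                        z = encoder(x)
                    nll = -flow.log_prob(z).mean()
                    flow_opt.zero_grad(); nll.backward(); flow_opt.step()

        def sample(decoder, flow, n):
            # Generation: draw latents under the flow, decode to data space.
            return decoder(flow.sample(n))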

    Transport, Variational Inference and Diffusions: with Applications to Annealed Flows and Schrödinger Bridges

    Full text link
    This paper explores the connections between optimal transport and variational inference, with a focus on forward and reverse time stochastic differential equations and Girsanov transformations. We present a principled and systematic framework for sampling and generative modelling centred around divergences on path space. Our work culminates in the development of a novel score-based annealed flow technique (with connections to the Jarzynski and Crooks identities from statistical physics) and a regularised iterative proportional fitting (IPF)-type objective, departing from the sequential nature of standard IPF. Through a series of generative modelling examples and a double-well-based rare event task, we showcase the potential of the proposed methods. Comment: Workshop on New Frontiers in Learning, Control, and Dynamical Systems at the International Conference on Machine Learning (ICML), Honolulu, Hawaii, USA, 202
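
    For context, the forward and reverse-time SDE pair that score-based constructions of this kind build on (a standard formulation, not this paper's specific path-space objective) reads:

        dX_t = f(X_t, t)\,dt + g(t)\,dW_t \quad \text{(forward)},
        \qquad
        dX_t = \big[f(X_t, t) - g(t)^2 \nabla_x \log p_t(X_t)\big]\,dt + g(t)\,d\bar{W}_t \quad \text{(reverse)},

    where p_t is the marginal density of the forward process and \bar{W}_t is a reverse-time Brownian motion.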

    MLM Diffusion: Generating Globally-Consistent High-Resolution Images from Discrete Latent Spaces

    Get PDF
    Context/Background: Creating deep generative models capable of generating high-resolution images is a critical challenge for modern deep learning research, with far-reaching impacts in domains such as medical imaging and computer graphics. One method that has recently achieved great success in tackling this problem is probabilistic denoising diffusion. However, whilst diffusion models can generate high-quality image content, key limitations remain in terms of high computational requirements. Aims: This thesis investigates new techniques to overcome the computational cost requirements that currently limit generative diffusion models. Specifically, it focuses on training deep learning models to model and sample from discrete latent spaces that can be used to generate high-resolution images. Method: This thesis introduces a novel type of diffusion probabilistic model prior capable of generating discrete latent representations of high-resolution images by utilising bidirectional transformers. The quality and diversity of images generated by these models are then evaluated and compared quantitatively and qualitatively to other similar models, after which other interesting properties are explored. Results: The proposed approach achieves state-of-the-art results in terms of Density (LSUN Bedroom: 1.51; LSUN Churches: 1.12; FFHQ: 1.20) and Coverage (LSUN Bedroom: 0.83; LSUN Churches: 0.73; FFHQ: 0.80), and performs competitively on FID (LSUN Bedroom: 3.64; LSUN Churches: 4.07; FFHQ: 6.11) whilst also offering significant advantages in terms of computation time. Conclusions: Through the use of powerful bidirectional transformers and discretised latent spaces, it is possible to train a discrete diffusion model to generate high-quality, high-resolution images in only a fraction of the time required by continuous diffusion probabilistic models trained on the data space. Not only are these models faster to train and sample from, they also require only a single NVIDIA 2080 Ti GPU with 11GB of RAM for successful training, and they achieve state-of-the-art results in terms of generated image quality and diversity.
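
    A hedged sketch of how sampling from a masked (absorbing-state) discrete diffusion prior over VQ-style latent tokens might proceed, with a bidirectional transformer iteratively filling in masked positions before a decoder maps the codes back to pixels; all module and argument names are placeholders and this is not the thesis code.

        # Hypothetical masked discrete-diffusion sampling over latent tokens.
        import torch

        def sample_latents(transformer, num_tokens, vocab_size, mask_id, steps):
            tokens = torch.full((1, num_tokens), mask_id, dtype=torch.long)
            for t in range(steps):
                logits = transformer(tokens)                    # bidirectional context
                probs = torch.softmax(logits, dim=-1)
                pred = torch.multinomial(probs.view(-1, vocab_size), 1).view(1, -1)
                still_masked = (tokens == mask_id)
                n_unmask = max(1, int(still_masked.sum().item() / (steps - t)))
                idx = still_masked.nonzero()[:n_unmask]         # unmask a growing fraction
                tokens[idx[:, 0], idx[:, 1]] = pred[idx[:, 0], idx[:, 1]]
            return tokens

        def generate_image(transformer, vq_decoder, **kwargs):
            latents = sample_latents(transformer, **kwargs)
            return vq_decoder(latents)                          # discrete codes -> pixels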

    Generative Models for Inverse Imaging Problems

    Get PDF

    Structured Output Learning with Conditional Generative Flows

    Full text link
    Traditional structured prediction models try to learn the conditional likelihood, i.e., p(y|x), to capture the relationship between the structured output y and the input features x. For many models, computing the likelihood is intractable. These models are therefore hard to train, requiring the use of surrogate objectives or variational inference to approximate the likelihood. In this paper, we propose conditional Glow (c-Glow), a conditional generative flow for structured output learning. C-Glow benefits from the ability of flow-based models to compute p(y|x) exactly and efficiently. Learning with c-Glow does not require a surrogate objective or performing inference during training. Once trained, we can directly and efficiently generate conditional samples. We develop a sample-based prediction method, which uses this advantage to perform efficient and effective inference. In our experiments, we test c-Glow on five different tasks. C-Glow outperforms the state-of-the-art baselines in some tasks and predicts comparable outputs in the other tasks. The results show that c-Glow is versatile and applicable to many different structured prediction problems. Comment: Accepted to AAAI 202
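
    A minimal sketch of the general idea of a conditional flow: the parameters of an invertible transformation of y are predicted from x, so log p(y|x) is exact via the change-of-variables formula. This is a single conditional affine layer for illustration; c-Glow itself conditions the coupling layers of a Glow architecture, and all names here are placeholders.

        # Hypothetical conditional affine flow: exact log p(y|x) with
        # scale/shift predicted from x (not the c-Glow architecture itself).
        import torch
        import torch.nn as nn

        class ConditionalAffineFlow(nn.Module):
            def __init__(self, x_dim, y_dim, hidden=128):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(x_dim, hidden), nn.ReLU(),
                    nn.Linear(hidden, 2 * y_dim),  # predicts log-scale and shift
                )

            def log_prob(self, y, x):
                log_s, b = self.net(x).chunk(2, dim=-1)
                z = y * torch.exp(log_s) + b                       # map y to base space
                base_logp = -0.5 * (z ** 2 + torch.log(torch.tensor(2 * torch.pi))).sum(-1)
                return base_logp + log_s.sum(-1)                   # + log|det Jacobian|

            def sample(self, x):
                log_s, b = self.net(x).chunk(2, dim=-1)
                z = torch.randn(x.shape[0], b.shape[-1])
                return (z - b) * torch.exp(-log_s)                 # inverse transform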