One-Line-of-Code Data Mollification Improves Optimization of Likelihood-based Generative Models
Generative Models (GMs) have attracted considerable attention due to their
tremendous success in various domains, such as computer vision, where they are
capable of generating impressively realistic images. Likelihood-based GMs
are attractive because they can generate new data with a single model
evaluation. However, they typically achieve lower sample quality than
state-of-the-art score-based diffusion models (DMs). This paper takes a
significant step towards addressing this limitation. The idea is to
borrow one of the strengths of score-based DMs, which is the ability to perform
accurate density estimation in low-density regions and to address manifold
overfitting by means of data mollification. We connect data mollification
through the addition of Gaussian noise to Gaussian homotopy, which is a
well-known technique to improve optimization. Data mollification can be
implemented by adding one line of code in the optimization loop, and we
demonstrate that this provides a boost in generation quality of
likelihood-based GMs, without computational overhead. We report results on
image data sets with popular likelihood-based GMs, including variants of
variational autoencoders and normalizing flows, showing large improvements in
FID score.
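The "one line of code" can be sketched as follows. This is a minimal NumPy sketch, not the paper's exact recipe: the function name, the linear annealing schedule, and the `sigma_max` default are illustrative assumptions.

```python
import numpy as np

def mollify(batch, step, total_steps, sigma_max=1.0, rng=None):
    """Data mollification via Gaussian homotopy: add Gaussian noise whose
    standard deviation is annealed linearly from sigma_max down to zero."""
    rng = rng or np.random.default_rng(0)
    sigma = sigma_max * max(0.0, 1.0 - step / total_steps)
    return batch + sigma * rng.standard_normal(batch.shape)

# Inside a standard training loop, the "one line" amounts to:
#   batch = mollify(batch, step, total_steps)
# so early training sees heavily smoothed data and late training sees
# the clean distribution.
```

Because the noise scale reaches zero by the end of training, the final model is still fit to the original data distribution.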
Variations and Relaxations of Normalizing Flows
Normalizing Flows (NFs) describe a class of models that express a complex
target distribution as the composition of a series of bijective transformations
over a simpler base distribution. By limiting the space of candidate
transformations to diffeomorphisms, NFs enjoy efficient, exact sampling and
density evaluation, enabling NFs to flexibly behave as both discriminative and
generative models. Their restriction to diffeomorphisms, however, enforces that
input, output and all intermediary spaces share the same dimension, limiting
their ability to effectively represent target distributions with complex
topologies. Additionally, in cases where the prior and target distributions are
not homeomorphic, Normalizing Flows can leak mass outside of the support of the
target. This survey covers a selection of recent works that combine aspects of
other generative model classes, such as VAEs and score-based diffusion, and in
doing so loosen the strict bijectivity constraints of NFs to achieve a balance
of expressivity, training speed, sample efficiency and likelihood tractability.
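The exact density evaluation that bijectivity buys can be illustrated with a single affine bijection and the change-of-variables formula. This is a minimal NumPy sketch with a standard-normal base distribution; real NFs compose many such layers and learn their parameters.

```python
import numpy as np

def affine_forward(x, log_scale, shift):
    # one bijective layer: y = exp(log_scale) * x + shift
    return np.exp(log_scale) * x + shift

def affine_log_density(y, log_scale, shift):
    """Exact log-density via the change-of-variables formula:
    log p(y) = log p_base(x) - sum(log_scale),
    where x = (y - shift) * exp(-log_scale) inverts the bijection
    and the base density is a standard normal."""
    x = (y - shift) * np.exp(-log_scale)
    log_base = -0.5 * np.sum(x**2 + np.log(2.0 * np.pi))
    return log_base - np.sum(log_scale)
```

Note that both directions of the map, and hence both sampling and density evaluation, are exact and cheap, which is precisely what the diffeomorphism constraint purchases.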
Probabilistic Auto-Encoder
We introduce the Probabilistic Auto-Encoder (PAE), a generative model with a
lower dimensional latent space that is based on an Auto-Encoder which is
interpreted probabilistically after training using a Normalizing Flow. The PAE
combines the advantages of an Auto-Encoder, i.e. it is fast and easy to train
and achieves small reconstruction error, with the desired properties of a
generative model, such as high sample quality and good performance in
downstream tasks. Compared to a VAE and its common variants, the PAE trains
faster, reaches lower reconstruction error and achieves state-of-the-art
samples without parameter fine-tuning or annealing schemes. We demonstrate that
the PAE is further a powerful model for performing the downstream tasks of
outlier detection and probabilistic image reconstruction: 1) Starting from the
Laplace approximation to the marginal likelihood, we identify a PAE-based
outlier detection metric which achieves state-of-the-art results in
out-of-distribution detection, outperforming other likelihood-based estimators.
2) Using posterior analysis in the PAE latent space we perform high-dimensional
data inpainting and denoising with uncertainty quantification.

Comment: 11 pages, 6 figures. Code available at
https://github.com/VMBoehm/PAE. Updated version with additional references
and appendix.
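The two-stage construction described above can be sketched as follows. All names here are illustrative assumptions, not the paper's API: a deterministic auto-encoder is trained first on reconstruction loss alone, a normalizing flow is then fit to the encoded latents, and sampling pushes base noise through the flow before decoding.

```python
import numpy as np

def pae_sample(decoder, flow_inverse, latent_dim, n, rng=None):
    """Sample from a (hypothetical) trained PAE: draw base noise, map it
    through the flow onto the learned latent density, then decode."""
    rng = rng or np.random.default_rng(0)
    u = rng.standard_normal((n, latent_dim))  # base samples u ~ N(0, I)
    z = flow_inverse(u)                       # flow maps noise to latent density
    return decoder(z)                         # decode latents to data space
```

The appeal of the design is that the auto-encoder and the flow are trained separately, so neither stage needs the annealing or KL-balancing tricks that VAE training often requires.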
Transport, Variational Inference and Diffusions: with Applications to Annealed Flows and Schrödinger Bridges
This paper explores the connections between optimal transport and variational
inference, with a focus on forward and reverse time stochastic differential
equations and Girsanov transformations. We present a principled and systematic
framework for sampling and generative modelling centred around divergences on
path space. Our work culminates in the development of a novel score-based
annealed flow technique (with connections to Jarzynski and Crooks identities
from statistical physics) and a regularised iterative proportional fitting
(IPF)-type objective, departing from the sequential nature of standard IPF.
Through a series of generative modelling examples and a double-well-based rare
event task, we showcase the potential of the proposed methods.

Comment: Workshop on New Frontiers in Learning, Control, and Dynamical Systems
at the International Conference on Machine Learning (ICML), Honolulu, Hawaii,
USA, 202
MLM Diffusion: Generating Globally-Consistent High-Resolution Images from Discrete Latent Spaces
Context/Background: Creating deep generative models capable of generating high-resolution images
is a critical challenge for modern deep learning research, with far-reaching impacts in domains such
as medical imaging and computer graphics. One method that has recently achieved great success in
tackling this problem is probabilistic denoising diffusion. However, whilst diffusion models can generate
high-quality image content, key limitations remain in terms of their high computational requirements.
Aims: This thesis investigates new techniques to overcome the computational cost requirements that
currently limit generative diffusion models. Specifically, this thesis focuses on training deep learning
models to model and sample from discrete latent spaces that can be used to generate high-resolution
images.
Method: This thesis introduces a novel type of diffusion probabilistic model prior capable of generating discrete latent representations of high-resolution images by utilising bidirectional transformers. The
quality and diversity of images generated by these models are then evaluated and compared quantitatively and qualitatively to other similar models, before other interesting properties are also explored.
Results: The proposed approach achieves state-of-the-art results in terms of Density (LSUN Bedroom:
1.51; LSUN Churches: 1.12; FFHQ: 1.20) and Coverage (LSUN Bedroom: 0.83; LSUN Churches: 0.73;
FFHQ: 0.80), and performs competitively on FID (LSUN Bedroom: 3.64; LSUN Churches: 4.07; FFHQ:
6.11) whilst also offering significant advantages in terms of computation time.
Conclusions: Through the use of powerful bidirectional transformers and discretised latent spaces, it
is possible to train a discrete diffusion model to generate high-quality, high-resolution images in only
a fraction of the time required by continuous diffusion probabilistic models trained on the data space.
Not only are these models faster to train and sample from, they also require only a single NVIDIA
2080ti GPU with 11GB of RAM for successful training, and they achieve state-of-the-art results in terms of
generated image quality and diversity.
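The fast sampling described in the conclusions can be sketched in the style of absorbing-state discrete diffusion with a bidirectional predictor. Everything here is an illustrative assumption rather than the thesis's exact algorithm: the `MASK` id, the even per-step unmasking schedule, and the greedy argmax commitment.

```python
import numpy as np

MASK = -1  # illustrative id for the absorbing "mask" state

def absorbing_diffusion_sample(predict_logits, seq_len, steps, rng=None):
    """Start from a fully masked sequence; at each step a bidirectional model
    predicts every position in parallel and a fraction of the still-masked
    positions is committed, so sampling takes `steps` passes over the sequence
    rather than one pass per token."""
    rng = rng or np.random.default_rng(0)
    tokens = np.full(seq_len, MASK)
    for t in range(steps):
        masked = np.flatnonzero(tokens == MASK)
        if masked.size == 0:
            break
        logits = predict_logits(tokens)              # (seq_len, vocab)
        preds = logits.argmax(axis=-1)               # greedy commitment (illustrative)
        k = int(np.ceil(masked.size / (steps - t)))  # unmask an even share per step
        chosen = rng.choice(masked, size=min(k, masked.size), replace=False)
        tokens[chosen] = preds[chosen]
    return tokens
```

Because the number of model evaluations is fixed by `steps` rather than by sequence length, this style of sampler is much cheaper than autoregressive decoding over long discrete latent sequences.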
Mixed Selectivity via Unsupervised Learning in Neural Networks
Mixed selectivity characterises neurons that simultaneously respond to different input stimuli. Neurons with mixed selectivity have been observed in multiple brain regions, and are hypothesised to play important roles in neural computation. Recently, both experimental and theoretical work demonstrated the importance of mixed selectivity in context-dependent decision tasks. This thesis extends existing theoretical work on mixed selectivity, arguing for a general and statistical role of mixed selectivity in learning complex dependencies of input stimuli. This role can be motivated from unsupervised learning of generative models, and is exhibited in increased mutual information between the stimuli and their neural representation. This argument is supported empirically using simulations of a sequence disambiguation task that incorporated key aspects of related behavioural experiments. Mixed selectivity neurons that resembled hippocampal place cells emerged from models optimised only for behaviour. To understand these results, as well as to generalise the findings to a wider range of computations, I provide a formal connection between learning robust models and mixed selectivity.
Structured Output Learning with Conditional Generative Flows
Traditional structured prediction models try to learn the conditional
likelihood, i.e., p(y|x), to capture the relationship between the structured
output y and the input features x. For many models, computing the likelihood is
intractable. These models are therefore hard to train, requiring the use of
surrogate objectives or variational inference to approximate likelihood. In
this paper, we propose conditional Glow (c-Glow), a conditional generative flow
for structured output learning. C-Glow benefits from the ability of flow-based
models to compute p(y|x) exactly and efficiently. Learning with c-Glow does not
require a surrogate objective or performing inference during training. Once
trained, we can directly and efficiently generate conditional samples. We
develop a sample-based prediction method, which can use this advantage to do
efficient and effective inference. In our experiments, we test c-Glow on five
different tasks. C-Glow outperforms the state-of-the-art baselines in some
tasks and predicts comparable outputs in the other tasks. The results show that
c-Glow is versatile and is applicable to many different structured prediction
problems.Comment: Accepted to AAAI 202
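The exact conditional likelihood computation that flow-based models enable can be illustrated with a single conditional affine layer. This is a minimal NumPy sketch; the function names are illustrative, and the actual model stacks many coupling layers whose scale and shift parameters are produced by conditioning networks on x.

```python
import numpy as np

def conditional_log_likelihood(y, x, scale_net, shift_net):
    """Exact log p(y|x) for one conditional affine layer:
    u = (y - t(x)) * exp(-s(x)),
    log p(y|x) = log N(u; 0, I) - sum(s(x)),
    where s and t are (hypothetical) networks applied to the input x."""
    s, t = scale_net(x), shift_net(x)
    u = (y - t) * np.exp(-s)
    return -0.5 * np.sum(u**2 + np.log(2.0 * np.pi)) - np.sum(s)
```

Since this quantity is exact, it can be maximised directly by gradient ascent, which is why no surrogate objective or variational inference is needed during training.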
- …