MLM Diffusion: Generating Globally-Consistent High-Resolution Images from Discrete Latent Spaces
Context/Background: Creating deep generative models capable of generating high-resolution images
is a critical challenge for modern deep learning research, with far-reaching impacts in domains such
as medical imaging and computer graphics. One method that has recently achieved great success in
tackling this problem is probabilistic denoising diffusion. However, whilst diffusion models can generate
high-quality image content, their high computational requirements remain a key limitation.
Aims: This thesis investigates new techniques to overcome the computational costs that currently
limit generative diffusion models. Specifically, it focuses on training deep learning models to model
and sample from discrete latent spaces, from which high-resolution images can then be generated.
Method: This thesis introduces a novel type of diffusion probabilistic model prior capable of generating discrete latent representations of high-resolution images by utilising bidirectional transformers. The
quality and diversity of images generated by these models are then evaluated and compared, quantitatively and qualitatively, against comparable models, before further properties of the approach are explored.
Results: The proposed approach achieves state-of-the-art results in terms of Density (LSUN Bedroom:
1.51; LSUN Churches: 1.12; FFHQ: 1.20) and Coverage (LSUN Bedroom: 0.83; LSUN Churches: 0.73;
FFHQ: 0.80), and performs competitively on FID (LSUN Bedroom: 3.64; LSUN Churches: 4.07; FFHQ:
6.11) whilst also offering significant advantages in terms of computation time.
Conclusions: Through the use of powerful bidirectional transformers and discretised latent spaces, it
is possible to train a discrete diffusion model to generate high-quality, high-resolution images in only
a fraction of the time required by continuous diffusion probabilistic models trained on the data space.
Not only are these models faster to train and sample from, but they also require only a single NVIDIA
2080 Ti GPU with 11 GB of RAM for successful training, while achieving state-of-the-art results in terms of
generated image quality and diversity.
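The sampling procedure implied above, in which a bidirectional transformer iteratively predicts discrete latent tokens starting from a fully masked sequence, can be sketched roughly as follows. This is a minimal illustration, not the thesis's method: the `MASK` id, the confidence-based reveal schedule, and the toy `denoiser` (a stand-in for a trained bidirectional transformer) are all assumptions, and a real model would decode the resulting tokens to an image with a learned decoder.

```python
import numpy as np

MASK = -1  # id of the absorbing "masked" token (hypothetical)

def denoiser(tokens, vocab_size=16, rng=None):
    # Stand-in for a bidirectional transformer: returns per-position
    # probabilities over the latent codebook (random here, for illustration).
    rng = rng or np.random.default_rng(0)
    probs = rng.random((tokens.size, vocab_size))
    return probs / probs.sum(axis=1, keepdims=True)

def sample_discrete_diffusion(n_tokens=64, steps=8, seed=0):
    """Iteratively unmask a fully-masked sequence of discrete latent tokens."""
    rng = np.random.default_rng(seed)
    tokens = np.full(n_tokens, MASK)  # start fully masked
    for t in range(steps):
        probs = denoiser(tokens, rng=rng)
        confidence = probs.max(axis=1)
        picks = probs.argmax(axis=1)
        # Reveal enough positions to reach a growing cumulative fraction,
        # choosing the most confident still-masked positions first.
        target = n_tokens * (t + 1) // steps
        n_reveal = target - int((tokens != MASK).sum())
        order = np.argsort(-np.where(tokens == MASK, confidence, -np.inf))
        tokens[order[:n_reveal]] = picks[order[:n_reveal]]
    return tokens  # would be decoded to an image by a learned decoder

latents = sample_discrete_diffusion()
assert (latents != MASK).all()  # every position decoded after the final step
```

Because many tokens are predicted in parallel at each step, the number of network evaluations is a small constant rather than one per token, which is one plausible source of the speed advantage over continuous, data-space diffusion.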
Dysarthric speech synthesis via non-parallel voice conversion
In this thesis we propose and evaluate a voice conversion (VC) method to synthesise dysarthric speech. This is achieved by a novel method for dysarthric speech synthesis using VC in a non-parallel manner, thus allowing VC in incomplete and difficult data collection situations. We focus on two applications. First, we aim to improve automatic speech recognition (ASR) of people with dysarthria by using synthesised dysarthric speech as a means of data augmentation: unimpaired speech is converted to dysarthric speech and used as training data for an ASR system. Results on unseen dysarthric words show that recognition of severely dysarthric speakers can be improved, yet for mild speakers an ASR trained on unimpaired speech performs better. Secondly, we aim to synthesise pathological speech to help inform patients of how their speech may sound before committing to oral cancer surgery. Knowing the sound of the voice post-surgery could reduce patients' stress and help clinicians make informed decisions about the surgery. A novel approach to pathological speech synthesis is proposed: we customise an existing dysarthric (already pathological) speech sample to a new speaker's voice characteristics and perform a subjective analysis of the generated samples. The results show that pathological speech seems to negatively affect the perceived naturalness of the speech. Conversion of speaker characteristics among low- and high-intelligibility speakers is successful, but for mid-intelligibility speakers the results are inconclusive. Whether the differences in results across intelligibility levels are due to the intelligibility levels themselves or to the individual speakers needs further investigation.
Modality-Agnostic Variational Compression of Implicit Neural Representations
We introduce a modality-agnostic neural data compression algorithm based on a
functional view of data and parameterised as an Implicit Neural Representation
(INR). Bridging the gap between latent coding and sparsity, we obtain compact
latent representations which are non-linearly mapped to a soft gating mechanism
capable of specialising a shared INR base network to each data item through
subnetwork selection. After obtaining a dataset of such compact latent
representations, we directly optimise the rate/distortion trade-off in this
modality-agnostic space using non-linear transform coding. We term this method
Variational Compression of Implicit Neural Representations (VC-INR) and show
improved performance at the same pre-quantisation representational capacity,
while also outperforming previous quantisation schemes used for
other INR-based techniques. Our experiments demonstrate strong results over a
large set of diverse data modalities using the same algorithm without any
modality-specific inductive biases. We show results on images, climate data, 3D
shapes and scenes as well as audio and video, introducing VC-INR as the first
INR-based method to outperform codecs as well-known and diverse as JPEG 2000,
MP3 and AVC/HEVC on their respective modalities.
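The soft gating mechanism described above can be illustrated with a minimal sketch. This is not the paper's architecture: the layer sizes, the sigmoid gate mapping `G`, and the tanh activation are all assumptions chosen for illustration. The idea shown is that a compact per-item latent is non-linearly mapped to gates that softly select a subnetwork of a shared INR base.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 32
W1 = rng.standard_normal((2, HIDDEN)) * 0.5  # shared INR base: coords -> hidden
W2 = rng.standard_normal((HIDDEN, 1)) * 0.5  # shared INR base: hidden -> value
G = rng.standard_normal((8, HIDDEN)) * 0.5   # latent -> gate logits (hypothetical mapping)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inr_forward(coords, latent):
    """Evaluate the shared INR at `coords`, specialised by a per-item latent."""
    gates = sigmoid(latent @ G)       # soft gates in (0, 1), one per hidden unit
    h = np.tanh(coords @ W1) * gates  # gating softly selects a subnetwork
    return h @ W2

# Query the INR on a small 4x4 grid of 2D coordinates (e.g. image pixels).
coords = np.stack(
    np.meshgrid(np.linspace(-1, 1, 4), np.linspace(-1, 1, 4)), -1
).reshape(-1, 2)
latent = rng.standard_normal(8)  # compact latent code for one data item
out = inr_forward(coords, latent)
assert out.shape == (16, 1)
```

Under this framing, only the small latent needs to be stored per data item; the base network and gate mapping are shared across the dataset, which is what makes the latent space a natural target for rate/distortion optimisation.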
Topic Modelling Meets Deep Neural Networks: A Survey
Topic modelling has been a successful technique for text analysis for almost
twenty years. When topic modelling met deep neural networks, there emerged a
new and increasingly popular research area, neural topic models, with over a
hundred models developed and a wide range of applications in neural language
understanding such as text generation, summarisation and language models. There
is a need to summarise research developments and discuss open problems and
future directions. In this paper, we provide a focused yet comprehensive
overview of neural topic models for interested researchers in the AI community,
to help them navigate and innovate in this fast-growing research
area. To the best of our knowledge, ours is the first review focusing on this
specific topic.