
    MLM Diffusion: Generating Globally-Consistent High-Resolution Images from Discrete Latent Spaces

    Context/Background: Creating deep generative models capable of generating high-resolution images is a critical challenge for modern deep learning research, with far-reaching impacts in domains such as medical imaging and computer graphics. One method that has recently achieved great success in tackling this problem is probabilistic denoising diffusion. However, whilst diffusion models can generate high-quality image content, a key limitation remains: their high computational requirements. Aims: This thesis investigates new techniques to overcome the computational cost that currently limits generative diffusion models. Specifically, it focuses on training deep learning models to model and sample from discrete latent spaces that can be used to generate high-resolution images. Method: This thesis introduces a novel type of diffusion probabilistic model prior capable of generating discrete latent representations of high-resolution images by utilising bidirectional transformers. The quality and diversity of images generated by these models are evaluated and compared, quantitatively and qualitatively, to other similar models, before other interesting properties are explored. Results: The proposed approach achieves state-of-the-art results in terms of Density (LSUN Bedroom: 1.51; LSUN Churches: 1.12; FFHQ: 1.20) and Coverage (LSUN Bedroom: 0.83; LSUN Churches: 0.73; FFHQ: 0.80), and performs competitively on FID (LSUN Bedroom: 3.64; LSUN Churches: 4.07; FFHQ: 6.11), whilst also offering significant advantages in computation time. Conclusions: Through the use of powerful bidirectional transformers and discretised latent spaces, it is possible to train a discrete diffusion model to generate high-quality, high-resolution images in only a fraction of the time required by continuous diffusion probabilistic models trained on the data space. Not only are these models faster to train and sample from, they also require only a single NVIDIA 2080ti GPU with 11 GB of RAM for successful training, and achieve state-of-the-art results in terms of generated image quality and diversity.
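    The sampling scheme described above (a bidirectional transformer iteratively denoising a discrete latent sequence) can be sketched as absorbing-state unmasking, in the spirit of masked-language-model diffusion. This is a minimal numpy illustration, not the thesis's actual model: the `predict_logits` stub stands in for the learned transformer prior, and all sizes and the confidence-based unmasking schedule are illustrative assumptions.

    ```python
    import numpy as np

    def sample_absorbing_diffusion(seq_len=16, vocab_size=8, steps=4, rng=None):
        """Iteratively unmask a fully-masked discrete latent sequence."""
        rng = np.random.default_rng(0) if rng is None else rng
        MASK = vocab_size  # reserve index `vocab_size` as the [MASK] token
        tokens = np.full(seq_len, MASK)

        def predict_logits(tokens):
            # Stand-in for the bidirectional transformer denoiser:
            # in the real model these logits depend on `tokens`.
            return rng.normal(size=(seq_len, vocab_size))

        for step in range(steps):
            logits = predict_logits(tokens)
            probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
            conf = probs.max(-1)          # per-position confidence
            pred = probs.argmax(-1)       # per-position best token
            masked = tokens == MASK
            # Unmask an equal share of the remaining masked positions,
            # most confident first (a common schedule choice).
            n_keep = int(np.ceil(masked.sum() / (steps - step)))
            chosen = np.argsort(-conf * masked)[:n_keep]
            tokens[chosen] = pred[chosen]
        return tokens
    ```

    After `steps` rounds every position holds a real token; in the thesis these tokens would index a learned codebook and be decoded into a high-resolution image.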

    Dysarthric speech synthesis via non-parallel voice conversion

    In this thesis we propose and evaluate a voice conversion (VC) method to synthesise dysarthric speech. This is achieved by a novel method for dysarthric speech synthesis using VC in a non-parallel manner, thus allowing VC in incomplete and difficult data-collection situations. We focus on two applications. First, we aim to improve automatic speech recognition (ASR) of people with dysarthria by using synthesised dysarthric speech as a means of data augmentation: unimpaired speech is converted to dysarthric speech and used as training data for an ASR system. The results, tested on unseen dysarthric words, show that recognition of severely dysarthric speakers can be improved, yet for mild speakers an ASR trained on unimpaired speech performs better. Secondly, we want to synthesise pathological speech to help inform patients of their likely post-surgery speech before committing to an oral cancer surgery. Knowing the sound of the voice post-surgery could reduce patients' stress and help clinicians make informed decisions about the surgery. A novel approach to pathological speech synthesis is proposed: we customise an existing dysarthric (already pathological) speech sample to a new speaker's voice characteristics and perform a subjective analysis of the generated samples. The results show that pathological speech seems to negatively affect the perceived naturalness of the speech. Conversion of speaker characteristics between low- and high-intelligibility speakers is successful, but for mid-intelligibility speakers the results are inconclusive. Whether the differences in results across intelligibility levels are due to the intelligibility levels themselves or to the individual speakers needs to be further investigated.
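    The augmentation pipeline described above (convert unimpaired speech toward an impaired speaker, then pool the converted data with the scarce real data for ASR training) can be illustrated with a deliberately simple stand-in for the conversion step: global mean/variance matching of acoustic features, a classic baseline that is *not* the thesis's learned non-parallel VC model. All arrays and dimensions below are synthetic placeholders.

    ```python
    import numpy as np

    def convert_speaker_stats(src_feats, tgt_feats):
        """Map source frames toward the target speaker's global feature
        statistics (a toy VC baseline standing in for the real model)."""
        src_mu, src_sd = src_feats.mean(0), src_feats.std(0) + 1e-8
        tgt_mu, tgt_sd = tgt_feats.mean(0), tgt_feats.std(0) + 1e-8
        return (src_feats - src_mu) / src_sd * tgt_sd + tgt_mu

    rng = np.random.default_rng(0)
    unimpaired = rng.normal(0.0, 1.0, size=(200, 13))  # stand-in MFCC frames
    dysarthric = rng.normal(2.0, 3.0, size=(50, 13))   # scarce impaired data
    augmented = convert_speaker_stats(unimpaired, dysarthric)
    # Pool real and converted data to enlarge the ASR training set.
    train_set = np.vstack([dysarthric, augmented])
    ```

    The point of the sketch is the data flow, not the conversion quality: the abundant unimpaired corpus is reshaped to resemble the impaired speaker and then combined with the real impaired recordings.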

    Modality-Agnostic Variational Compression of Implicit Neural Representations

    We introduce a modality-agnostic neural data compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR). Bridging the gap between latent coding and sparsity, we obtain compact latent representations which are non-linearly mapped to a soft gating mechanism capable of specialising a shared INR base network to each data item through subnetwork selection. After obtaining a dataset of such compact latent representations, we directly optimise the rate/distortion trade-off in this modality-agnostic space using non-linear transform coding. We term this method Variational Compression of Implicit Neural Representations (VC-INR) and show improved performance at the same pre-quantisation representational capacity, as well as outperforming previous quantisation schemes used by other INR-based techniques. Our experiments demonstrate strong results over a large set of diverse data modalities using the same algorithm, without any modality-specific inductive biases. We show results on images, climate data, 3D shapes and scenes, as well as audio and video, introducing VC-INR as the first INR-based method to outperform codecs as well-known and diverse as JPEG 2000, MP3 and AVC/HEVC on their respective modalities.
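    The soft-gating mechanism described above (a per-item latent specialising a shared INR base network) can be sketched as a sigmoid gate applied to the hidden units of a coordinate network. This is a minimal numpy illustration under assumed layer sizes and names; the paper's actual architecture, gating function and training procedure are not specified by the abstract.

    ```python
    import numpy as np

    def gated_inr_forward(coords, latent, params):
        """Shared INR base network whose hidden units are softly gated
        by a compact per-item latent (illustrative subnetwork selection)."""
        W1, b1, Wg, bg, W2, b2 = params
        h = np.tanh(coords @ W1 + b1)                  # shared base features
        gate = 1.0 / (1.0 + np.exp(-(latent @ Wg + bg)))  # soft gate in (0, 1)
        h = h * gate                                   # specialise per item
        return h @ W2 + b2

    rng = np.random.default_rng(0)
    hidden, latent_dim = 32, 8
    params = (rng.normal(size=(2, hidden)), np.zeros(hidden),
              rng.normal(size=(latent_dim, hidden)), np.zeros(hidden),
              rng.normal(size=(hidden, 3)), np.zeros(3))
    coords = rng.uniform(-1, 1, size=(64, 2))  # e.g. pixel coordinates
    latent = rng.normal(size=latent_dim)       # compact per-item code
    out = gated_inr_forward(coords, latent, params)  # (64, 3), e.g. RGB
    ```

    Only the small `latent` differs between data items; the base parameters are shared, which is what makes the latent space a compact, modality-agnostic target for transform coding.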

    Topic Modelling Meets Deep Neural Networks: A Survey

    Topic modelling has been a successful technique for text analysis for almost twenty years. When topic modelling met deep neural networks, a new and increasingly popular research area emerged: neural topic models, with over a hundred models developed and a wide range of applications in natural language understanding such as text generation, summarisation and language models. There is a need to summarise research developments and discuss open problems and future directions. In this paper, we provide a focused yet comprehensive overview of neural topic models for interested researchers in the AI community, helping them navigate and innovate in this fast-growing research area. To the best of our knowledge, ours is the first review focusing on this specific topic.
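    A common design that such surveys cover is the VAE-style neural topic model, whose decoder mixes softmaxed document-topic proportions with softmaxed topic-word distributions to reconstruct each document's word distribution. The sketch below shows only that decoder step, with random logits standing in for encoder outputs and learned parameters; all shapes are illustrative assumptions.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def ntm_decode(theta_logits, beta_logits):
        """Decoder of a typical neural topic model."""
        theta = softmax(theta_logits)  # (docs, topics): topic proportions
        beta = softmax(beta_logits)    # (topics, vocab): topic-word dists
        return theta @ beta            # (docs, vocab): word probabilities

    rng = np.random.default_rng(0)
    recon = ntm_decode(rng.normal(size=(4, 5)),    # 4 docs, 5 topics
                       rng.normal(size=(5, 100)))  # 100-word vocabulary
    ```

    Because both factors are row-stochastic, each reconstructed row is itself a valid probability distribution over the vocabulary, which is what the model's reconstruction loss is computed against.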