250 research outputs found

Time-series Generation by Contrastive Imitation

Authors: Bica Ioana, Jarrett Daniel, van der Schaar Mihaela
Publication date: 02/11/2023

Consider learning a generative model for time-series data. The sequential setting poses a unique challenge: Not only should the generator capture the conditional dynamics of (stepwise) transitions, but its open-loop rollouts should also preserve the joint distribution of (multi-step) trajectories. On one hand, autoregressive models trained by MLE allow learning and computing explicit transition distributions, but suffer from compounding error during rollouts. On the other hand, adversarial models based on GAN training alleviate such exposure bias, but transitions are implicit and hard to assess. In this work, we study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy, where the reinforcement signal is provided by a global (but stepwise-decomposable) energy model trained by contrastive estimation. At training, the two components are learned cooperatively, avoiding the instabilities typical of adversarial objectives. At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality. By expressly training a policy to imitate sequential behavior of time-series features in a dataset, this approach embodies "generation by imitation". Theoretically, we illustrate the correctness of this formulation and the consistency of the algorithm. Empirically, we evaluate its ability to generate predictively useful samples from real-world datasets, verifying that it performs at the standard of existing benchmarks

arXiv.org e-Print Archive

SparseGAN: Sparse Generative Adversarial Network for Text Generation

Authors: Yuan Liping, Zeng Jiehang, Zheng Xiaoqing
Publication date: 22/03/2021

It is still a challenging task to learn a neural text generation model under the framework of generative adversarial networks (GANs) since the entire training process is not differentiable. The existing training strategies either suffer from unreliable gradient estimations or imprecise sentence representations. Inspired by the principle of sparse coding, we propose a SparseGAN that generates semantic-interpretable, but sparse sentence representations as inputs to the discriminator. The key idea is that we treat an embedding matrix as an over-complete dictionary, and use a linear combination of very few selected word embeddings to approximate the output feature representation of the generator at each time step. With such semantic-rich representations, we not only reduce unnecessary noises for efficient adversarial training, but also make the entire training process fully differentiable. Experiments on multiple text generation datasets yield performance improvements, especially in sequence-level metrics, such as BLEU

arXiv.org e-Print Archive

Deep Generative Models on 3D Representations: A Survey

Authors: Liao Yiyi, Peng Sida, Shen Yujun, Shi Zifan, Xu Yinghao
Publication date: 27/10/2022

Generative models, an important family of statistical models, aim to learn the observed data distribution by generating new instances. With the rise of neural networks, deep generative models, such as variational autoencoders (VAEs) and generative adversarial networks (GANs), have made tremendous progress in 2D image synthesis. Recently, researchers have shifted their attention from 2D space to 3D space, since 3D data better aligns with our physical world and hence has great potential in practice. However, unlike a 2D image, which by nature has an efficient representation (i.e., a pixel grid), representing 3D data poses far more challenges. Concretely, an ideal 3D representation should be expressive enough to model shapes and appearances in detail, and efficient enough to model high-resolution data quickly and with low memory cost. However, existing 3D representations, such as point clouds, meshes, and recent neural fields, usually fail to meet these requirements simultaneously. In this survey, we thoroughly review the development of 3D generation, including 3D shape generation and 3D-aware image synthesis, from the perspectives of both algorithms and, more importantly, representations. We hope that our discussion helps the community track the evolution of this field and sparks innovative ideas to advance this challenging task.

arXiv.org e-Print Archive

Prédiction et génération de données structurées à l'aide de réseaux de neurones et de décisions discrètes

Author: Dutil Francis
Publication date: 01/08/2018

Deep learning, a subdiscipline of machine learning, is used in many domains, including natural language processing. However, several problems in the field remain open, notably the prediction of long sequences and the generation of natural language. In this thesis, we present two models that address these problems. In chapter 1, we add a planning mechanism to sequence-to-sequence models. The mechanism establishes ahead of time the alignment between the input and output sequences. We show that this improves the alignment, helps the model converge faster, and requires fewer parameters. We also show performance gains in neural machine translation, question generation, and the algorithmic task of finding Eulerian circuits in graphs. In chapter 2, we tackle language generation using generative adversarial networks, a non-trivial problem given the discrete nature of the output space. The model is trained using only an adversarial loss and without any gradient estimation. We show results on language modeling, context-free grammar generation, and conditional sentence generation.

Dépôt Institutionnel Numérique

Generation of realistic human behaviour

Author: Vougioukas Konstantinos
Publication venue: Computing, Imperial College London
Publication date: 01/08/2022

As the use of computers and robots in our everyday lives increases so does the need for better interaction with these devices. Human-computer interaction relies on the ability to understand and generate human behavioural signals such as speech, facial expressions and motion. This thesis deals with the synthesis and evaluation of such signals, focusing not only on their intelligibility but also on their realism. Since these signals are often correlated, it is common for methods to drive the generation of one signal using another. The thesis begins by tackling the problem of speech-driven facial animation and proposing models capable of producing realistic animations from a single image and an audio clip. The goal of these models is to produce a video of a target person, whose lips move in accordance with the driving audio. Particular focus is also placed on a) generating spontaneous expression such as blinks, b) achieving audio-visual synchrony and c) transferring or producing natural head motion. The second problem addressed in this thesis is that of video-driven speech reconstruction, which aims at converting a silent video into waveforms containing speech. The method proposed for solving this problem is capable of generating intelligible and accurate speech for both seen and unseen speakers. The spoken content is correctly captured thanks to a perceptual loss, which uses features from pre-trained speech-driven animation models. The ability of the video-to-speech model to run in real-time allows its use in hearing assistive devices and telecommunications. The final work proposed in this thesis is a generic domain translation system, that can be used for any translation problem including those mapping across different modalities. 
The framework is made up of two networks performing translations in opposite directions and can be successfully applied to solve diverse sets of translation problems, including speech-driven animation and video-driven speech reconstruction.

Spiral - Imperial College Digital Repository

Contributions to generative models and their applications

Author: Che Tong
Publication date: 01/10/2022

Generative models are a large class of machine learning models for unsupervised learning. They have various applications in machine learning and artificial intelligence. In this thesis, we discuss many aspects of generative models and their applications to other machine learning problems. In particular, we discuss several important topics in generative models, including how to stabilize discrete GAN training with importance sampling, how to do better sampling from GANs using a connection with energy-based models, how to better train auto-regressive models with the help of an energy-based model formulation, as well as two applications of generative models to other machine learning problems, one about residual networks, the other about safety verification.

Dépôt Institutionnel Numérique

Reimagining Speech: A Scoping Review of Deep Learning-Powered Voice Conversion

Authors: Bargum Anders R., Erkut Cumhur, Serafin Stefania
Publication date: 14/11/2023

Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios is getting increasingly popular. Although many of the works in the field of voice conversion share a common global pipeline, there is a considerable diversity in the underlying structures, methods, and neural sub-blocks used across research efforts. Thus, obtaining a comprehensive understanding of the reasons behind the choice of the different methods in the voice conversion pipeline can be challenging, and the actual hurdles in the proposed solutions are often unclear. To shed light on these aspects, this paper presents a scoping review that explores the use of deep learning in speech analysis, synthesis, and disentangled speech representation learning within modern voice conversion systems. We screened 621 publications from more than 38 different venues between the years 2017 and 2023, followed by an in-depth review of a final database consisting of 123 eligible studies. Based on the review, we summarise the most frequently used approaches to voice conversion based on deep learning and highlight common pitfalls within the community. Lastly, we condense the knowledge gathered, identify main challenges and provide recommendations for future research directions

arXiv.org e-Print Archive

Generating tabular datasets under differential privacy

Author: Truda Gianluca
Publication date: 28/08/2023

Machine Learning (ML) is accelerating progress across fields and industries, but relies on accessible and high-quality training data. Some of the most important datasets are found in biomedical and financial domains in the form of spreadsheets and relational databases. But this tabular data is often sensitive in nature. Synthetic data generation offers the potential to unlock sensitive data, but generative models tend to memorise and regurgitate training data, which undermines the privacy goal. To remedy this, researchers have incorporated the mathematical framework of Differential Privacy (DP) into the training process of deep neural networks. But this creates a trade-off between the quality and privacy of the resulting data. Generative Adversarial Networks (GANs) are the dominant paradigm for synthesising tabular data under DP, but suffer from unstable adversarial training and mode collapse, which are exacerbated by the privacy constraints and challenging tabular data modality. This work optimises the quality-privacy trade-off of generative models, producing higher quality tabular datasets with the same privacy guarantees. We implement novel end-to-end models that leverage attention mechanisms to learn reversible tabular representations. We also introduce TableDiffusion, the first differentially-private diffusion model for tabular data synthesis. Our experiments show that TableDiffusion produces higher-fidelity synthetic datasets, avoids the mode collapse problem, and achieves state-of-the-art performance on privatised tabular data synthesis. By implementing TableDiffusion to predict the added noise, we enabled it to bypass the challenges of reconstructing mixed-type tabular data. Overall, the diffusion paradigm proves vastly more data and privacy efficient than the adversarial paradigm, due to augmented re-use of each data batch and a smoother iterative training process

arXiv.org e-Print Archive

Review: Deep learning in electron microscopy

Author: Ede Jeffrey M.
Publication venue: Center for Open Science
Publication date: 18/09/2020

Deep learning is transforming most areas of science and technology, including electron microscopy. This review paper offers a practical perspective aimed at developers with limited familiarity. For context, we review popular applications of deep learning in electron microscopy. Next, we discuss the hardware and software needed to get started with deep learning and to interface with electron microscopes. We then review neural network components, popular architectures, and their optimization. Finally, we discuss future directions of deep learning in electron microscopy.

arXiv.org e-Print Archive

Warwick Research Archives Portal Repository