250 research outputs found
Time-series Generation by Contrastive Imitation
Consider learning a generative model for time-series data. The sequential
setting poses a unique challenge: Not only should the generator capture the
conditional dynamics of (stepwise) transitions, but its open-loop rollouts
should also preserve the joint distribution of (multi-step) trajectories. On
one hand, autoregressive models trained by MLE allow learning and computing
explicit transition distributions, but suffer from compounding error during
rollouts. On the other hand, adversarial models based on GAN training alleviate
such exposure bias, but transitions are implicit and hard to assess. In this
work, we study a generative framework that seeks to combine the strengths of
both: Motivated by a moment-matching objective to mitigate compounding error,
we optimize a local (but forward-looking) transition policy, where the
reinforcement signal is provided by a global (but stepwise-decomposable) energy
model trained by contrastive estimation. At training, the two components are
learned cooperatively, avoiding the instabilities typical of adversarial
objectives. At inference, the learned policy serves as the generator for
iterative sampling, and the learned energy serves as a trajectory-level measure
for evaluating sample quality. By expressly training a policy to imitate
sequential behavior of time-series features in a dataset, this approach
embodies "generation by imitation". Theoretically, we illustrate the
correctness of this formulation and the consistency of the algorithm.
Empirically, we evaluate its ability to generate predictively useful samples
from real-world datasets, verifying that it performs at the standard of
existing benchmarks.
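As a rough illustration of what a "global (but stepwise-decomposable)" energy could look like, the sketch below scores a trajectory as a sum of per-step energies and hands each step its negated energy as a reward. Everything here is an assumption made for illustration: `step_energy` is a hand-written stand-in for the learned energy model, the `drift` value is arbitrary, and using the negated energy as the reinforcement signal is one plausible reading of the abstract, not the paper's stated formulation.

```python
def step_energy(prev_value, next_value, drift=0.1):
    """Hypothetical per-step energy: squared deviation from a fixed drift.

    In the abstract's setting this would be a learned model; a closed-form
    function is used here only to keep the sketch self-contained.
    """
    return (next_value - prev_value - drift) ** 2


def trajectory_energy(trajectory):
    """Global energy of a trajectory, defined as the sum of its step energies."""
    return sum(step_energy(a, b) for a, b in zip(trajectory, trajectory[1:]))


def stepwise_rewards(trajectory):
    """Per-transition rewards (assumed here to be the negated step energies)."""
    return [-step_energy(a, b) for a, b in zip(trajectory, trajectory[1:])]


# Toy rollout: two transitions follow the drift, the last one jumps away.
trajectory = [0.0, 0.1, 0.2, 1.0]
rewards = stepwise_rewards(trajectory)
total_energy = trajectory_energy(trajectory)
```

Because the energy decomposes over steps, the same quantity can be used both as a per-transition training signal and, summed, as a trajectory-level quality score.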
SparseGAN: Sparse Generative Adversarial Network for Text Generation
It is still a challenging task to learn a neural text generation model under
the framework of generative adversarial networks (GANs) since the entire
training process is not differentiable. The existing training strategies either
suffer from unreliable gradient estimations or imprecise sentence
representations. Inspired by the principle of sparse coding, we propose a
SparseGAN that generates semantically interpretable but sparse sentence
representations as inputs to the discriminator. The key idea is that we treat
an embedding matrix as an over-complete dictionary, and use a linear
combination of very few selected word embeddings to approximate the output
feature representation of the generator at each time step. With such
semantically rich representations, we not only reduce unnecessary noise for
efficient adversarial training, but also make the entire training process fully
differentiable. Experiments on multiple text generation datasets yield
performance improvements, especially in sequence-level metrics such as BLEU.
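The idea of approximating a generator output with "a linear combination of very few selected word embeddings" can be sketched with matching pursuit, a standard greedy sparse-coding routine. The abstract does not say which selection method is used, so matching pursuit is an assumption here, and the embedding values, `generator_output`, and `k` are hypothetical toy inputs.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(target, dictionary, k):
    """Greedily approximate `target` using at most k rows of `dictionary`.

    Returns (coeffs, residual), where coeffs maps row index -> coefficient.
    """
    residual = list(target)
    coeffs = {}
    for _ in range(k):
        # Select the atom most correlated with the current residual.
        best_idx, best_score = None, 0.0
        for i, atom in enumerate(dictionary):
            norm = math.sqrt(dot(atom, atom))
            if norm == 0.0:
                continue
            score = abs(dot(residual, atom)) / norm
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx is None:
            break  # residual is zero or orthogonal to every atom
        atom = dictionary[best_idx]
        step = dot(residual, atom) / dot(atom, atom)
        coeffs[best_idx] = coeffs.get(best_idx, 0.0) + step
        residual = [r - step * a for r, a in zip(residual, atom)]
    return coeffs, residual


# Toy "embedding matrix": 4 words in 3 dimensions (hypothetical values).
embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
generator_output = [2.0, 2.0, 0.5]
coeffs, residual = matching_pursuit(generator_output, embeddings, k=2)
```

In this toy case two embeddings (rows 3 and 2) reproduce the target exactly; with a real vocabulary the residual would generally be non-zero, and `k` controls how sparse the representation is.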
Deep Generative Models on 3D Representations: A Survey
Generative models, an important family of statistical models, aim to learn
the observed data distribution by generating new instances. Along
with the rise of neural networks, deep generative models, such as variational
autoencoders (VAEs) and generative adversarial networks (GANs), have made
tremendous progress in 2D image synthesis. Recently, researchers have shifted
their attention from 2D space to 3D space, considering that 3D data better
aligns with our physical world and hence has great potential in practice.
However, unlike a 2D image, which has an efficient representation (i.e., a pixel
grid) by nature, representing 3D data poses far more challenges.
Concretely, an ideal 3D representation should be expressive enough to
model shapes and appearances in detail, and efficient enough to
model high-resolution data quickly and with low memory cost. However,
existing 3D representations, such as point clouds, meshes, and recent neural
fields, usually fail to meet the above requirements simultaneously. In this
survey, we make a thorough review of the development of 3D generation,
including 3D shape generation and 3D-aware image synthesis, from the
perspectives of both algorithms and, more importantly, representations. We hope
that our discussion can help the community track the evolution of this field
and spark innovative ideas to advance this challenging task.
Prediction and generation of structured data using neural networks and discrete decisions (Prédiction et génération de données structurées à l'aide de réseaux de neurones et de décisions discrètes)
Deep learning, a subdiscipline of machine learning, is used across many domains,
including natural language processing. However, several problems in the field remain open,
notably the prediction of long sequences and the generation of natural language. In this
thesis, we present two models that address these problems.
In chapter 1, we add a planning mechanism to sequence-to-sequence models. The mechanism
establishes ahead of time the alignment between the input and output
sequences. We show that this improves the alignment, helps the model converge faster, and
requires fewer parameters. We also show performance gains in neural machine translation,
question generation, and the algorithmic task of finding Eulerian circuits in graphs.
In chapter 2, we tackle language generation using generative adversarial networks,
a non-trivial problem given the discrete nature of the output space. The
model is trained using only an adversarial loss and without any gradient estimation. We
show results on language modeling, context-free grammar generation, and conditional
sentence generation.
Generation of realistic human behaviour
As the use of computers and robots in our everyday lives increases, so does the need for better interaction with these devices. Human-computer interaction relies on the ability to understand and generate human behavioural signals such as speech, facial expressions and motion. This thesis deals with the synthesis and evaluation of such signals, focusing not only on their intelligibility but also on their realism. Since these signals are often correlated, it is common for methods to drive the generation of one signal using another. The thesis begins by tackling the problem of speech-driven facial animation and proposing models capable of producing realistic animations from a single image and an audio clip. The goal of these models is to produce a video of a target person whose lips move in accordance with the driving audio. Particular focus is also placed on a) generating spontaneous expressions such as blinks, b) achieving audio-visual synchrony and c) transferring or producing natural head motion. The second problem addressed in this thesis is that of video-driven speech reconstruction, which aims at converting a silent video into waveforms containing speech. The method proposed for solving this problem is capable of generating intelligible and accurate speech for both seen and unseen speakers. The spoken content is correctly captured thanks to a perceptual loss, which uses features from pre-trained speech-driven animation models. The ability of the video-to-speech model to run in real time allows its use in hearing assistive devices and telecommunications. The final work proposed in this thesis is a generic domain translation system that can be used for any translation problem, including those mapping across different modalities.
The framework is made up of two networks performing translations in opposite directions and can be successfully applied to solve diverse sets of translation problems, including speech-driven animation and video-driven speech reconstruction.
Contributions to generative models and their applications
Generative models are a large class of machine learning models for unsupervised learning. They have various applications in machine learning and artificial intelligence. In this thesis, we discuss many aspects of generative models and their applications to other machine learning problems. In particular, we discuss several important topics in generative models, including how to stabilize discrete GAN training with importance sampling, how to sample better from GANs using a connection with energy-based models, and how to better train auto-regressive models with the help of an energy-based model formulation, as well as two applications of generative models to other machine learning problems, one about residual networks, the other about safety verification.
Reimagining Speech: A Scoping Review of Deep Learning-Powered Voice Conversion
Research on deep learning-powered voice conversion (VC) in speech-to-speech
scenarios is becoming increasingly popular. Although many of the works in the
field of voice conversion share a common global pipeline, there is
considerable diversity in the underlying structures, methods, and neural
sub-blocks used across research efforts. Thus, obtaining a comprehensive
understanding of the reasons behind the choice of the different methods in the
voice conversion pipeline can be challenging, and the actual hurdles in the
proposed solutions are often unclear. To shed light on these aspects, this
paper presents a scoping review that explores the use of deep learning in
speech analysis, synthesis, and disentangled speech representation learning
within modern voice conversion systems. We screened 621 publications from more
than 38 different venues between the years 2017 and 2023, followed by an
in-depth review of a final database consisting of 123 eligible studies. Based
on the review, we summarise the most frequently used approaches to voice
conversion based on deep learning and highlight common pitfalls within the
community. Lastly, we condense the knowledge gathered, identify the main challenges
and provide recommendations for future research directions.
Generating tabular datasets under differential privacy
Machine Learning (ML) is accelerating progress across fields and industries,
but relies on accessible and high-quality training data. Some of the most
important datasets are found in biomedical and financial domains in the form of
spreadsheets and relational databases. But this tabular data is often sensitive
in nature. Synthetic data generation offers the potential to unlock sensitive
data, but generative models tend to memorise and regurgitate training data,
which undermines the privacy goal. To remedy this, researchers have
incorporated the mathematical framework of Differential Privacy (DP) into the
training process of deep neural networks. But this creates a trade-off between
the quality and privacy of the resulting data. Generative Adversarial Networks
(GANs) are the dominant paradigm for synthesising tabular data under DP, but
suffer from unstable adversarial training and mode collapse, which are
exacerbated by the privacy constraints and challenging tabular data modality.
This work optimises the quality-privacy trade-off of generative models,
producing higher quality tabular datasets with the same privacy guarantees. We
implement novel end-to-end models that leverage attention mechanisms to learn
reversible tabular representations. We also introduce TableDiffusion, the first
differentially-private diffusion model for tabular data synthesis. Our
experiments show that TableDiffusion produces higher-fidelity synthetic
datasets, avoids the mode collapse problem, and achieves state-of-the-art
performance on privatised tabular data synthesis. By implementing
TableDiffusion to predict the added noise, we enable it to bypass the
challenges of reconstructing mixed-type tabular data. Overall, the diffusion
paradigm proves vastly more data- and privacy-efficient than the adversarial
paradigm, due to augmented re-use of each data batch and a smoother iterative
training process.
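To make "predict the added noise" concrete, the sketch below shows the usual noise-prediction setup from denoising diffusion models, which is assumed here since the abstract gives no formulas. A data row is corrupted with Gaussian noise, and the training target is that noise rather than the row itself. The function names, the `alpha_bar` value, and the example row are hypothetical, and no differential-privacy mechanism is included.

```python
import math
import random


def forward_noise(x0, alpha_bar, rng):
    """Corrupt a numeric row x0: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(x0, eps)
    ]
    return x_t, eps


def noise_prediction_loss(predicted_eps, eps):
    """Mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted_eps, eps)) / len(eps)


def recover_x0(x_t, eps_hat, alpha_bar):
    """Invert the corruption given a noise estimate."""
    return [
        (x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
        for x, e in zip(x_t, eps_hat)
    ]


rng = random.Random(0)
# Hypothetical encoded row: two numeric columns plus a one-hot category.
row = [0.3, -1.2, 0.0, 1.0, 0.0]
x_t, eps = forward_noise(row, alpha_bar=0.5, rng=rng)

# A perfect noise predictor gives zero loss and recovers the row.
perfect_loss = noise_prediction_loss(eps, eps)
reconstructed = recover_x0(x_t, eps, alpha_bar=0.5)
```

The regression target `eps` is Gaussian in every column, whatever the column's original type, which is one way to read the abstract's remark that predicting noise sidesteps reconstructing mixed-type data directly.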
Review: Deep learning in electron microscopy
Deep learning is transforming most areas of science and technology, including electron microscopy. This review paper offers a practical perspective aimed at developers with limited familiarity. For context, we review popular applications of deep learning in electron microscopy. Next, we discuss the hardware and software needed to get started with deep learning and to interface with electron microscopes. We then review neural network components, popular architectures, and their optimization. Finally, we discuss future directions of deep learning in electron microscopy.