40 research outputs found

    Wasserstein Autoencoders with Mixture of Gaussian Priors for Stylized Text Generation

    Probabilistic text generation is an important application of Natural Language Processing (NLP). Variational autoencoders (VAEs) and Wasserstein autoencoders (WAEs) are two widely used methods for text generation, and recent research focuses on improving the quality of the samples they generate. While Wasserstein autoencoders are effective for text generation, they cannot control the topic of the generated text, even when the training dataset contains samples from multiple categories with different styles. We present a semi-supervised approach using Wasserstein autoencoders and a mixture of Gaussian priors for topic-aware sentence generation. Our model is trained on a multi-class dataset and generates sentences in the style/topic of a desired class. It is also capable of interpolating between multiple classes. Moreover, we can train our model on relatively small datasets. While a regular WAE or VAE cannot generate diverse sentences with few training samples, our approach generates diverse sentences and preserves the style and the content of the desired classes.
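
As a rough illustration of the mixture-of-Gaussians prior mentioned in this abstract, the sketch below draws a latent vector from a mixture with one isotropic Gaussian component per class. It is not the authors' implementation: the function name `sample_mog_prior`, the isotropic covariance, and the use of mixing weights to select or blend classes are all assumptions made for illustration, and the encoder, decoder, and WAE training objective are omitted.

```python
import random

def sample_mog_prior(means, weights, sigma=1.0, rng=random):
    """Draw one latent vector from a mixture of isotropic Gaussians.

    means   : list of mean vectors, one per class (hypothetical values)
    weights : non-negative mixing weights that sum to 1
    sigma   : shared standard deviation of every component (an assumption)
    """
    if len(means) != len(weights):
        raise ValueError("need exactly one weight per component")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Pick a component according to the mixing weights ...
    k = rng.choices(range(len(means)), weights=weights, k=1)[0]
    # ... then sample each latent dimension around that component's mean.
    return [rng.gauss(m, sigma) for m in means[k]]

# Hypothetical two-class prior in a 2-dimensional latent space.
class_means = [[-2.0, 0.0], [2.0, 0.0]]
z_class_1 = sample_mog_prior(class_means, [0.0, 1.0])  # always class 1
z_mixed = sample_mog_prior(class_means, [0.5, 0.5])    # either class, equally likely
```

A one-hot weight vector restricts sampling to a single class component, while intermediate weights draw from several components; how the paper actually interpolates classes may differ from this.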

    AI-generated Content for Various Data Modalities: A Survey

    AI-generated content (AIGC) methods aim to produce text, images, videos, 3D assets, and other media using AI algorithms. Because of their wide range of applications and the demonstrated potential of recent work, AIGC methods have attracted considerable attention and have been developed for various data modalities, such as image, video, text, 3D shape (as voxels, point clouds, meshes, and neural implicit fields), 3D scene, 3D human avatar (body and head), 3D motion, and audio, each presenting different characteristics and challenges. There have also been many significant developments in cross-modality AIGC methods, where generative methods receive conditioning input in one modality and produce output in another. Examples include going from various modalities to image, video, 3D shape, 3D scene, 3D avatar (body and head), 3D motion (skeleton and avatar), and audio modalities. In this paper, we provide a comprehensive review of AIGC methods across different data modalities, including both single-modality and cross-modality methods, highlighting the various challenges, representative works, and recent technical directions in each setting. We also survey representative datasets for each modality and present comparative results for various modalities. Finally, we discuss the challenges and potential future research directions.

    On representation learning for generative models of text

    This thesis takes baby steps in building and understanding neural representation learning systems and generative models for natural language processing. It is presented as a thesis by article that contains four pieces of work. In the first article, we show that multi-task learning can be used to combine the inductive biases of several self-supervised and supervised learning tasks to learn general-purpose fixed-length distributed sentence representations that achieve strong results on downstream transfer learning tasks without any model fine-tuning. The second article builds on the first and presents a two-step generative model for text that models the distribution of sentence representations to produce novel sentence embeddings that serve as a high-level "neural outline", which is reconstructed into words with a conditional autoregressive RNN decoder. The third article studies the necessity of disentangled representations for controllable text generation. A large fraction of controllable text generation systems rely on the idea that control over a particular attribute (or style) requires building disentangled representations that separate content and style. We demonstrate that representations produced in previous work that uses domain adversarial training are not disentangled in practice. We then present an approach that does not aim to learn disentangled representations and show that it achieves significantly better results than prior work. In the fourth article, we design transformer language models that learn representations at multiple time scales and show that these can help address the large memory footprint these models typically have. We present three different multi-scale architectures that exhibit favorable trade-offs between perplexity and memory footprint.

    Representation Learning for Visual Data

    This thesis by articles contributes to the field of deep learning, and more specifically the subfield of deep generative modeling, through work on restricted Boltzmann machines, generative adversarial networks and style transfer networks. The first article examines the idea of tackling the problem of estimating the negative phase gradients in Boltzmann machines by sampling from a physical implementation of the model. We provide an empirical evaluation, through software simulation, of the impact of various constraints associated with physical implementations of restricted Boltzmann machines (RBMs), namely noisy parameters, finite parameter amplitude and restricted connectivity patterns, on their performance as measured by negative log-likelihood. The second article tackles the inference problem in generative adversarial networks (GANs). It proposes a simple and straightforward extension to the GAN framework, named adversarially learned inference (ALI), which allows inference to be learned jointly with generation in a fully adversarial framework. We show that the learned representation is useful for auxiliary tasks such as semi-supervised learning by obtaining a performance competitive with the then-state-of-the-art on the SVHN and CIFAR10 semi-supervised learning tasks. Finally, the third article proposes a simple and scalable technique to train a single feedforward style transfer network to model multiple styles. It introduces a conditioning mechanism named conditional instance normalization, which allows the network to capture multiple styles in parallel by learning a different set of instance normalization parameters for each style. This mechanism is shown to be very efficient and effective in practice, and has inspired multiple efforts to adapt the idea to problems outside of the artistic style transfer domain.
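
The conditional instance normalization mechanism described in this abstract can be sketched for a single channel of a single feature map as follows. This is a minimal illustration, not the thesis code: the function name, the flat-list representation of a channel, the `eps` value, and the example parameter values are assumptions. In a full network each style would hold one scale and one shift per channel; here there is one channel, so each style has a single pair.

```python
import math

def conditional_instance_norm(channel, style, gammas, betas, eps=1e-5):
    """Normalize one channel of one sample, then apply a style-specific affine map.

    channel : flat list of activations for one channel of one sample
    style   : index of the style to apply
    gammas  : per-style scale parameters (learned in a real network)
    betas   : per-style shift parameters (learned in a real network)
    """
    n = len(channel)
    mean = sum(channel) / n
    var = sum((x - mean) ** 2 for x in channel) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    # The normalization statistics are shared by all styles; only the
    # scale and shift depend on the selected style.
    return [gammas[style] * (x - mean) * inv_std + betas[style] for x in channel]

# Hypothetical parameters for two styles.
gammas = [1.0, 2.0]
betas = [0.0, 0.5]
activations = [1.0, 3.0]
out_style_0 = conditional_instance_norm(activations, 0, gammas, betas)
out_style_1 = conditional_instance_norm(activations, 1, gammas, betas)
```

Because every other weight in the network is shared, supporting an additional style under this scheme only requires a new set of scale and shift parameters.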

    A deep generative model framework for creating high quality synthetic transaction sequences

    Synthetic data are artificially generated data that closely model real-world measurements, and can be a valuable substitute for real data in domains where real data are costly to obtain or privacy concerns exist. Synthetic data have traditionally been generated using computational simulations, but deep generative models (DGMs) are increasingly used to generate high-quality synthetic data. In this thesis, we create a framework that employs DGMs to generate high-quality synthetic transaction sequences. Transaction sequences, such as those seen in an online banking platform or a credit card statement, are an important type of financial data for gaining insight into financial systems. However, research involving this type of data is typically limited to large financial institutions, as privacy concerns often prevent academic researchers from accessing it. Our work represents a step towards creating shareable synthetic transaction sequence datasets containing data not connected to any actual humans. To achieve this goal, we begin by developing Banksformer, a DGM based on the transformer architecture, which is able to generate high-quality synthetic transaction sequences. Throughout the remainder of the thesis, we develop extensions to Banksformer that further improve the quality of the data we generate. Additionally, we perform an extensive examination of the quality of the synthetic data produced by our method, using both qualitative visualizations and quantitative metrics.