Video Generation From Text
Generating videos from text has proven to be a significant challenge for
existing generative models. We tackle this problem by training a conditional
generative model to extract both static and dynamic information from text. This
is manifested in a hybrid framework, employing a Variational Autoencoder (VAE)
and a Generative Adversarial Network (GAN). The static features, called "gist,"
are used to sketch text-conditioned background color and object layout
structure. Dynamic features are captured by transforming the input text into an image filter. To obtain a large amount of data for training the deep-learning
model, we develop a method to automatically create a matched text-video corpus
from publicly available online videos. Experimental results show that the
proposed framework generates plausible and diverse videos, while accurately
reflecting the input text information. It significantly outperforms baseline
models that directly adapt text-to-image generation procedures to produce
videos. Performance is evaluated both visually and by adapting the inception score used to evaluate image generation in GANs.
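As a rough illustration of the two text-conditioned components the abstract mentions (a VAE-style "gist" encoder and a text-to-image-filter mapping), the sketch below shows how they could look in PyTorch. The class names, dimensions, and grouped-convolution trick are assumptions made for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class GistEncoder(nn.Module):
    """VAE-style encoder: text embedding -> 'gist' latent for static
    background colour and object layout."""
    def __init__(self, text_dim=256, gist_dim=64):
        super().__init__()
        self.mu = nn.Linear(text_dim, gist_dim)
        self.logvar = nn.Linear(text_dim, gist_dim)

    def forward(self, text_emb):
        mu, logvar = self.mu(text_emb), self.logvar(text_emb)
        eps = torch.randn_like(mu)                    # reparameterisation trick
        return mu + eps * torch.exp(0.5 * logvar), mu, logvar

class TextFilter(nn.Module):
    """Maps a text embedding to a per-sample convolution kernel, injecting
    dynamic (motion-related) information into the generator's feature maps."""
    def __init__(self, text_dim=256, channels=64, k=3):
        super().__init__()
        self.channels, self.k = channels, k
        self.to_kernel = nn.Linear(text_dim, channels * channels * k * k)

    def forward(self, feat_map, text_emb):        # feat_map: (B, channels, H, W)
        b, c, h, w = feat_map.shape
        kernel = self.to_kernel(text_emb).view(b * c, c, self.k, self.k)
        # Grouped convolution applies a distinct text-derived kernel per sample.
        out = nn.functional.conv2d(feat_map.reshape(1, b * c, h, w),
                                   kernel, padding=self.k // 2, groups=b)
        return out.view(b, c, h, w)
```

In such a hybrid setup, the gist would sketch the static scene while the text-derived filter modulates the generator's features, with a GAN discriminator providing the adversarial training signal.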
SALSA-TEXT : self attentive latent space based adversarial text generation
Inspired by the success of the self-attention mechanism and the Transformer architecture in sequence transduction and image generation applications, we propose novel self-attention-based architectures to improve the performance of adversarial latent-code-based schemes in text generation. Adversarial latent-code-based text generation has recently gained a lot of attention due to its promising results. In this paper, we take a step toward fortifying the architectures used in these setups, specifically AAE and ARAE, and benchmark these two latent-code-based methods built on adversarial setups. In our experiments, the Google sentence compression dataset is used to compare our method with these baselines using various objective and subjective measures. The experiments demonstrate that the proposed self-attention-based models outperform the state of the art in adversarial latent-code-based text generation.
Comment: 10 pages, 3 figures, under review at ICLR 201
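For concreteness, the following sketch shows what a self-attentive encoder producing a latent code for an AAE/ARAE-style adversarial setup might look like in PyTorch. The layer sizes, mean-pooling choice, and latent critic are illustrative assumptions, not the SALSA-TEXT architecture itself.

```python
import torch
import torch.nn as nn

class SelfAttentiveEncoder(nn.Module):
    """Token ids -> Transformer self-attention encoder -> pooled latent code."""
    def __init__(self, vocab_size=20000, d_model=256, latent_dim=128, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.to_latent = nn.Linear(d_model, latent_dim)

    def forward(self, tokens):                    # tokens: (batch, seq_len) ints
        h = self.encoder(self.embed(tokens))      # (batch, seq_len, d_model)
        return self.to_latent(h.mean(dim=1))      # mean-pool to one latent code

# AAE-style latent-space critic: distinguishes encoder codes from prior samples;
# the encoder is trained adversarially to fool it.
latent_critic = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 1))

# Example adversarial pair of critic inputs:
codes = SelfAttentiveEncoder()(torch.randint(0, 20000, (8, 30)))  # posterior codes
prior = torch.randn(8, 128)                                       # prior samples
```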
Long Text Generation via Adversarial Training with Leaked Information
Automatically generating coherent and semantically meaningful text has many
applications in machine translation, dialogue systems, image captioning, etc.
Recently, by combining with policy gradient, Generative Adversarial Nets (GANs) that use a discriminative model to guide the training of the generative model as a reinforcement learning policy have shown promising results in text generation. However, the scalar guiding signal is only available after the entire text has been generated and lacks intermediate information about text structure during the generative process. As such, this limits its success when the generated text samples are long (more than 20 words). In this paper, we propose a new framework, called LeakGAN, to address the problem of long text generation. We allow the discriminative net to leak its own high-level extracted features to the generative net to further help the guidance. The generator incorporates such informative signals into all generation steps through an additional Manager module, which takes the extracted features of the currently generated words and outputs a latent vector to
guide the Worker module for next-word generation. Our extensive experiments on
synthetic data and various real-world tasks with Turing test demonstrate that
LeakGAN is highly effective in long text generation and also improves the
performance in short text generation scenarios. More importantly, without any
supervision, LeakGAN would be able to implicitly learn sentence structures only
through the interaction between Manager and Worker.
Comment: 14 pages, AAAI 201
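The Manager/Worker interaction described above can be pictured with the following PyTorch sketch, in which the Manager consumes features leaked by the discriminator and emits a goal vector that conditions the Worker's next-word distribution. Dimensions, cell types, and the goal normalisation are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Manager(nn.Module):
    """Consumes features leaked from the discriminator and emits a goal vector."""
    def __init__(self, feat_dim=64, goal_dim=16):
        super().__init__()
        self.rnn = nn.LSTMCell(feat_dim, goal_dim)

    def forward(self, leaked_feat, state=None):
        h, c = self.rnn(leaked_feat, state)
        goal = nn.functional.normalize(h, dim=-1)   # unit-norm latent goal
        return goal, (h, c)

class Worker(nn.Module):
    """Conditions next-word logits on the previous token and the Manager's goal."""
    def __init__(self, vocab_size=5000, emb_dim=32, hid_dim=64, goal_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTMCell(emb_dim, hid_dim)
        self.out = nn.Linear(hid_dim + goal_dim, vocab_size)

    def forward(self, prev_token, goal, state=None):
        h, c = self.rnn(self.embed(prev_token), state)
        logits = self.out(torch.cat([h, goal], dim=-1))
        return logits, (h, c)
```

At each generation step the discriminator's intermediate features for the partial sentence would be fed to the Manager, and the Worker's logits define the policy from which the next word is sampled.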
Semantically Invariant Text-to-Image Generation
Research in image captioning has produced models that are capable of generating
plausible text given input images or videos. Further, recent work in image
generation has shown significant improvements in image quality when text is
used as a prior. Our work ties these concepts together by creating an
architecture that can enable bidirectional generation of images and text. We
call this network Multi-Modal Vector Representation (MMVR). Along with MMVR, we
propose two improvements to text-conditioned image generation. First, an n-gram-metric-based cost function is introduced that generalizes the caption with respect to the image. Second, multiple semantically similar sentences
are shown to help in generating better images. Qualitative and quantitative
evaluations demonstrate that MMVR improves upon existing text-conditioned image generation results by over 20%, while integrating visual and text modalities.
Comment: 5 pages, 5 figures, published in the 2018 25th IEEE International Conference on Image Processing (ICIP)
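As an illustration of an n-gram-metric-based cost of the kind mentioned above, the sketch below computes one minus an average n-gram precision between a generated caption and a reference; the exact metric and weighting used by MMVR may differ.

```python
from collections import Counter

def ngram_overlap_cost(generated, reference, max_n=4):
    """Return 1 minus the average n-gram precision of `generated` vs `reference`."""
    precisions = []
    for n in range(1, max_n + 1):
        gen_ngrams = Counter(zip(*[generated[i:] for i in range(n)]))
        ref_ngrams = Counter(zip(*[reference[i:] for i in range(n)]))
        overlap = sum((gen_ngrams & ref_ngrams).values())   # clipped matches
        precisions.append(overlap / max(sum(gen_ngrams.values()), 1))
    return 1.0 - sum(precisions) / max_n

# Example: cost of a generated caption against one reference caption.
cost = ngram_overlap_cost("a dog runs on grass".split(),
                          "a dog is running on the grass".split())
```

Minimising such a cost rewards captions that share word sequences with the reference while still allowing paraphrases, which is one way to "generalize the caption with respect to the image".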
