A Dual Reinforcement Learning Framework for Unsupervised Text Style Transfer
Unsupervised text style transfer aims to change the underlying style of
text while keeping its main content unchanged, without using parallel data. Most existing
methods typically follow two steps: first separating the content from the
original style, and then fusing the content with the desired style. However,
the separation in the first step is challenging because the content and style
interact in subtle ways in natural language. Therefore, in this paper, we
propose a dual reinforcement learning framework to directly transfer the style
of the text via a one-step mapping model, without any separation of content and
style. Specifically, we consider the learning of the source-to-target and
target-to-source mappings as a dual task, and two rewards are designed based on
such a dual structure to reflect the style accuracy and content preservation,
respectively. In this way, the two one-step mapping models can be trained via
reinforcement learning, without any use of parallel data. Automatic evaluations
show that our model outperforms state-of-the-art systems by a large margin,
with an average improvement of more than 8 BLEU points on two benchmark
datasets. Human evaluations also validate the effectiveness of our model in
terms of style accuracy, content preservation, and fluency. Our code and data,
including the outputs of all baselines and our model, are available at
https://github.com/luofuli/DualLanST.
Comment: Accepted by IJCAI 2019
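The two dual-structure rewards described above can be sketched numerically. This is a minimal illustration, not the paper's implementation: `style_prob` and `recon_prob` stand in for the actual style classifier and the back-transfer (target-to-source) model, and the harmonic-mean combination is an assumption chosen so that neither objective can dominate.

```python
# Hedged sketch of a dual reward: a style-accuracy term and a
# content-preservation term combined via a harmonic mean. Both
# probabilities are stand-ins for real model scores.

def dual_reward(style_prob: float, recon_prob: float) -> float:
    """Combine style accuracy and content preservation into one reward.

    style_prob: probability the transferred text carries the target style.
    recon_prob: probability the source text is recovered by the reverse mapping.
    """
    if style_prob == 0.0 or recon_prob == 0.0:
        return 0.0
    # The harmonic mean penalizes imbalance between the two objectives:
    # a transfer that nails style but destroys content earns little.
    return 2 * style_prob * recon_prob / (style_prob + recon_prob)
```

For example, a balanced (0.6, 0.6) pair scores higher than an unbalanced (0.8, 0.4) pair, which is the point of using a harmonic rather than arithmetic mean here.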
Selective Transfer with Reinforced Transfer Network for Partial Domain Adaptation
One crucial aspect of partial domain adaptation (PDA) is how to select the
relevant source samples in the shared classes for knowledge transfer. Previous
PDA methods tackle this problem by re-weighting the source samples based on
their high-level information (deep features). However, because of the domain shift
between the source and target domains, using only deep features for sample
selection is unreliable. We argue that it is more reasonable to additionally
exploit pixel-level information for the PDA problem, as the appearance
difference between outlier source classes and target classes is significantly
large. In this paper, we propose a reinforced transfer network (RTNet), which
utilizes both high-level and pixel-level information for the PDA problem. Our RTNet
is composed of a reinforced data selector (RDS) based on reinforcement learning
(RL), which filters out the outlier source samples, and a domain adaptation
model which minimizes the domain discrepancy in the shared label space.
Specifically, in the RDS, we design a novel reward based on the reconstruction
errors of selected source samples on the target generator, which introduces
pixel-level information to guide the learning of the RDS. In addition, we develop a
state representation containing high-level information, which is used by the RDS for sample
selection. The proposed RDS is a general module, which can be easily integrated
into existing DA models to adapt them to the PDA setting. Extensive
experiments indicate that RTNet can achieve state-of-the-art performance for
PDA tasks on several benchmark datasets.
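The reinforced data selector's reward can be sketched as follows. This is a toy illustration of the idea, not RTNet's actual reward: the threshold and the hard ±1 reward are assumptions, standing in for a learned policy scored against pixel-level reconstruction error on the target generator.

```python
# Hedged sketch of a reinforced data selector: source samples whose
# reconstruction error on the target generator is low are likely from
# shared classes and earn a positive reward; high-error (outlier-class)
# samples earn a negative reward. Threshold and errors are illustrative.

def selector_reward(recon_error: float, threshold: float = 0.5) -> float:
    """Map a pixel-level reconstruction error to a selection reward."""
    return 1.0 if recon_error <= threshold else -1.0

def select_samples(errors, threshold=0.5):
    """Keep the indices of samples the selector is rewarded for keeping."""
    return [i for i, e in enumerate(errors)
            if selector_reward(e, threshold) > 0]
```

In the real system the selection policy is learned with RL rather than thresholded, but the reward's role, turning reconstruction error into a filtering signal, is the same.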
Sequence Generation with Guider Network
Sequence generation with reinforcement learning (RL) has received significant
attention recently. However, a challenge with such methods is the sparse-reward
problem in the RL training process, in which a scalar guiding signal is often
only available after an entire sequence has been generated. This type of sparse
reward tends to ignore the global structural information of a sequence, causing
generation of sequences that are semantically inconsistent. In this paper, we
present a model-based RL approach to overcome this issue. Specifically, we
propose a novel guider network to model the sequence-generation environment,
which can assist next-word prediction and provide intermediate rewards for
generator optimization. Extensive experiments show that the proposed method
leads to improved performance for both unconditional and conditional
sequence-generation tasks.
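The guider's role, converting a single terminal score into dense per-step feedback, can be sketched like this. The feature deltas here are stand-ins: in the paper the guider is a learned model of the generation environment, whereas this sketch simply scores how well each realized step matches a predicted direction.

```python
# Hedged sketch of guider-style intermediate rewards: a guider predicts
# how the sequence's feature vector should move at each step, and each
# generated token earns a dense reward from the cosine similarity between
# the predicted and realized feature deltas. Features are toy stand-ins.
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

def intermediate_rewards(predicted_deltas, realized_deltas):
    """One dense reward per generation step, instead of a single final reward."""
    return [cosine(p, r) for p, r in zip(predicted_deltas, realized_deltas)]
```

A step that moves in the predicted direction earns a reward near 1, a step that moves against it earns a reward near -1, so the generator gets guidance before the sequence is complete.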
Closed-Book Training to Improve Summarization Encoder Memory
A good neural sequence-to-sequence summarization model should have a strong
encoder that can distill and memorize the important information from long input
texts so that the decoder can generate salient summaries based on the encoder's
memory. In this paper, we aim to improve the memorization capabilities of the
encoder of a pointer-generator model by adding an additional 'closed-book'
decoder without attention and pointer mechanisms. Such a decoder forces the
encoder to be more selective in the information encoded in its memory state
because the decoder can't rely on the extra information provided by the
attention and possibly copy modules, and hence improves the entire model. On
the CNN/Daily Mail dataset, our 2-decoder model outperforms the baseline
significantly in terms of ROUGE and METEOR metrics, for both cross-entropy and
reinforced setups (and on human evaluation). Moreover, our model also achieves
higher scores in a test-only DUC-2002 generalizability setup. We further
present a memory ability test, two saliency metrics, as well as several
sanity-check ablations (based on fixed-encoder, gradient-flow cut, and model
capacity) to prove that the encoder of our 2-decoder model does in fact learn
stronger memory representations than the baseline encoder.
Comment: EMNLP 2018 (16 pages)
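The 2-decoder training signal can be sketched as a weighted sum of the two decoders' losses. This is a hedged sketch: the mixing weight is an illustrative assumption, not the paper's exact value, and the two scalar losses stand in for full cross-entropy terms.

```python
# Hedged sketch of the 2-decoder objective: the total loss mixes the usual
# attention/pointer decoder loss with a "closed-book" decoder loss computed
# only from the encoder's memory state, forcing the encoder to pack salient
# information into that state. The mix weight here is an assumption.

def two_decoder_loss(attn_loss: float, closed_book_loss: float,
                     mix: float = 0.5) -> float:
    """Weighted sum that makes the encoder memory serve both decoders."""
    return (1.0 - mix) * attn_loss + mix * closed_book_loss
```

With `mix = 0.0` the model degenerates to the plain pointer-generator baseline; any positive weight forces gradient from the attention-free decoder back into the encoder.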
Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation
We introduce Texar, an open-source toolkit aiming to support the broad set of
text generation tasks that transform any inputs into natural language, such as
machine translation, summarization, dialog, content manipulation, and so forth.
With the design goals of modularity, versatility, and extensibility in mind,
Texar extracts common patterns underlying the diverse tasks and methodologies,
creates a library of highly reusable modules, and allows arbitrary model
architectures and algorithmic paradigms. In Texar, model architecture,
inference, and learning processes are properly decomposed. Modules at a high
conceptual level can be freely assembled and plugged in or swapped out. The toolkit
also supports a rich set of large-scale pretrained models. Texar is thus
particularly suitable for researchers and practitioners to do fast prototyping
and experimentation. The versatile toolkit also fosters technique sharing
across different text generation tasks. Texar supports both TensorFlow and
PyTorch, and is released under Apache License 2.0 at https://www.texar.io.
Comment: ACL 2019 demo, expanded version
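The plug-in/swap-out design goal can be illustrated generically. To be clear, this sketch does not reproduce Texar's actual API; it only shows the pattern the abstract describes: interchangeable modules behind one interface, so the surrounding pipeline never changes when a component is swapped.

```python
# Generic illustration (NOT Texar's API) of the plug-in/swap-out pattern:
# decoding strategies are interchangeable modules with one call signature,
# so the pipeline is unchanged when one is swapped for another.

def greedy_decode(scores):
    """Pick the highest-scoring token id at each step."""
    return [max(range(len(step)), key=step.__getitem__) for step in scores]

def threshold_decode(scores, floor=0.5):
    """Alternative module: fall back to token 0 unless a score clears a floor."""
    out = []
    for step in scores:
        best = max(range(len(step)), key=step.__getitem__)
        out.append(best if step[best] >= floor else 0)
    return out

def run_pipeline(scores, decoder=greedy_decode):
    """The surrounding pipeline is untouched when the decoder module is swapped."""
    return decoder(scores)
```

Swapping `decoder=threshold_decode` changes behavior without touching `run_pipeline`, which is the modularity property the toolkit is built around.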
AI-Powered Text Generation for Harmonious Human-Machine Interaction: Current State and Future Directions
In the last two decades, the landscape of text generation has undergone
tremendous changes and is being reshaped by the success of deep learning. New
technologies for text generation, ranging from template-based methods to neural
network-based methods, have emerged. Meanwhile, the research objectives have also
changed from generating smooth and coherent sentences to infusing personalized
traits to enrich the diversification of newly generated content. With the rapid
development of text generation solutions, a comprehensive survey is urgently needed
to summarize the achievements and track the state of the art. In this survey
paper, we present the general systematical framework, illustrate the widely
utilized models, and summarize the classic applications of text generation.
Comment: Accepted by IEEE UIC 2019
Efficient Reinforcement Learning for Unsupervised Controlled Text Generation
Controlled text generation tasks such as unsupervised text style transfer
have increasingly adopted the use of Reinforcement Learning (RL). A major
challenge in applying RL to such tasks is the sparse reward, which is available
only after the full text is generated. Sparse rewards, combined with a large
action space make RL training sample-inefficient and difficult to converge.
Recently proposed reward-shaping strategies to address this issue have shown
only negligible gains. In contrast, this work proposes a novel approach that
provides dense rewards to each generated token. We evaluate our approach by its
usage in unsupervised text style transfer. Averaged across datasets, our style
transfer system improves upon current state-of-the-art systems by 21% on human
evaluation and 12% on automatic evaluation. In an ablated comparison with the
current reward-shaping approach (the 'roll-out strategy'), using dense rewards
improves the overall style transfer quality by 22% based on human evaluation.
Further, the RL training is 2.5 times as sample-efficient, and 7 times faster.
Comment: 10 pages, 2 figures, 4 tables
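The contrast between sparse and dense rewards can be sketched concretely. The per-token scores below are stand-ins for, say, a style classifier evaluated incrementally as each token is appended; the telescoping-difference scheme is one simple way (an assumption here, not necessarily this paper's exact method) to credit each token with the score change it caused.

```python
# Hedged sketch contrasting a sparse terminal reward with dense per-token
# rewards. token_scores[i] stands in for a scorer's output on the prefix
# ending at token i; the scorer itself is not modeled here.

def sparse_rewards(token_scores):
    """Sparse setup: only the final token receives the sequence-level score."""
    return [0.0] * (len(token_scores) - 1) + [token_scores[-1]]

def dense_rewards(token_scores):
    """Dense setup: each token is credited with the score change it caused."""
    prev = 0.0
    out = []
    for s in token_scores:
        out.append(s - prev)
        prev = s
    return out
```

The dense rewards telescope, so they sum to the same terminal score, but every token now gets an immediate learning signal, which is what makes the RL training more sample-efficient.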
A Hierarchical Reinforced Sequence Operation Method for Unsupervised Text Style Transfer
Unsupervised text style transfer aims to alter text styles while preserving
the content, without aligned data for supervision. Existing seq2seq methods
face three challenges: 1) the transfer is weakly interpretable, 2) generated
outputs struggle to preserve content, and 3) the trade-off between content
and style is intractable. To address these challenges, we propose a
hierarchical reinforced sequence operation method, named Point-Then-Operate
(PTO), which consists of a high-level agent that proposes operation positions
and a low-level agent that alters the sentence. We provide comprehensive
training objectives to control the fluency, style, and content of the outputs
and a mask-based inference algorithm that allows for multi-step revision based
on the single-step trained agents. Experimental results on two text style
transfer datasets show that our method significantly outperforms recent methods
and effectively addresses the aforementioned challenges.
Comment: Accepted to ACL 2019
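The Point-Then-Operate decomposition can be sketched as a two-level loop. Both policies below are trivial stand-ins (the real agents are learned with hierarchical RL), and the specific words are hypothetical examples, but the control flow, point at a position, apply an operation there, repeat for multiple revision steps, mirrors the abstract's description.

```python
# Hedged sketch of the hierarchical decomposition: a high-level agent
# points at a position, a low-level agent applies an edit operation there,
# and inference repeats this for several revision steps. Both "agents"
# are toy rule-based stand-ins for the learned policies.

def point(tokens, banned_word="terrible"):
    """High-level agent stand-in: return the position to operate on, or -1."""
    for i, t in enumerate(tokens):
        if t == banned_word:
            return i
    return -1

def operate(tokens, pos, replacement="wonderful"):
    """Low-level agent stand-in: apply a replace operation at the position."""
    return tokens[:pos] + [replacement] + tokens[pos + 1:]

def multi_step_revise(tokens, max_steps=3):
    """Multi-step inference built from single-step point/operate calls."""
    for _ in range(max_steps):
        pos = point(tokens)
        if pos < 0:  # nothing left to revise
            break
        tokens = operate(tokens, pos)
    return tokens
```

Because edits are explicit position-plus-operation pairs, each revision step is directly inspectable, which is the interpretability advantage the abstract claims over one-shot seq2seq transfer.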
Synthesizing Programs for Images using Reinforced Adversarial Learning
Advances in deep generative networks have led to impressive results in recent
years. Nevertheless, such models can often waste their capacity on the minutiae
of datasets, presumably due to weak inductive biases in their decoders. This is
where graphics engines may come in handy since they abstract away low-level
details and represent images as high-level programs. Current methods that
combine deep learning and renderers are limited by hand-crafted likelihood or
distance functions, a need for large amounts of supervision, or difficulties in
scaling their inference algorithms to richer datasets. To mitigate these
issues, we present SPIRAL, an adversarially trained agent that generates a
program which is executed by a graphics engine to interpret and sample images.
The goal of this agent is to fool a discriminator network that distinguishes
between real and rendered data, trained with a distributed reinforcement
learning setup without any supervision. A surprising finding is that using the
discriminator's output as a reward signal is the key to allow the agent to make
meaningful progress at matching the desired output rendering. To the best of
our knowledge, this is the first demonstration of an end-to-end, unsupervised
and adversarial inverse graphics agent on challenging real world (MNIST,
Omniglot, CelebA) and synthetic 3D datasets.
Comment: 12 pages, 13 figures
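The reward loop, program in, rendered image out, discriminator score as reward, can be sketched with toy components. Everything below is a stand-in: the "renderer" just paints pixels on a small grid and the "discriminator" is pixel overlap with a target image, whereas SPIRAL uses a real graphics engine and a learned adversarial discriminator.

```python
# Hedged sketch of discriminator-as-reward: the agent emits a program, a
# renderer executes it into an image, and the reward is how "real" the
# result looks. Renderer and discriminator are toy stand-ins here.

def render(program, size=4):
    """Toy renderer: the program is a list of (row, col) pixels to paint."""
    canvas = [[0] * size for _ in range(size)]
    for r, c in program:
        canvas[r][c] = 1
    return canvas

def discriminator_reward(canvas, target):
    """Stand-in discriminator: fraction of pixels matching the 'real' image."""
    flat_c = [p for row in canvas for p in row]
    flat_t = [p for row in target for p in row]
    return sum(int(a == b) for a, b in zip(flat_c, flat_t)) / len(flat_t)
```

A program that reproduces the target earns the maximum reward, and partial matches earn graded credit, which is the property that lets the agent make incremental progress without any pixel-level supervision target being hand-specified as a loss.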
Generating summaries tailored to target characteristics
Recently, research efforts have gained pace to cater to varied user
preferences while generating text summaries. While there have been attempts to
incorporate a few handpicked characteristics such as length or entities, a
holistic view around these preferences is missing and crucial insights on why
certain characteristics should be incorporated in a specific manner are absent.
With this objective, we provide a categorization around these characteristics
relevant to the task of text summarization: first, focusing on what content needs
to be generated and second, focusing on the stylistic aspects of the output
summaries. We use our insights to provide guidelines on appropriate methods to
incorporate various classes of characteristics in a sequence-to-sequence
summarization framework. Our experiments with incorporating topics, readability
and simplicity indicate the viability of the proposed prescriptions.
Comment: Appeared in CICLing 2019