3 research outputs found
Unsupervised Neural Stylistic Text Generation using Transfer learning and Adapters
Research has shown that personality is a key driver of engagement and
user experience in conversational systems. Conversational agents should also
maintain a consistent persona to have an engaging conversation with a user.
However, text generation datasets are often crowdsourced and thereby have an
averaging effect: the style of the generation model is an average of the
styles of all the crowd workers who contributed to the dataset. While one
could collect persona-specific datasets for each task, it would be an
expensive and time-consuming annotation effort. In this work, we propose a
novel transfer learning framework which updates only a small subset of model
parameters to learn style-specific attributes for response generation. For the
purpose of this study, we tackle the problem of stylistic story ending
generation using the ROCStories corpus. We learn style-specific attributes from the
PERSONALITY-CAPTIONS dataset. Through extensive experiments and evaluation
metrics, we show that our novel training procedure can improve style
generation by 200% over encoder-decoder baselines while maintaining on-par
content relevance metrics with the baselines.
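The abstract does not spell out the adapter configuration, but the general bottleneck-adapter idea it relies on — freezing the pretrained encoder-decoder and training only small inserted layers — can be sketched as follows. Dimensions, initialization, and the base-model size are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def adapter_forward(h, W_down, W_up):
    """Bottleneck adapter with a residual connection: h + up(relu(down(h)))."""
    z = np.maximum(h @ W_down, 0.0)   # project down to the bottleneck, ReLU
    return h + z @ W_up               # project back up, add residual

d_model, d_bottleneck = 512, 32       # illustrative sizes
W_down = rng.normal(scale=0.02, size=(d_model, d_bottleneck))
W_up = np.zeros((d_bottleneck, d_model))  # zero-init: adapter starts as identity

h = rng.normal(size=(4, d_model))         # a batch of hidden states
out = adapter_forward(h, W_down, W_up)
assert np.allclose(out, h)                # zero-init up-projection => no-op at start

# Only the adapter weights are trained; the base model stays frozen.
trainable = W_down.size + W_up.size
frozen = 60_000_000                       # illustrative base-model parameter count
print(f"trainable fraction: {trainable / (trainable + frozen):.4%}")
```

Because the up-projection is zero-initialized, training begins from the pretrained model's behavior and only gradually acquires the target style, which is what makes updating so few parameters viable.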
WriterForcing: Generating more interesting story endings
We study the problem of generating interesting endings for stories. Neural
generative models have shown promising results for various text generation
problems. Sequence to Sequence (Seq2Seq) models are typically trained to
generate a single output sequence for a given input sequence. However, in the
context of a story, multiple endings are possible. Seq2Seq models tend to
ignore the context and generate generic and dull responses. Very few works have
studied generating diverse and interesting story endings for a given story
context. In this paper, we propose models which generate more diverse and
interesting outputs by 1) training models to focus attention on important
keyphrases of the story, and 2) promoting generation of non-generic words. We
show that the combination of the two leads to more diverse and interesting
endings.
Comment: Accepted at the ACL Workshop on Storytelling, 2019
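The two ideas above — biasing attention toward keyphrase positions and penalizing generic words — can be illustrated with a minimal sketch. The bias and penalty constants, the masks, and the toy vocabulary are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Raw attention scores over 6 story-context tokens; positions 2 and 5
# are (hypothetically) marked as keyphrases.
scores = np.array([0.1, 0.3, 0.2, 0.0, 0.4, 0.1])
keyphrase_mask = np.array([0, 0, 1, 0, 0, 1], dtype=float)

plain = softmax(scores)
biased = softmax(scores + 2.0 * keyphrase_mask)   # additive keyphrase bonus
assert biased[2] > plain[2] and biased[5] > plain[5]

# Discourage generic words by subtracting a penalty from their logits
# before the output softmax: toy vocab ["nice", "ran", "whispered"],
# where "nice" is flagged as generic.
vocab_logits = np.array([2.0, 1.5, 0.5])
generic_mask = np.array([1, 0, 0], dtype=float)
penalized = softmax(vocab_logits - 3.0 * generic_mask)
assert penalized[0] < softmax(vocab_logits)[0]
```

Both mechanisms reshape distributions the model already produces, which is why they compose naturally: one steers what the decoder looks at, the other steers what it says.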
A Tale of Two Regulatory Regimes: Creation and Analysis of a Bilingual Privacy Policy Corpus
Over the past decade, researchers have started to explore the use of NLP to develop tools aimed at helping the public, vendors, and regulators analyze disclosures made in privacy policies. With the introduction of new privacy regulations, the language of privacy policies is also evolving, and disclosures made by the same organization are not always the same in different languages, especially when used to communicate with users who fall under different jurisdictions. This work explores the use of language technologies to capture and analyze these differences at scale. We introduce an annotation scheme designed to capture the nuances of two new landmark privacy regulations, namely the EU's GDPR and California's CCPA/CPRA. We then introduce the first bilingual corpus of mobile app privacy policies, consisting of 64 privacy policies in English (292K words) and 91 privacy policies in German (478K words), with manual annotations for 8K and 19K fine-grained data practices, respectively. The annotations are used to develop computational methods that can automatically extract “disclosures” from privacy policies. Analysis of a subset of 59 “semi-parallel” policies reveals differences that can be attributed to different regulatory regimes, suggesting that systematic analysis of policies using automated language technologies is indeed a worthwhile endeavor. © European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0
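As a toy illustration of the kind of automated disclosure extraction the abstract describes — not the paper's method, which uses trained classifiers over fine-grained annotations — a pattern-based tagger can flag sentences that mention regulated data practices. The practice labels and keyword patterns below are invented for the example:

```python
import re

# Hypothetical practice categories with illustrative trigger patterns.
PRACTICE_PATTERNS = {
    "collection": r"\b(collect|gather|obtain)\b",
    "sharing": r"\b(share|disclose|sell)\b",
    "deletion": r"\b(delete|erase|remove)\b",
}

def tag_disclosures(policy_text):
    """Return (sentence, [matched practices]) pairs for disclosure-like sentences."""
    tagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", policy_text):
        matched = [practice for practice, pattern in PRACTICE_PATTERNS.items()
                   if re.search(pattern, sentence, re.IGNORECASE)]
        if matched:
            tagged.append((sentence, matched))
    return tagged

sample = ("We collect your email address. "
          "We do not sell personal information. "
          "You may ask us to delete your data.")
result = tag_disclosures(sample)
assert [tags for _, tags in result] == [["collection"], ["sharing"], ["deletion"]]
```

The second sentence shows why keyword matching alone is insufficient for real policy analysis: "do not sell" is a negated disclosure, which is one reason the corpus's manual annotations and learned models matter.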