Controlling Personality-Based Stylistic Variation with Neural Natural Language Generators
Natural language generators for task-oriented dialogue must effectively
realize system dialogue actions and their associated semantics. In many
applications, it is also desirable for generators to control the style of an
utterance. To date, work on task-oriented neural generation has primarily
focused on semantic fidelity rather than achieving stylistic goals, while work
on style has been done in contexts where it is difficult to measure content
preservation. Here we present three different sequence-to-sequence models and
carefully test how well they disentangle content and style. We use a
statistical generator, Personage, to synthesize a new corpus of over 88,000
restaurant domain utterances whose style varies according to models of
personality, giving us total control over both the semantic content and the
stylistic variation in the training data. We then vary the amount of explicit
stylistic supervision given to the three models. We show that our most explicit
model can simultaneously achieve high fidelity to both semantic and stylistic
goals: this model adds a context vector of 36 stylistic parameters as input to
the hidden state of the encoder at each time step, showing the benefits of
explicit stylistic supervision, even when the amount of training data is large.

Comment: To appear at SIGDIAL 201
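The abstract leaves the exact wiring of the style context unspecified; the following PyTorch sketch shows one plausible reading, in which the fixed 36-parameter style vector is concatenated to the word embedding at every encoder time step. All class names, dimensions, and layer choices here are illustrative assumptions, not the authors' published code.

    import torch
    import torch.nn as nn

    class StyleContextEncoder(nn.Module):
        # Hypothetical encoder: the same style vector (e.g. 36 Personage
        # parameters) is appended to the input at every time step.
        def __init__(self, vocab_size, embed_dim=300, style_dim=36, hidden_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.LSTM(embed_dim + style_dim, hidden_dim, batch_first=True)

        def forward(self, tokens, style):
            # tokens: (batch, seq_len) token ids
            # style:  (batch, style_dim) stylistic parameters
            emb = self.embed(tokens)
            style_rep = style.unsqueeze(1).expand(-1, emb.size(1), -1)
            # Concatenate the repeated style vector onto each step's input.
            return self.rnn(torch.cat([emb, style_rep], dim=-1))

Repeating the vector at every step, rather than supplying it only at initialisation, keeps the stylistic signal available throughout encoding, which is consistent with the benefit of explicit supervision the abstract reports.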
Automatic Image Captioning with Style
This thesis connects two core topics in machine learning, vision
and language. The problem of choice is image caption generation:
automatically constructing natural language descriptions of image
content. Previous research into image caption generation has
focused on generating purely descriptive captions; I focus on
generating visually relevant captions with a distinct linguistic
style. Captions with style have the potential to ease
communication and add a new layer of personalisation.
First, I consider naming variations in image captions and propose a method for predicting context-dependent names that takes into account visual and linguistic information. This method makes use of a large-scale image caption dataset, which I also use to explore and report naming conventions for hundreds of animal classes.

Next, I propose the SentiCap model, which relies on recent advances in artificial neural networks to generate visually relevant image captions with positive or negative sentiment. To balance descriptiveness and sentiment, the SentiCap model dynamically switches between two recurrent neural networks, one tuned for descriptive words and one for sentiment words. As the first published model for generating captions with sentiment, SentiCap has influenced a number of subsequent works.
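The thesis describes the switching only at a high level; the PyTorch sketch below shows one way such a two-stream decoder step could look, with a learned gate mixing the word distributions of a descriptive RNN and a sentiment RNN. It is a hypothetical reconstruction, not the published SentiCap implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SwitchingDecoder(nn.Module):
        # Hypothetical two-stream decoder: one GRU tuned for descriptive
        # words, one for sentiment words, mixed by a per-step switch.
        def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.desc_rnn = nn.GRUCell(embed_dim, hidden_dim)
            self.sent_rnn = nn.GRUCell(embed_dim, hidden_dim)
            self.desc_out = nn.Linear(hidden_dim, vocab_size)
            self.sent_out = nn.Linear(hidden_dim, vocab_size)
            self.switch = nn.Linear(2 * hidden_dim, 1)

        def step(self, token, h_desc, h_sent):
            emb = self.embed(token)
            h_desc = self.desc_rnn(emb, h_desc)
            h_sent = self.sent_rnn(emb, h_sent)
            # A gate near 1 favours the sentiment stream for this word.
            gate = torch.sigmoid(self.switch(torch.cat([h_desc, h_sent], dim=-1)))
            p_desc = F.softmax(self.desc_out(h_desc), dim=-1)
            p_sent = F.softmax(self.sent_out(h_sent), dim=-1)
            p_word = (1 - gate) * p_desc + gate * p_sent
            return p_word, h_desc, h_sent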
I then investigate the sub-task of modelling styled sentences without images. The specific task chosen is sentence simplification: rewriting news article sentences to make them easier to understand. For this task I design a neural sequence-to-sequence model that can work with limited training data, using novel adaptations for word copying and sharing word embeddings.
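As a rough illustration of those two ideas (hypothetical names and shapes; the thesis's actual adaptations may differ), one decoder step with a copy mechanism and an output layer tied to the embedding table might look like this:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CopyDecoderStep(nn.Module):
        # Hypothetical decoder step combining copying from the source
        # with generation from a vocabulary tied to the embeddings.
        def __init__(self, vocab_size, embed_dim=256, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.cell = nn.GRUCell(embed_dim, hidden_dim)
            self.to_embed = nn.Linear(hidden_dim, embed_dim)
            self.copy_gate = nn.Linear(hidden_dim, 1)

        def forward(self, token, hidden, src_ids, src_states):
            # src_ids: (batch, src_len) source token ids
            # src_states: (batch, src_len, hidden_dim) encoder states
            hidden = self.cell(self.embed(token), hidden)
            # Generate: score words against the shared embedding matrix.
            p_gen = F.softmax(self.to_embed(hidden) @ self.embed.weight.t(), dim=-1)
            # Copy: attend over source positions, then scatter the
            # attention mass back onto the corresponding vocabulary ids.
            attn = F.softmax(torch.bmm(src_states, hidden.unsqueeze(-1)).squeeze(-1), dim=-1)
            p_copy = torch.zeros_like(p_gen).scatter_add(1, src_ids, attn)
            g = torch.sigmoid(self.copy_gate(hidden))
            return (1 - g) * p_gen + g * p_copy, hidden

Tying the output projection to the embedding matrix reduces the parameter count, which matters in the limited-data setting the abstract highlights, while the copy gate lets rare source words be reproduced verbatim.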
Finally, I present SemStyle, a system for generating visually relevant image captions in the style of an arbitrary text corpus. A shared term space allows a neural network for vision and content planning to communicate with a network for styled language generation. SemStyle achieves competitive results in human and automatic evaluations of descriptiveness and style.
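At a high level, the pipeline can be read as two stages joined by the shared term space; the sketch below is only a schematic of that flow, with every name hypothetical.

    # Schematic of the two-stage flow (all functions hypothetical).
    def caption_with_style(image, term_generator, styled_generator):
        # Stage 1: vision and content planning emit a sequence of
        # terms from the shared term space.
        terms = term_generator(image)
        # Stage 2: a generator trained on the styled text corpus
        # realises the terms as a caption, so content comes from the
        # image while style comes from the corpus.
        return styled_generator(terms)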
As a whole, this thesis presents two complete systems for styled caption generation that are the first of their kind and demonstrate that automatic style transfer for image captions is achievable. Contributions also include novel ideas for object naming and sentence simplification. This thesis opens up inquiries into highly personalised image captions; large-scale visually grounded concept naming; and, more generally, styled text generation with content control.