Dialogue systems are artefacts that converse with human users in order to achieve
some task. Each step of the dialogue requires understanding the user's input, deciding
on what to reply, and generating an output utterance. Although there are
many ways to express any given content, most dialogue systems do not take linguistic
variation into account in either the understanding or the generation phase:
the user's linguistic style is typically ignored, and the style conveyed by the
system is chosen once for all interactions at development time. We believe that
modelling linguistic variation can greatly improve interaction with dialogue systems
such as intelligent tutoring systems, video games, or information retrieval
systems, all of which require specific linguistic styles. Previous work has shown that
linguistic style affects many aspects of users' perceptions, even when the dialogue
is task-oriented. Moreover, users attribute a consistent personality to machines,
even when exposed to a limited set of cues; dialogue systems thus manifest personality
whether or not it is designed into the system. Over the past few years, psychologists
have identified the main dimensions of individual differences in human
behaviour: the Big Five personality traits. We hypothesise that the Big Five provide
a useful computational framework for modelling important aspects of linguistic
variation. This thesis first explores the possibility of recognising the user's personality
using data-driven models trained on essays and conversational data. We then
test whether it is possible to generate language varying consistently along each
personality dimension in the information presentation domain. We present PERSONAGE:
a language generator modelling findings from psychological studies to
project various personality traits. We use PERSONAGE to compare various generation
paradigms: (1) rule-based generation, (2) overgenerate and select, and (3)
generation using parameter estimation models, a novel approach that learns to
produce recognisable variation along meaningful stylistic dimensions without the
computational cost incurred by overgeneration techniques. We also present the
first human evaluation of a data-driven generation method that projects multiple
stylistic dimensions simultaneously and on a continuous scale.