6 research outputs found
Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems
Natural language generation (NLG) is a critical component of spoken dialogue
and it has a significant impact both on usability and perceived quality. Most
NLG systems in common use employ rules and heuristics and tend to generate
rigid and stylised responses without the natural variation of human language.
They are also not easily scaled to systems covering multiple domains and
languages. This paper presents a statistical language generator based on a
semantically controlled Long Short-term Memory (LSTM) structure. The LSTM
generator can learn from unaligned data by jointly optimising sentence planning
and surface realisation using a simple cross entropy training criterion, and
language variation can be easily achieved by sampling from output candidates.
With fewer heuristics, an objective evaluation in two differing test domains
showed the proposed method improved performance compared to previous methods.
Human judges scored the LSTM system higher on informativeness and naturalness
and overall preferred it to the other systems.Comment: To be appear in EMNLP 201
Distributed dialogue policies for multi-domain statistical dialogue management
Statistical dialogue systems offer the potential to reduce costs by learning policies automatically on-line, but are not designed to scale to large open-domains. This paper proposes a hierarchical distributed dialogue architecture in which policies are organised in a class hierarchy aligned to an underlying knowledge graph. This allows a system to be deployed using a modest amount of data to train a small set of generic policies. As further data is collected, generic policies can be adapted to give in-domain performance. Using Gaussian process-based reinforcement learning, it is shown that within this framework generic policies can be constructed which provide acceptable user performance, and better performance than can be obtained using under-trained domain specific policies. It is also shown that as sufficient in-domain data becomes available, it is possible to seamlessly improve performance, without subjecting users to unacceptable behaviour during the adaptation period and without limiting the final performance compared to policies trained from scratch