Domain transfer for deep natural language generation from abstract meaning representations
Stochastic natural language generation systems that are trained from labelled datasets are often domain-specific in their annotation and in their mapping from semantic input representations to lexical-syntactic outputs. As a result, learnt models fail to generalize across domains, heavily restricting their usability beyond single applications. In this article, we focus on the problem of domain adaptation for natural language generation. We show how linguistic knowledge from a source domain, for which labelled data is available, can be adapted to a target domain by reusing training data across domains. As the key to this, we propose to employ abstract meaning representations as a common semantic representation across domains. We model natural language generation as a long short-term memory recurrent neural network encoder-decoder, in which one recurrent neural network learns a latent representation of the semantic input, and a second recurrent neural network learns to decode it to a sequence of words. We show that the learnt representations can be transferred across domains and leveraged effectively to improve training on new, unseen domains. Experiments in three different domains and with six datasets demonstrate that the lexical-syntactic constructions learnt in one domain can be transferred to new domains, achieving up to 75-100% of the performance of in-domain training on objective metrics such as BLEU and semantic error rate, as well as in a subjective human rating study. Training a policy from prior knowledge from a different domain is consistently better than pure in-domain training, by up to 10%.
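As a rough illustration of the preprocessing such an encoder-decoder typically requires (a hedged sketch, not the authors' code), an AMR in PENMAN notation can be linearised into a token sequence before being fed to the encoder; variable names are dropped so the representation stays domain-independent:

```python
# Illustrative sketch: linearise a PENMAN-style AMR graph into tokens for a
# sequence-to-sequence encoder. Dropping variable names ("w", "b", ...) is a
# common preprocessing choice; re-entrant variable mentions are kept as-is.
import re

def linearise_amr(penman: str) -> list[str]:
    """Turn a PENMAN string into a flat token sequence without variables."""
    # Remove "variable /" pairs that follow an opening parenthesis.
    stripped = re.sub(r"\(\s*[^\s/()]+\s*/\s*", "( ", penman)
    # Tokenise into parentheses, :role labels and concept/constant symbols.
    return re.findall(r"\(|\)|[^\s()]+", stripped)

amr = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))"
tokens = linearise_amr(amr)
print(tokens)
# ['(', 'want-01', ':ARG0', '(', 'boy', ')', ':ARG1',
#  '(', 'go-02', ':ARG0', 'b', ')', ')']
```

The resulting token sequence can then be mapped to vocabulary indices and consumed by the encoder RNN like any other input sentence.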
Deep reinforcement learning of dialogue policies with less weight updates
Deep reinforcement learning dialogue systems are attractive because they can jointly learn their feature representations and policies without manual feature engineering. Their application is challenging, however, due to slow learning. We propose a two-stage method for accelerating the induction of single- or multi-domain dialogue policies. The first stage reduces the number of weight updates over time, while the second stage uses very small minibatches (of as few as two learning experiences) sampled from experience replay memories. The former frequently updates the weights of the neural nets at early stages of training and decreases the number of updates as training progresses, by performing updates during exploration and skipping them during exploitation. The learning process is thus accelerated through fewer weight updates in both stages. An empirical evaluation in three domains (restaurants, hotels and TV guide) confirms that the proposed method trains policies five times faster than a baseline without it. Our findings are useful for training larger-scale neural-based spoken dialogue systems.
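The update-reduction schedule can be sketched as follows; all names, constants and the annealing schedule here are illustrative assumptions, not the paper's actual hyperparameters:

```python
# Illustrative sketch of the two-stage idea: (1) perform a gradient update
# only on exploration steps, so updates become rarer as epsilon decays, and
# (2) sample very small minibatches (here, size 2) from experience replay.
import random
from collections import deque

random.seed(0)
replay = deque(maxlen=1000)   # experience replay memory
updates = 0                   # counts how many weight updates were performed
epsilon = 1.0                 # exploration rate, annealed over training

def train_step(minibatch):
    """Stand-in for one gradient update of the policy network."""
    global updates
    updates += 1

for step in range(2000):
    exploring = random.random() < epsilon
    replay.append(("state", "action", 0.0, "next_state"))  # dummy experience
    if exploring and len(replay) >= 2:
        train_step(random.sample(list(replay), 2))  # minibatch of just 2
    # Updates are skipped on exploitation steps; epsilon decays over time.
    epsilon = max(0.05, epsilon * 0.999)

print(updates)  # far fewer than the 2000 updates of an update-every-step loop
```

Because updates are tied to exploration, the schedule naturally front-loads learning into the early, exploratory phase of training.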
Recurrent Neural Network Language Generation for Dialogue Systems
Language is the principal medium for ideas, while dialogue is the most natural and effective way for humans to interact with and access information from machines. Natural language generation (NLG) is a critical component of spoken dialogue systems and has a significant impact on usability and perceived quality. Many commonly used NLG systems employ rules and heuristics, which tend to generate inflexible and stylised responses without the natural variation of human language. The frequent repetition of identical output forms can quickly make a dialogue tedious for most real-world users. Additionally, these rules and heuristics are not scalable and hence not trivially extensible to other domains or languages. A statistical approach to language generation can learn language decisions directly from data without relying on hand-coded rules or heuristics, which brings scalability and flexibility to NLG. Statistical models also provide an opportunity to learn in-domain human colloquialisms and cross-domain model adaptations.
A robust and quasi-supervised NLG model is proposed in this thesis. The model leverages a Recurrent Neural Network (RNN)-based surface realiser and a gating mechanism applied to the input semantics, and is motivated by the Long Short-Term Memory (LSTM) network. The RNN-based surface realiser and gating mechanism use a neural network to learn end-to-end language generation decisions from input dialogue-act and sentence pairs; this also integrates sentence planning and surface realisation into a single optimisation problem. The single optimisation not only bypasses costly intermediate linguistic annotations but also generates more natural and human-like responses. Furthermore, a domain adaptation study shows that the proposed model can be readily adapted and extended to new dialogue domains via a proposed recipe.
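The gating idea can be illustrated with a minimal sketch (toy shapes and weights, not the thesis implementation): a binary dialogue-act feature vector is attenuated at each decoding step by a sigmoid reading gate, so semantic content is gradually "consumed" as the realiser emits words, in the style of semantically conditioned LSTM generators:

```python
# Minimal sketch of semantic gating: each decoding step multiplies the
# dialogue-act (DA) vector element-wise by a gate in (0, 1), so every slot's
# contribution fades once the surface realiser has expressed it.
import numpy as np

np.random.seed(0)
# inform(name=..., food=...) as a toy binary DA feature vector
da = np.array([1.0, 1.0, 1.0])       # [inform, name-slot, food-slot]
W_r = np.random.randn(3, 4) * 0.1    # assumed gate weights over the hidden state

def step(da_vec, hidden):
    """One decoding step: compute the reading gate and retain DA features."""
    gate = 1.0 / (1.0 + np.exp(-(W_r @ hidden)))  # sigmoid gate, each entry in (0, 1)
    return gate * da_vec                          # element-wise attenuation

h = np.ones(4)                       # stand-in for the decoder hidden state
for t in range(3):
    da = step(da, h)
# Every component has shrunk: the network has "consumed" semantic content.
print(da)
```

In a full model the gate would depend on the evolving hidden state, letting the network learn when each slot has been realised in the output sentence.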
Continuing the success of end-to-end learning, the second part of the thesis speculates on building an end-to-end dialogue system by framing it as a conditional generation problem. The proposed model encapsulates a belief tracker with a minimal state representation and a generator that takes the dialogue context to produce responses. These features make the model comprehensible and fast to learn. The proposed model is capable of understanding requests and accomplishing tasks after training on only a few hundred human-human dialogues. A complementary Wizard-of-Oz data collection method is also introduced to facilitate the collection of human-human conversations from online workers. The results demonstrate that the proposed model can talk to human judges naturally, without any difficulty, for a sample application domain. In addition, the results also suggest that the introduction of a stochastic latent variable can help the system model intrinsic variation in communicative intention much better.

Tsung-Hsien Wen's Ph.D. was supported by Toshiba Research Europe Ltd, Cambridge Research Laboratory.
Evaluating the impact of variation in automatically generated embodied object descriptions
Institute for Communicating and Collaborative Systems

The primary task for any system that aims to automatically generate human-readable output is choice: the input to the system is usually well-specified, but there can be a wide range of options for creating a presentation based on that input. When designing such a system, an important decision is to select which aspects of the output are hard-wired and which allow for dynamic variation. Supporting dynamic choice requires additional representation and processing effort in the system, so it is important to ensure that incorporating variation has a positive effect on the generated output.

In this thesis, we concentrate on two types of output generated by a multimodal dialogue system: linguistic descriptions of objects drawn from a database, and conversational facial displays of an embodied talking head. In a series of experiments, we add different types of variation to one of these types of output. The impact of each implementation is then assessed through a user evaluation in which human judges compare outputs generated by the basic version of the system to those generated by the modified version; in some cases, we also use automated metrics to compare the versions of the generated output.

This series of implementations and evaluations allows us to address three related issues. First, we explore the circumstances under which users perceive and appreciate variation in generated output. Second, we compare two methods of including variation in the output of a corpus-based generation system. Third, we compare human judgements of output quality to the predictions of a range of automated metrics.

The results of the thesis are as follows. The judges generally preferred output that incorporated variation, except for a small number of cases where other aspects of the output obscured it or the variation was not marked. In general, the output of systems that chose the majority option was judged worse than that of systems that chose from a wider range of outputs. However, the results for non-verbal displays were mixed: users mildly preferred agent outputs where the facial displays were generated using stochastic techniques to those where a simple rule was used, but the stochastic facial displays decreased users' ability to identify contextual tailoring in speech while the rule-based displays did not. Finally, automated metrics based on simple corpus similarity favour generation strategies that do not diverge far from the average corpus examples, which are exactly the strategies that human judges tend to dislike. Automated metrics that measure other properties of the generated output correspond more closely to users' preferences.
Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme
Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologies
Designing Service-Oriented Chatbot Systems Using a Construction Grammar-Driven Natural Language Generation System
Service-oriented chatbot systems are used to inform users in a conversational manner about a particular service or product on a website. Our research shows that current systems are time-consuming to build and not very accurate or satisfying to users. We find that natural language understanding and natural language generation methods are central to creating an efficient and useful system. In this thesis we investigate current and past methods in this research area and place particular emphasis on Construction Grammar and its computational implementation. Our research shows that users have strong emotive reactions to how these systems behave, so we also investigate the human-computer interaction component. We present three systems (KIA, John and KIA2) and carry out extensive user tests on all of them, as well as comparative tests. KIA is built using existing methods, John is built with the user in mind, and KIA2 is built using the construction grammar method. We found that the construction grammar approach performs well in service-oriented chatbot systems, and that users preferred it over the other systems.
Modellierung natürlicher Dialoge im Kontext sprachbasierter Informations- und Steuersysteme (Modelling Natural Dialogues in the Context of Speech-Based Information and Control Systems)
Current spoken dialogue systems are often criticised because they lack natural behaviour. In this thesis, a model to facilitate the development of user-friendly dialogues for information and control systems (e.g. travel-booking or smart room control systems) is created in order to address this problem. This also includes a study of users' preferences, the classification of utterances according to intention and answer type, and the development of a dialogue engine that can process dialogues based on this model.

The developed model describes the dialogue flow and the combination of questions, answers and the resulting actions. Features like mixed initiative, open-ended questions, subdialogues and adaptable phrasings of system utterances lead to more natural dialogues and increase the usability of dialogue systems, while linguistic datatypes, abstract question descriptions in connection with answer types, and language generation methods greatly facilitate the definition of such dialogues for the developer. The separation of dialogue model and dialogue engine makes it possible to reuse base functionalities and prevents the mixing of execution logic and dialogue knowledge. The engine contains dialogue acts to classify user utterances and to prevent ambiguities, as well as language understanding modules to identify the user's goal. In addition, it infers the next dialogue step, taking the specified dialogue behaviour into account.
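A minimal sketch of what an abstract question description paired with an answer type might look like (all field names here are hypothetical illustrations, not NADIA's actual schema): the engine can render a prompt and validate a user answer from the declaration alone, without hand-written dialogue code.

```python
# Hypothetical sketch of a declarative question description with an answer
# type. The dialogue engine validates user input against the declared type,
# keeping dialogue knowledge separate from execution logic.
from dataclasses import dataclass
from datetime import date

@dataclass
class Question:
    slot: str          # which piece of information this question fills
    prompt: str        # the system utterance presented to the user
    answer_type: str   # e.g. "date", "free_text"

def validate(q: Question, answer: str):
    """Check a user answer against the question's declared answer type."""
    if q.answer_type == "date":
        return date.fromisoformat(answer)  # raises ValueError if malformed
    return answer.strip() or None

q = Question(slot="departure",
             prompt="When would you like to travel?",
             answer_type="date")
print(validate(q, "2024-05-01"))  # -> 2024-05-01
```

In the actual system such declarations live in the XML-based dialogue model; the point of the sketch is only the separation between declarative dialogue knowledge and the engine that interprets it.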
By implementing the Natural Dialogue System (NADIA), a dialogue engine that executes the XML-based model, the functionality is finally demonstrated.
MOG 2007: Workshop on Multimodal Output Generation: CTIT Proceedings
This volume presents a wide variety of work offering different perspectives on multimodal generation. Two strands of work can be distinguished: half of the gathered papers present current work on embodied conversational agents (ECAs), while the other half present current work on multimedia applications. Two general research questions are shared by all: which output modalities are most suitable in which situations, and how should different output modalities be combined?