Task-Oriented Dialog Systems
Task-oriented dialog systems hold numerous applications in assisting users to achieve various goals. They often comprise a pipeline of individual components. In this work, we contribute to two such components: the dialog state tracker and the natural language generator. A typical conversation comprises multiple turns in which participants go back and forth between different topics. At each user turn, dialogue state tracking (DST) aims to estimate the user's goal by processing the current utterance. However, in many turns users implicitly refer to a previous goal, which entails the use of relevant dialogue history. Distinguishing relevant history is challenging, and the popular method of relying on dialogue recency for this is inefficient. We therefore propose a novel framework for DST that identifies relevant historical context by referring to the past utterances where a particular slot value changed, and combines that with weighted system utterances. Specifically, we use the current user utterance and the most recent system utterance to determine the relevance of a past system utterance. Our empirical analyses show that our method improves joint goal accuracy over previous state-of-the-art models on the WoZ 2.0 and MultiWoZ 2.0 restaurant-domain datasets. Secondly, we study a family of deep generative models for generating system responses in a task-oriented dialog setting. The language generation task involves conditioning the generative model's output on the current dialog state, the system act, and the previous user utterance. Finally, we perform qualitative analysis and report perplexity scores for a transformer encoder-decoder model and a conditional variational auto-encoder on the schema-guided dialog state tracking dataset.
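The relevance-scoring idea above can be illustrated with a toy bag-of-words version. The function names, the cosine scorer, and the `alpha` weighting below are illustrative stand-ins for the thesis's neural model, not its actual implementation:

```python
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words representation of an utterance."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def pick_relevant_history(history, user_utt, last_sys_utt, alpha=0.5):
    """Score each past system utterance against the current user utterance
    and the most recent system utterance; return the best-scoring one.
    `alpha` weights the two similarity terms (an illustrative choice)."""
    u, s = bow(user_utt), bow(last_sys_utt)
    scored = [
        (alpha * cosine(bow(h), u) + (1 - alpha) * cosine(bow(h), s), h)
        for h in history
    ]
    return max(scored)[1] if scored else None
```

For example, given the history `["what price range do you want", "which area of town"]`, the user utterance "cheap please", and the last system utterance "what price range do you want", the first history entry scores highest and is returned as the relevant context.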
Dialogue Systems Specialized in Social Influence: Systems, Methods, and Ethics
This thesis concerns the development of dialogue systems specialized in social influence and the problems around deploying such systems. Dialogue systems have become widely adopted in daily life, but most are focused primarily on information-seeking tasks or social companionship; they cannot apply strategies in complex and critical social influence tasks such as healthy habit promotion and emotional support. In this work, we formally define social influence dialogue systems as systems that influence users' behaviors, feelings, thoughts, or opinions through natural conversation. We also present methods to make such systems intelligible, privacy-preserving, and thus deployable in real life. Finally, we acknowledge potential ethical issues around social influence systems and propose solutions to mitigate them in Chapter 6.
Social influence dialogues span various domains, such as persuasion, negotiation, and recommendation. We first propose a donation persuasion task, PERSUASIONFORGOOD, and ground our study on this persuasion-for-social-good task. We then build a persuasive dialogue system by refining the dialogue model for intelligibility and imitating human experts for persuasiveness, as well as a negotiation agent that plays the game of Diplomacy, decoupling the planning engine from the dialogue generation module to improve the controllability of social influence systems. To deploy such systems in the wild, our work examines how humans perceive the AI agent's identity and how their perceptions impact the social influence outcome. Moreover, dialogue models are trained on conversations in which people may share personal information, which creates privacy concerns for deployment, as the models may memorize private information.
To protect user privacy in the training data, our work develops privacy-preserving learning algorithms to ensure that deployed models are safe under privacy attacks. Finally, deployed dialogue agents have the potential to integrate human feedback to continuously improve themselves. We therefore propose JUICER, a framework that uses both binary and free-form textual human feedback to augment the training data and keep improving dialogue model performance after deployment. Building social influence dialogue systems enables us to research future expert-level AI systems that are accessible via natural language, accountable with domain knowledge, and privacy-preserving with privacy guarantees.
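The abstract does not specify the privacy-preserving algorithms used. One standard ingredient in this family of methods is differentially private SGD, sketched minimally below; every name and hyperparameter here is illustrative, not the thesis's exact method:

```python
import math
import random

def dp_sgd_step(params, per_example_grads, clip_norm=1.0, noise_mult=1.1, lr=0.1):
    """One generic DP-SGD step (an illustrative sketch): clip each example's
    gradient to `clip_norm`, average the clipped gradients, add Gaussian
    noise scaled by `noise_mult`, then take a gradient step."""
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    n = len(clipped)
    avg = [sum(col) / n for col in zip(*clipped)]
    noisy = [a + random.gauss(0.0, noise_mult * clip_norm / n) for a in avg]
    return [p - lr * g for p, g in zip(params, noisy)]
```

Per-example clipping bounds any single conversation's influence on the update, which is what makes the resulting model robust to the memorization-based privacy attacks mentioned above.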
Data-efficient methods for dialogue systems
Conversational User Interfaces (CUIs) have become ubiquitous in everyday life, in consumer-focused products like Siri and Alexa as well as in more business-oriented customer support automation solutions. Deep learning underlies many recent breakthroughs in dialogue systems but requires very large amounts of training data, often annotated by experts, which dramatically increases the cost of deploying such systems in production and reduces their flexibility as software products. Trained with smaller amounts of data, these methods end up severely lacking robustness to various phenomena of spoken language (e.g. disfluencies) and to out-of-domain input, and often have too little power to generalise to other tasks and domains.
In this thesis, we address the above issues by introducing a series of methods for bootstrapping robust dialogue systems from minimal data. Firstly, we study two orthogonal approaches to dialogue, a linguistically informed model (DyLan) and a machine learning-based one (MemN2N), from the data-efficiency perspective, i.e. their potential to generalise from minimal data and their robustness to natural spontaneous input. We outline the steps to obtain data-efficient solutions with either approach and proceed with the neural models for the rest of the thesis.
We then introduce the core contributions of this thesis, two data-efficient models for dialogue response generation: the Dialogue Knowledge Transfer Network (DiKTNet), based on transferable latent dialogue representations, and the Generative-Retrieval Transformer (GRTr), which combines response generation with a retrieval mechanism as a fallback. GRTr ranked first in the Dialog System Technology Challenge 8 Fast Domain Adaptation task.
Next, we address the problem of training robust neural models from minimal data. We first look at robustness to disfluencies and propose a multitask LSTM-based model for domain-general disfluency detection. We then explore robustness to anomalous, or out-of-domain (OOD), input. We address this problem by (1) presenting Turn Dropout, a data-augmentation technique that facilitates training for anomalous input using only in-domain data, and (2) introducing VHCN and AE-HCN, autoencoder-augmented models for efficient training with Turn Dropout based on the Hybrid Code Networks (HCN) model family.
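The core of Turn Dropout can be sketched roughly as follows. This is a simplification with hypothetical names; the thesis's actual technique is integrated into the models' training pipeline:

```python
import random

OOD_TOKEN = "<unk_turn>"  # placeholder token (illustrative)

def turn_dropout(dialogue, p=0.2, rng=random):
    """Turn Dropout sketch: with probability p, replace a user turn with a
    placeholder and relabel its target as OOD, so the model learns to flag
    anomalous input using only in-domain training data."""
    augmented = []
    for turn in dialogue:  # each turn: (speaker, text, label)
        speaker, text, label = turn
        if speaker == "user" and rng.random() < p:
            augmented.append((speaker, OOD_TOKEN, "OOD"))
        else:
            augmented.append(turn)
    return augmented
```

The key property is that no real OOD examples are needed: the dropped-out turns act as stand-ins for arbitrary anomalous input at training time.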
While all the above work addresses goal-oriented dialogue, our final contribution in this thesis focuses on social dialogue, where the main objective is maintaining a natural, coherent, and engaging conversation for as long as possible. We introduce a neural model for response ranking in social conversation used in Alana, the third-place winner in the Amazon Alexa Prize 2017 and 2018. For our model, we employ a novel technique of predicting the dialogue length as the main ranking objective. We show that this approach matches the performance of its counterpart based on the conventional, human rating-based objective, and surpasses it given more raw dialogue transcripts, thus reducing the dependence on costly and cumbersome dialogue annotations. (EPSRC project BABBLE, grant EP/M01553X/1)
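The length-prediction ranking objective reduces to a very small piece of logic once a predictor is available. In this sketch, `length_model` is a stand-in for the neural predictor described above, and the toy usage below substitutes string length for it purely for illustration:

```python
def rank_responses(candidates, length_model):
    """Rank candidate responses by predicted remaining dialogue length:
    prefer the response predicted to keep the conversation going longest.
    `length_model` is any callable mapping a response to a predicted
    number of remaining turns (a stand-in for the neural model)."""
    return sorted(candidates, key=length_model, reverse=True)

# Toy usage: pretend longer responses sustain longer dialogues.
ranked = rank_responses(["hi", "tell me more about that"], len)
```

The appeal of this objective is that raw transcripts supply the training signal (how long each conversation actually lasted), so no per-response human ratings are required.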
Conversation analysis for computational modelling of task-oriented dialogue
Current methods of dialogue modelling for Conversational AI (CAI) bear little resemblance to the manner in which humans organise conversational interactions. The way utterances are represented, interpreted, and generated is determined by the necessities of the chosen technique and does not resemble natural conversation. In this research we propose a new method of representing task-oriented dialogue for computational modelling which draws inspiration from the study of human conversational structures, Conversation Analysis (CA). Our approach unifies two well-established yet disparate methods of dialogue representation: Dialogue Acts (DAs), which provide valuable semantic and intentional information, and Adjacency Pairs (APs), the predominant method by which structure is defined within CA. This computationally compatible approach benefits from the strengths, whilst overcoming the weaknesses, of its components. To evaluate this thesis we first develop and evaluate a novel CA Modelling Schema (CAMS), which combines the concepts of DAs and APs to form AP-type labels, creating a single annotation scheme that captures both the semantic and the syntactic structure of dialogue. We additionally annotate a task-oriented corpus with our schema to create CAMS-KVRET, a first-of-its-kind DA- and AP-labelled dataset. Next, we conduct detailed investigations of input representations and architectural considerations in order to develop and refine several ML models capable of automatically labelling dialogue with CAMS labels. Finally, we evaluate our proposed method of dialogue representation, and the accompanying models, on several dialogue modelling tasks, including next-label prediction, response generation, and structure representation. Our evaluation of CAMS shows that it is both reproducible and inherently learnable, even for novice annotators.
Further, it is most intuitively applied to task-oriented dialogues. During development of our ML classifiers we determined that, in most cases, input and architectural choices are equally applicable to DA and AP classification. We evaluated our classification models on CAMS-KVRET and achieved high test-set classification accuracy for all label components of the corpus. We were also able to show not only that our model is capable of learning the semantic and structural aspects of both the DA and AP components, but also that APs are more predictive of future utterance labels, and thus more representative of the overall dialogue structure. These findings were further supported by the results of our next-label prediction and response generation experiments. Moreover, we found that APs reduce the perplexity of the generative model. Finally, by using χ² analysis to create dialogue structure graphs, we demonstrate that APs produce a more generalised and efficient method of dialogue representation. Thus, our research has shown that integrating DAs with APs into AP-type labels captures the semantic and syntactic structure of an interaction in a format that is independent of domain or topic and benefits the computational modelling of task-oriented dialogues.
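The first step toward a dialogue-structure graph of the kind described above is counting label-to-label transitions across a corpus. The sketch below uses raw bigram counts and omits the χ² filtering applied in the work; all names are illustrative:

```python
from collections import Counter

def transition_counts(label_sequences):
    """Count AP-type label bigrams across dialogues. High-count edges
    form the skeleton of a dialogue-structure graph (the actual work
    additionally filters edges with a chi-squared test)."""
    counts = Counter()
    for labels in label_sequences:
        counts.update(zip(labels, labels[1:]))
    return counts
```

For instance, a corpus containing the label sequence `["question", "answer", "question", "answer"]` yields the edge `("question", "answer")` with count 2, and `("answer", "question")` with count 1.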