
    Task Oriented Dialog Systems

    Task-oriented dialog systems have numerous applications in assisting users to achieve various goals. They are often built as a pipeline of individual components. In this work, we contribute to two such components: the dialog state tracker and the natural language generator. A typical conversation comprises multiple turns in which the participants move back and forth between different topics. At each user turn, dialogue state tracking (DST) aims to estimate the user's goal by processing the current utterance. In many turns, however, users implicitly refer to a previous goal, which entails the use of relevant dialogue history. Distinguishing relevant history is challenging, and the popular approach of relying on dialogue recency is inefficient. We therefore propose a novel DST framework that identifies relevant historical context by referring to the past utterances in which a particular slot value changed, and combines this with a relevance-weighted system utterance. Specifically, the current user utterance and the most recent system utterance are used to determine the relevance of each past system utterance. Our empirical analyses show that this method improves joint goal accuracy over previous state-of-the-art models on the WoZ 2.0 and MultiWoZ 2.0 restaurant-domain datasets. Secondly, we study a family of deep generative models for generating system responses in a task-oriented dialog setting. The language generation task involves conditioning the output of the generative model on the current dialog state, the system act, and the previous user utterance. Finally, we perform a qualitative analysis and report perplexity scores for a transformer encoder-decoder model and a conditional variational autoencoder on the Schema-Guided Dialogue dataset.
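    To make the history-weighting idea above concrete, the following is a minimal, hypothetical Python sketch (not the thesis code; a bag-of-words cosine similarity stands in for a learned utterance encoder). Each past system utterance is scored against both the current user utterance and the most recent system utterance, and the history is ranked by that relevance score.

    from collections import Counter
    import math

    def bow(utterance):
        # Bag-of-words counts as a toy stand-in for a learned utterance encoder.
        return Counter(utterance.lower().split())

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def rank_history(system_history, user_utt, last_system_utt, alpha=0.5):
        # Weight each past system utterance by its similarity to the current
        # user utterance and to the most recent system utterance.
        u, s = bow(user_utt), bow(last_system_utt)
        scored = [(alpha * cosine(bow(h), u) + (1 - alpha) * cosine(bow(h), s), h)
                  for h in system_history]
        return sorted(scored, reverse=True)

    if __name__ == "__main__":
        history = ["which price range would you like?",
                   "there are 3 italian restaurants in the centre."]
        print(rank_history(history, "something cheap please",
                           "which price range would you like?"))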

    Data-efficient methods for dialogue systems

    Conversational User Interfaces (CUIs) have become ubiquitous in everyday life, in consumer-focused products like Siri and Alexa as well as in more business-oriented customer support automation solutions. Deep learning underlies many recent breakthroughs in dialogue systems, but it requires very large amounts of training data, often annotated by experts, which dramatically increases the cost of deploying such systems in production setups and reduces their flexibility as software products. Trained on smaller data, these methods end up severely lacking robustness to various phenomena of spoken language (e.g. disfluencies) and to out-of-domain input, and often have too little generalisation power to other tasks and domains. In this thesis, we address these issues by introducing a series of methods for bootstrapping robust dialogue systems from minimal data. Firstly, we study two orthogonal approaches to dialogue, a linguistically informed model (DyLan) and a machine learning-based one (MemN2N), from the data-efficiency perspective, i.e. their potential to generalise from minimal data and their robustness to natural spontaneous input. We outline the steps to obtain data-efficient solutions with either approach and proceed with the neural models for the rest of the thesis. We then introduce the core contributions of this thesis, two data-efficient models for dialogue response generation: the Dialogue Knowledge Transfer Network (DiKTNet), based on transferable latent dialogue representations, and the Generative-Retrieval Transformer (GRTr), which combines response generation with a retrieval mechanism as a fallback. GRTr ranked first at the Dialog System Technology Challenge 8 Fast Domain Adaptation task. Next, we address the problem of training robust neural models from minimal data. We first look at robustness to disfluencies and propose a multitask LSTM-based model for domain-general disfluency detection. We then explore robustness to anomalous, or out-of-domain (OOD), input. We address this problem by (1) presenting Turn Dropout, a data-augmentation technique that facilitates training for anomalous input using only in-domain data, and (2) introducing VHCN and AE-HCN, autoencoder-augmented models for efficient training with Turn Dropout based on the Hybrid Code Networks (HCN) model family. With all the above work addressing goal-oriented dialogue, the final contribution of this thesis focuses on social dialogue, where the main objective is maintaining a natural, coherent, and engaging conversation for as long as possible. We introduce a neural model for response ranking in social conversation used in Alana, the 3rd-place winner in the Amazon Alexa Prize 2017 and 2018. Our model employs a novel technique of predicting dialogue length as the main ranking objective. We show that this approach matches the performance of its counterpart based on the conventional, human rating-based objective, and surpasses it given more raw dialogue transcripts, thus reducing the dependence on costly and cumbersome dialogue annotations. EPSRC project BABBLE (grant EP/M01553X/1).
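    As an illustration of the Turn Dropout idea mentioned above, here is a heavily simplified, hypothetical Python sketch (one plausible reading of the technique, not the thesis implementation): a fraction of in-domain user turns keep their text but have their training target replaced by an "anomalous" placeholder label, so a model can learn to flag unexpected input using in-domain data only.

    import random

    OOD_LABEL = "anomalous"  # hypothetical placeholder target for dropped turns

    def turn_dropout(dialogue, dropout_rate=0.15, rng=random):
        # dialogue: list of (user_turn, system_action) training pairs.
        # Randomly relabel a fraction of turns with OOD_LABEL while keeping
        # their text, simulating out-of-domain input from in-domain data.
        augmented = []
        for user_turn, system_action in dialogue:
            if rng.random() < dropout_rate:
                augmented.append((user_turn, OOD_LABEL))
            else:
                augmented.append((user_turn, system_action))
        return augmented

    if __name__ == "__main__":
        random.seed(0)
        dialog = [("book a table for two", "request_time"),
                  ("at seven tonight", "confirm_booking")]
        print(turn_dropout(dialog, dropout_rate=0.5))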

    Conversation analysis for computational modelling of task-oriented dialogue

    Current methods of dialogue modelling for Conversational AI (CAI) bear little resemblance to the manner in which humans organise conversational interactions. The way utterances are represented, interpreted, and generated is determined by the necessities of the chosen technique and does not resemble that used during natural conversation. In this research we propose a new method of representing task-oriented dialogue, for the purpose of computational modelling, which draws inspiration from the study of human conversational structures, Conversation Analysis (CA). Our approach unifies two well-established yet disparate methods of dialogue representation: Dialogue Acts (DAs), which provide valuable semantic and intentional information, and Adjacency Pairs (APs), the predominant method by which structure is defined within CA. This computationally compatible approach benefits from the strengths, whilst overcoming the weaknesses, of its components. To evaluate this thesis we first develop and evaluate a novel CA Modelling Schema (CAMS), which combines the concepts of DAs and APs to form AP-type labels, creating a single annotation scheme able to capture the semantic and syntactic structure of dialogue. We additionally annotate a task-oriented corpus with our schema to create CAMS-KVRET, a first-of-its-kind DA- and AP-labelled dataset. Next, we conduct detailed investigations of input representation and architectural considerations in order to develop and refine several ML models capable of automatically labelling dialogue with CAMS labels. Finally, we evaluate our proposed method of dialogue representation, and the accompanying models, on several dialogue modelling tasks, including next-label prediction, response generation, and structure representation. Our evaluation of CAMS shows that it is both reproducible and inherently learnable, even for novice annotators, and further, that it is most intuitively applied to task-oriented dialogues. During development of our ML classifiers we determined that, in most cases, input and architectural choices are equally applicable to DA and AP classification. We evaluated our classification models on CAMS-KVRET and achieved high test-set classification accuracy for all label components of the corpus. Additionally, we were able to show not only that our model is capable of learning the semantic and structural aspects of both the DA and AP components, but also that APs are more predictive of future utterance labels, and thus representative of the overall dialogue structure. These findings were further supported by the results of our next-label prediction and response generation experiments. Moreover, we found that APs reduce the perplexity of the generative model. Finally, by using χ² analysis to create dialogue structure graphs, we demonstrate that APs produce a more generalised and efficient method of dialogue representation. Our research has thus shown that integrating DAs with APs, into AP-type labels, captures the semantic and syntactic structure of an interaction in a format that is independent of the domain or topic, and which benefits the computational modelling of task-oriented dialogues.
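    To illustrate the general shape of an AP-type label described above, here is a small hypothetical Python sketch (the label format is illustrative only, not the CAMS specification): a Dialogue Act tag is combined with an Adjacency Pair type and the turn's position within the pair to form a single composite label.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class APTypeLabel:
        ap_type: str        # e.g. a "question-answer" adjacency pair
        pair_position: str  # "first" or "second" pair part
        dialogue_act: str   # e.g. "request-info", "inform"

        def __str__(self):
            return f"{self.ap_type}.{self.pair_position}/{self.dialogue_act}"

    if __name__ == "__main__":
        turn1 = APTypeLabel("question-answer", "first", "request-info")
        turn2 = APTypeLabel("question-answer", "second", "inform")
        # e.g. question-answer.first/request-info -> question-answer.second/inform
        print(turn1, "->", turn2)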