Tracking of enriched dialog states for flexible conversational information access
Dialog state tracking (DST) is a crucial component in a task-oriented dialog
system for conversational information access. A common practice in current
dialog systems is to define the dialog state by a set of slot-value pairs. Such
representation of dialog states and the slot-filling based DST have been widely
employed, but they suffer from three drawbacks: (1) the dialog state can contain
only a single value per slot; (2) it can express only the user's affirmative
preferences over the values of a slot; and (3) current task-based dialog systems
mainly focus on the searching task, while the enquiring task is also very
common in practice. These observations motivate us to enrich the current
representation of dialog states and to collect a new dialog dataset about
movies, on which we build a new DST, called enriched DST (EDST), for
flexible access to movie information. The EDST supports the searching task, the
enquiring task and their mixed task. We show that the new EDST method not only
achieves good results on the Iqiyi dataset, but also outperforms other
state-of-the-art DST methods on the traditional dialog datasets, WOZ2.0 and
DSTC2.
Comment: 5 pages, 2 figures, accepted by ICASSP201
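The enriched state described in this abstract can be pictured with a small sketch. It is not the authors' EDST model: the class name `EnrichedState`, its fields, and the rule that any one liked value is enough to match are assumptions made purely for illustration. It shows a state that holds several values per slot, records negative as well as affirmative preferences, and keeps track of enquired slots.

```python
from dataclasses import dataclass, field


@dataclass
class EnrichedState:
    """Hypothetical enriched dialog state (illustrative only)."""
    liked: dict = field(default_factory=dict)      # slot -> set of wanted values
    disliked: dict = field(default_factory=dict)   # slot -> set of rejected values
    requested: set = field(default_factory=set)    # slots the user enquires about

    def update(self, slot, value, positive=True):
        # Record the preference and drop any contradicting earlier entry.
        target, other = (
            (self.liked, self.disliked) if positive else (self.disliked, self.liked)
        )
        target.setdefault(slot, set()).add(value)
        other.get(slot, set()).discard(value)

    def enquire(self, slot):
        self.requested.add(slot)

    def matches(self, item):
        # Assumption: one liked value per slot suffices (OR semantics),
        # while any disliked value rules the item out.
        for slot, values in self.liked.items():
            if values and not (values & item.get(slot, set())):
                return False
        for slot, values in self.disliked.items():
            if values & item.get(slot, set()):
                return False
        return True


state = EnrichedState()
state.update("genre", "comedy")
state.update("genre", "action")               # second value for the same slot
state.update("actor", "X", positive=False)    # negative preference
state.enquire("director")                     # enquiring task
```

A plain slot-value dictionary would overwrite "comedy" with "action" and could not store the rejected actor at all, which is the limitation the abstract points to.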
Data-Driven Policy Optimisation for Multi-Domain Task-Oriented Dialogue
Recent developments in machine learning, along with a general shift in public attitude towards digital personal assistants, have opened new frontiers for conversational systems. Nevertheless, building data-driven multi-domain conversational agents that act optimally given a dialogue context remains an open challenge. The first step towards that goal is developing an efficient way of learning a dialogue policy in new domains. Secondly, it is important to be able to collect and utilise human-human conversational data to bootstrap an agent's knowledge. The work presented in this thesis demonstrates that a neural dialogue manager fine-tuned with reinforcement learning is a viable approach for learning a dialogue policy efficiently and across many domains.
The thesis starts by introducing a dialogue management module that learns through interactions to act optimally given the current context of a conversation. The current shift towards neural, parameter-rich systems does not fully address the problem of noise introduced by speech recognition or natural language understanding components. A Bayesian approach is therefore proposed to learn more robust and effective policy management in direct interactions without any prior data. By putting a distribution over model weights, the learning agent is less prone to overfitting to particular dialogue realizations, and a more efficient exploration policy can therefore be employed. The results show that deep reinforcement learning performs on par with non-parametric models even in a low-data regime while significantly reducing the computational complexity compared with the previous state-of-the-art.
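The idea of placing a distribution over model weights and using it to drive exploration can be illustrated with a toy sketch. This is not the thesis's model: it assumes a linear Q-function with an independent Gaussian over each weight, and the class `GaussianLinearQ` and its parameters are hypothetical. One weight sample is drawn per decision and the agent acts greedily under that sample (Thompson-style exploration); training of the means and standard deviations is omitted.

```python
import random


class GaussianLinearQ:
    """Toy linear Q-function with a Gaussian over each weight (illustrative)."""

    def __init__(self, n_actions, n_features, init_std=1.0, seed=0):
        self.rng = random.Random(seed)
        # One row of weight means and standard deviations per action.
        self.mean = [[0.0] * n_features for _ in range(n_actions)]
        self.std = [[init_std] * n_features for _ in range(n_actions)]

    def sample_weights(self):
        # Draw one concrete weight matrix from the current distribution.
        return [
            [self.rng.gauss(m, s) for m, s in zip(mean_row, std_row)]
            for mean_row, std_row in zip(self.mean, self.std)
        ]

    def act(self, features):
        # Act greedily with respect to the sampled weights.
        weights = self.sample_weights()
        q_values = [sum(w * x for w, x in zip(row, features)) for row in weights]
        return max(range(len(q_values)), key=q_values.__getitem__)


agent = GaussianLinearQ(n_actions=3, n_features=2)
action = agent.act([1.0, 0.5])
```

While the standard deviations are large, sampled weights differ between turns and the agent tries different actions. As the standard deviations shrink, the choice settles on the action favoured by the weight means.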
The deployment of a dialogue manager without any pre-training on human conversations is not a viable option from an industry perspective. However, the progress in building statistical systems, particularly dialogue managers, is hindered by the scale of data available. To address this fundamental obstacle, a novel data-collection pipeline entirely based on crowdsourcing without the need for hiring professional annotators is introduced. The validation of the approach results in the collection of the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully labeled collection of human-human written conversations spanning over multiple domains and topics. The proposed dataset creates a set of new benchmarks (belief tracking, policy optimisation, and response generation) significantly raising the complexity of analysed dialogues.
The collected dataset serves as a foundation for a novel reinforcement learning (RL)-based approach for training a multi-domain dialogue manager. A Multi-Action and Slot Dialogue Agent (MASDA) is proposed to address two limitations: 1) the handling of complex multi-domain dialogues with multiple concurrent actions present in a single turn; and 2) the lack of interpretability, which consequently impedes the use of intermediate signals (e.g., dialogue turn annotations) if such signals are available. MASDA explicitly models system acts and slots using intermediate signals, resulting in an improved task-based end-to-end framework. The model can also select concurrent actions in a single turn, thus enriching the representation of the generated responses. The proposed framework allows for RL training of dialogue task completion metrics when dealing with concurrent actions. The results demonstrate the advantages of both 1) handling concurrent actions and 2) exploiting intermediate signals: MASDA outperforms previous end-to-end frameworks while also offering improved scalability.
EPSR
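Selecting several concurrent actions in one turn can be sketched as a multi-label decision, in contrast to a softmax that picks exactly one act. The sketch below is not MASDA itself: the function `select_actions`, the act names, and the 0.5 threshold are assumptions for illustration. Each system act receives an independent sigmoid score, and every act above the threshold is emitted.

```python
import math


def select_actions(logits, threshold=0.5):
    """Return every act whose sigmoid score reaches the threshold."""
    probs = {act: 1.0 / (1.0 + math.exp(-z)) for act, z in logits.items()}
    chosen = [act for act, p in probs.items() if p >= threshold]
    if not chosen:
        # Fall back to the single best act so the system always responds.
        chosen = [max(probs, key=probs.get)]
    return chosen


# Hypothetical per-act scores for one dialogue turn.
turn_logits = {"inform": 2.0, "request": 0.3, "bye": -3.0}
acts = select_actions(turn_logits)
```

With these scores, both `inform` and `request` pass the threshold, so the turn carries two acts at once.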
Structured Dialogue State Management for Task-Oriented Dialogue Systems
Human-machine conversational agents have developed at a rapid pace in recent years, bolstered by the application of advanced technologies such as deep learning. Today, dialogue systems are useful in assisting users in various activities, especially task-oriented dialogue systems in specific dialogue domains. However, they continue to be limited in many ways. Arguably the biggest challenge lies in the complexity of natural language and interpersonal communication, and the lack of human context and knowledge available to these systems. This leads to the question of whether dialogue systems, and in particular task-oriented dialogue systems, can be enhanced to leverage various language properties. This work focuses on the semantic structural properties of language in task-oriented dialogue systems. These structural properties are manifested through variable dependencies in dialogue domains, and studying and accounting for these variables and their interdependencies is the main objective of this research.
Contemporary task-oriented dialogue systems are typically developed with a multiple-component architecture, where each component is responsible for a specific process in the conversational interaction. It is commonly accepted that the ability to understand user input in a conversational context, a responsibility generally assigned to the dialogue state tracking component, contributes substantially to the overall performance of dialogue systems. The output of the dialogue state tracking component, the so-called dialogue state, is a representation of the aspects of a dialogue relevant to the completion of a task up to that point, and should also capture the task structural properties of natural language. In a dialogue context, dialogue state variables are expressed through dialogue slots and slot values; hence the dialogue state variable dependencies are expressed as the dependencies between dialogue slots and their values. Incorporating slot dependencies in the dialogue state tracking process is herein hypothesised to enhance the accuracy of postulated dialogue states, and subsequently to improve the performance of task-oriented dialogue systems.
Given this overall goal and approach to the improvement of dialogue systems, the work in this dissertation can be broken down into two related contributions: (i) a study of structural properties in dialogue states; and (ii) the investigation of novel modelling approaches to capture slot dependencies in dialogue domains.
The analysis of language's structural properties was conducted as a corpus-based study to investigate whether variable dependencies, i.e., slot dependencies in dialogue system terminology, exist in dialogue domains and, if so, to what extent these dependencies affect the dialogue state tracking process. A number of public dialogue corpora were chosen and analysed with a collection of statistical methods.
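One way such a corpus-level dependency check could look is sketched below. The abstract does not name its statistical methods, so mutual information is used here only as an assumed example. The function `mutual_information` and the slot names are hypothetical, and each dialogue state is taken to be a plain dictionary from slot to value.

```python
import math
from collections import Counter


def mutual_information(states, slot_a, slot_b):
    """Mutual information (in bits) between two slots over dialogue states."""
    pairs = [(s[slot_a], s[slot_b]) for s in states if slot_a in s and slot_b in s]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


# Toy corpus in which the price range fully determines the area.
toy_states = [
    {"price": "cheap", "area": "north"},
    {"price": "cheap", "area": "north"},
    {"price": "expensive", "area": "south"},
    {"price": "expensive", "area": "south"},
]
dependency = mutual_information(toy_states, "price", "area")
```

A value near zero suggests the two slots vary independently in the corpus, while larger values indicate that knowing one slot's value narrows down the other's.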
Deep learning architectures have been shown in various works to be an effective method to model conversations and different types of machine learning challenges. In this research, in order to account for slot dependencies, a number of deep learning-based models were experimented with for the dialogue state tracking task. In particular, a multi-task learning system was developed to study the leveraging of common features and shared knowledge in the training of dialogue state tracking subtasks such as tracking different slots, hence investigating the associations between these slots. Beyond that, a structured prediction method, based on energy-based learning, was also applied to account for explicit dialogue slot dependencies.
The study results show promising directions for solving the dialogue state tracking challenge for task-oriented dialogue systems. By accounting for slot dependencies in dialogue domains, dialogue states were produced more accurately when benchmarked against comparative modelling methods that do not take advantage of the same principle. Furthermore, the structured prediction method is applicable to various state-of-the-art modelling approaches for further study.
In the long term, the study of dialogue state slot dependencies can potentially be expanded to a wider range of conversational aspects such as personality, preferences, and modalities, as well as user intents