The Dialog State Tracking Challenge Series: A Review
In a spoken dialog system, dialog state tracking refers to the task of correctly inferring the state of the conversation -- such as the user's goal -- given all of the dialog history up to that turn. Dialog state tracking is crucial to the success of a dialog system, yet until recently there were no common resources, hampering progress. The Dialog State Tracking Challenge series of three tasks introduced the first shared testbed and evaluation metrics for dialog state tracking, and has underpinned three key advances in dialog state tracking: the move from generative to discriminative models; the adoption of discriminative sequential techniques; and the incorporation of the speech recognition results directly into the dialog state tracker. This paper reviews this research area, covering both the challenge tasks themselves and the work they have enabled.
Survey on Evaluation Methods for Dialogue Systems
In this paper we survey the methods and concepts developed for the evaluation
of dialogue systems. Evaluation is a crucial part of the development
process. Often, dialogue systems are evaluated by means of human evaluations
and questionnaires. However, this tends to be very cost- and time-intensive.
Thus, much work has been put into finding methods that reduce the
involvement of human labour. In this survey, we present the main concepts and
methods. For this, we differentiate between the various classes of dialogue
systems (task-oriented dialogue systems, conversational dialogue systems, and
question-answering dialogue systems). We cover each class by introducing the
main technologies developed for the dialogue systems and then by presenting the
evaluation methods for this class.
Four Mode Based Dialogue Management with Modified POMDP Model
This thesis proposes a method that dynamically manages the interaction between the user and the system through speech or text input: it updates the user goals, selects system actions, and calculates rewards for each system response at each time step. The main focus is on the dialog manager, which decides how to continue the dialogue. We use the POMDP technique, as it maintains a belief distribution over the dialogue states based on the observations made during the dialogue, even in a noisy environment. Four contextual control modes are introduced into dialogue management as a decision-making mechanism and to keep track of machine behaviour for each dialogue state. The results obtained show that our proposed framework overcomes the limitations of prior POMDP methods and identifies the actual intention of the users within the available time, providing a highly interactive conversation between the user and the computer.
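The belief maintenance mentioned in this abstract can be illustrated with the standard POMDP belief update, b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s). The following is a minimal sketch under stated assumptions, not the thesis's implementation: the two user goals, the action name `ask_food`, and all probabilities are hypothetical values chosen for illustration.

```python
# Toy POMDP belief update: b'(s') ∝ O(o | s', a) * sum_s T(s' | s, a) * b(s).
# All states, actions, and probabilities below are illustrative assumptions.


def belief_update(belief, action, observation, transition, obs_model):
    """Return the normalised posterior belief after one action and one observation."""
    states = list(belief)
    unnormalised = {}
    for s_next in states:
        # Prediction step: probability of reaching s_next from the old belief.
        predicted = sum(transition(s, action, s_next) * belief[s] for s in states)
        # Correction step: weight by how well s_next explains the observation.
        unnormalised[s_next] = obs_model(observation, s_next, action) * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalised.items()}


def transition(s, action, s_next):
    # Hypothetical: the user keeps the same goal with probability 0.9.
    return 0.9 if s == s_next else 0.1


def obs_model(observation, s_next, action):
    # Hypothetical noisy channel: the recogniser reports the true goal 70% of the time.
    return 0.7 if observation == s_next else 0.3


b0 = {"italian": 0.5, "chinese": 0.5}  # uniform prior over two user goals
b1 = belief_update(b0, "ask_food", "italian", transition, obs_model)
b2 = belief_update(b1, "ask_food", "italian", transition, obs_model)
print(b1)
print(b2)
```

In this toy setting, repeated noisy evidence for the same goal raises its probability without ever committing to a single hypothesis, which is the property the abstract attributes to POMDP-based dialogue management.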
Challenges and opportunities for state tracking in statistical spoken dialog systems: results from two public deployments
Whereas traditional dialog systems operate on the top ASR hypothesis, statistical dialog systems claim to be more robust to ASR errors by maintaining a distribution over multiple hidden dialog states. Recently, these techniques have been deployed publicly for the first time, making empirical measurements possible. In this paper, we analyze two of these deployments. We find that performance was quite mixed: in some cases statistical techniques improved accuracy with respect to the top speech recognition hypothesis; in other cases, accuracy was degraded. Investigating degradations, we find the three main causes are (non-obviously) inaccurate parameter estimates, poor confidence scores, and correlations in speech recognition errors. Overall the results suggest fundamental weaknesses in the formulation as a generative model, and we suggest alternatives as future work.
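The contrast between the two approaches described in this abstract can be made concrete with a toy comparison. This is only a sketch: the score-accumulation rule, the slot values, and the confidence scores are assumptions for illustration and are not the trackers used in the deployments analysed in the paper.

```python
from collections import defaultdict

# Toy comparison: trusting the top ASR hypothesis versus keeping a
# distribution over all hypothesised values. Data and update rule are
# illustrative assumptions, not taken from the paper.


def top_hypothesis_tracker(turns):
    """Baseline: trust only the best-scoring hypothesis of the most recent turn."""
    latest = turns[-1]
    return max(latest, key=lambda pair: pair[1])[0]


def accumulate_nbest(turns):
    """Sum confidence scores per value over all turns, then normalise."""
    scores = defaultdict(float)
    for nbest in turns:
        for value, confidence in nbest:
            scores[value] += confidence
    total = sum(scores.values())
    return {value: score / total for value, score in scores.items()}


# Hypothetical N-best lists for three turns in which the user means "austin".
turns = [
    [("boston", 0.50), ("austin", 0.40)],
    [("austin", 0.60), ("boston", 0.10)],
    [("boston", 0.45), ("austin", 0.40)],
]

distribution = accumulate_nbest(turns)
best = max(distribution, key=distribution.get)
print(top_hypothesis_tracker(turns))  # decision based on the last turn only
print(best, distribution)             # decision based on all turns
```

Because the accumulated result depends directly on the confidence scores, poorly calibrated scores would mislead this kind of tracker, which is in line with the degradations the abstract reports.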
Modelling Multimodal Dialogues for Social Robots Using Communicative Acts
Social Robots need to communicate in a way that feels natural to humans if they are to
effectively bond with the users and provide an engaging interaction. In line with this natural, effective communication, robots need to perceive and manage multimodal information, both as input and output, and respond accordingly. Consequently, dialogue design is a key factor in creating an engaging multimodal interaction. These dialogues need to be flexible enough to adapt to unforeseen circumstances that arise during the conversation but should also be easy to create, so the development of new applications gets simpler. In this work, we present our approach to dialogue modelling based on basic atomic interaction units called Communicative Acts. They manage basic interactions considering who has the initiative (the robot or the user), and what his/her intention is. The two possible intentions are either to ask for information or to give information. In addition, because we focus on one-to-one interactions, the initiative can only be taken by the robot or the user. Communicative Acts can be parametrised and combined in a hierarchical manner to fulfil the needs of the robot’s applications, and they have been equipped with built-in functionalities that are in charge of low-level communication tasks. These tasks include communication error handling, turn-taking or user disengagement. This system
has been integrated in Mini, a social robot that has been created to assist older adults with cognitive impairment. In a use case, we demonstrate the operation of our system as well as its performance in real human–robot interactions. The research leading to these results has received funding from the projects Development of social robots
to help seniors with cognitive impairment (ROBSEN), funded by the Ministerio de Economia y Competitividad;
RoboCity2030-DIH-CM, Madrid Robotics Digital Innovation Hub, S2018/NMT-4331, funded by “Programas de
Actividades I+D en la Comunidad de Madrid” and cofunded by Structural Funds of the EU; and Robots sociales para
estimulación física, cognitiva y afectiva de mayores (ROSES) RTI2018-096338-B-I00 funded by Agencia Estatal de
Investigación (AEI), Ministerio de Ciencia, Innovación y Universidade
Data-Driven Policy Optimisation for Multi-Domain Task-Oriented Dialogue
Recent developments in machine learning, along with a general shift in the public attitude towards digital personal assistants, have opened new frontiers for conversational systems. Nevertheless, building data-driven multi-domain conversational agents that act optimally given a dialogue context is an open challenge. The first step towards that goal is developing an efficient way of learning a dialogue policy in new domains. Secondly, it is important to have the ability to collect and utilise human-human conversational data to bootstrap an agent's knowledge. The work presented in this thesis demonstrates how a neural dialogue manager fine-tuned with reinforcement learning presents a viable approach for learning a dialogue policy efficiently and across many domains.
The thesis starts by introducing a dialogue management module that learns through interactions to act optimally given a current context of a conversation. The current shift towards neural, parameter-rich systems does not fully address the problem of error noise coming from speech recognition or natural language understanding components. A Bayesian approach is therefore proposed to learn more robust and effective policy management in direct interactions without any prior data. By putting a distribution over model weights, the learning agent is less prone to overfit to particular dialogue realizations and a more efficient exploration policy can be therefore employed. The results show that deep reinforcement learning performs on par with non-parametric models even in a low data regime while significantly reducing the computational complexity compared with the previous state-of-the-art.
The deployment of a dialogue manager without any pre-training on human conversations is not a viable option from an industry perspective. However, the progress in building statistical systems, particularly dialogue managers, is hindered by the scale of data available. To address this fundamental obstacle, a novel data-collection pipeline entirely based on crowdsourcing without the need for hiring professional annotators is introduced. The validation of the approach results in the collection of the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully labeled collection of human-human written conversations spanning over multiple domains and topics. The proposed dataset creates a set of new benchmarks (belief tracking, policy optimisation, and response generation) significantly raising the complexity of analysed dialogues.
The collected dataset serves as a foundation for a novel reinforcement learning (RL)-based approach for training a multi-domain dialogue manager. A Multi-Action and Slot Dialogue Agent (MASDA) is proposed to combat some limitations: 1) handling complex multi-domain dialogues with multiple concurrent actions present in a single turn; and 2) lack of interpretability, which consequently impedes the use of intermediate signals (e.g., dialogue turn annotations) if such signals are available. MASDA explicitly models system acts and slots using intermediate signals, resulting in an improved task-based end-to-end framework. The model can also select concurrent actions in a single turn, thus enriching the representation of the generated responses. The proposed framework allows for RL training of dialogue task completion metrics when dealing with concurrent actions. The results demonstrate the advantages of both 1) handling concurrent actions and 2) exploiting intermediate signals: MASDA outperforms previous end-to-end frameworks while also offering improved scalability. EPSR