21 research outputs found

    Survey on Evaluation Methods for Dialogue Systems

    Get PDF
    In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part during the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires. However, this tends to be very cost and time intensive. Thus, much work has been put into finding methods, which allow to reduce the involvement of human labour. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for the dialogue systems and then by presenting the evaluation methods regarding this class

    Survey on evaluation methods for dialogue

    Get PDF
    In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part during the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires. However, this tends to be very cost and time intensive. Thus, much work has been put into finding methods, which allow to reduce the involvement of human labour. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for the dialogue systems and then by presenting the evaluation methods regarding this class

    Using Dialogue Acts in dialogue strategy learning: optimising repair strategies

    Get PDF
    Institute for Communicating and Collaborative SystemsA Spoken Dialogue System's (SDS's) dialogue strategy specifies which action it will take depending on its representation of the current dialogue context. Designing it by hand involves anticipating how users will interact with the system, and/or repeated testing and refining, and so can be a difficult, time-consuming task. Since SDSs inevitably make understanding errors, a particularly important issue is how to design ``repair strategies'', the parts of the dialogue strategy which attempt to get the dialogue ``back-on-track'' following these errors. To try to produce better dialogue strategies with less time and effort, previous researchers have modelled a dialogue strategy as a sequential decision problem called a Markov Decision Process (MDP), and then applied Reinforcement Learning (RL) algorithms to example training dialogues to generate dialogue strategies automatically. More recent research has used training dialogues conducted with simulated rather than real users and learned which action to take in all dialogue contexts, (a ``full'' as opposed to a ``partial'' dialogue strategy) - simulated users allow more training dialogues to be generated, and the exploration of new dialogue contexts not present in an original dataset. As yet however, limited insight has been provided as to which dialogue contextual features are important to include in the MDP and why. Indeed, a full dialogue strategy has not been learned from training dialogues with a realistic probabilistic user simulation derived from real user data, and then shown to work well with real users. This thesis investigates the value of adding new linguistically-motivated contextual features to the MDP when using RL to learn full dialogue strategies for SDSs. These new features are recent Dialogue Acts (DAs). DAs indicate the role or intention of an utterance in a dialogue e.g. ``provide-information'', an utterance being a complete unit of a speaker's speech, often bounded by silence. An accurate probabilistic user simulation learned from real user data is used for generating training dialogues, and the recent DAs are shown to improve performance in testing in simulation and with real users. With real users, performance is also better than other competing learned and hand-crafted strategies. Analysis of the strategies, and further simulation experiments show how the DAs improve performance through better repair strategies. The main findings are expected to apply to SDSs in general - indeed our strategies are learned and tested on real users in different domains, (flight-booking versus tourist information). Comparisons are also made to recent research which focuses on handling understanding errors in SDSs, but which does not use RL or user simulations

    Computational modeling of turn-taking dynamics in spoken conversations

    Get PDF
    The study of human interaction dynamics has been at the center for multiple research disciplines in- cluding computer and social sciences, conversational analysis and psychology, for over decades. Recent interest has been shown with the aim of designing computational models to improve human-machine interaction system as well as support humans in their decision-making process. Turn-taking is one of the key aspects of conversational dynamics in dyadic conversations and is an integral part of human- human, and human-machine interaction systems. It is used for discourse organization of a conversation by means of explicit phrasing, intonation, and pausing, and it involves intricate timing. In verbal (e.g., telephone) conversation, the turn transitions are facilitated by inter- and intra- speaker silences and over- laps. In early research of turn-taking in the speech community, the studies include durational aspects of turns, cues for turn yielding intention and lastly designing turn transition modeling for spoken dia- log agents. Compared to the studies of turn transitions very few works have been done for classifying overlap discourse, especially the competitive act of overlaps and function of silences. Given the limitations of the current state-of-the-art, this dissertation focuses on two aspects of con- versational dynamics: 1) design automated computational models for analyzing turn-taking behavior in a dyadic conversation, 2) predict the outcome of the conversations, i.e., observed user satisfaction, using turn-taking descriptors, and later these two aspects are used to design a conversational profile for each speaker using turn-taking behavior and the outcome of the conversations. The analysis, experiments, and evaluation has been done on a large dataset of Italian call-center spoken conversations where customers and agents are engaged in real problem-solving tasks. Towards solving our research goal, the challenges include automatically segmenting and aligning speakers’ channel from the speech signal, identifying and labeling the turn-types and its functional aspects. The task becomes more challenging due to the presence of overlapping speech. To model turn- taking behavior, the intension behind these overlapping turns needed to be considered. However, among all, the most critical question is how to model observed user satisfaction in a dyadic conversation and what properties of turn-taking behavior can be used to represent and predict the outcome. Thus, the computational models for analyzing turn-taking dynamics, in this dissertation includes au- tomatic segmenting and labeling turn types, categorization of competitive vs non-competitive overlaps, silences (e.g., lapse, pauses) and functions of turns in terms of dialog acts. The novel contributions of the work presented here are to 1. design of a fully automated turn segmentation and labeling (e.g., agent vs customer’s turn, lapse within the speaker, and overlap) system. 2. the design of annotation guidelines for segmenting and annotating the speech overlaps with the competitive and non-competitive labels. 3. demonstrate how different channels of information such as acoustic, linguistic, and psycholin- guistic feature sets perform in the classification of competitive vs non-competitive overlaps. 4. study the role of speakers and context (i.e., agents’ and customers’ speech) for conveying the information of competitiveness for each individual feature set and their combinations. 5. investigate the function of long silences towards the information flow in a dyadic conversation. The extracted turn-taking cues is then used to automatically predict the outcome of the conversation, which is modeled from continuous manifestations of emotion. The contributions include 1. modeling the state of the observed user satisfaction in terms of the final emotional manifestation of the customer (i.e., user). 2. analysis and modeling turn-taking properties to display how each turn type influence the user satisfaction. 3. study of how turn-taking behavior changes within each emotional state. Based on the studies conducted in this work, it is demonstrated that turn-taking behavior, specially competitiveness of overlaps, is more than just an organizational tool in daily human interactions. It represents the beneficial information and contains the power to predict the outcome of the conversation in terms of satisfaction vs not-satisfaction. Combining the turn-taking behavior and the outcome of the conversation, the final and resultant goal is to design a conversational profile for each speaker. Such profiled information not only facilitate domain experts but also would be useful to the call center agent in real time. These systems are fully automated and no human intervention is required. The findings are po- tentially relevant to the research of overlapping speech and automatic analysis of human-human and human-machine interactions

    Hybrid discourse modeling and summarization for a speech-to-speech translation system

    Get PDF
    The thesis discusses two parts of the speech-to-speech translation system VerbMobil: the dialogue model and one of its applications, multilingual summary generation. In connection with the dialogue model, two topics are of special interest: (a) the use of a default unification operation called overlay as the fundamental operation for dialogue management; and (b) an intentional model that is able to describe intentions in dialogue on five levels in a language-independent way. Besides the actual generation algorithm developed, we present a comprehensive evaluation of the summarization functionality. In addition to precision and recall, a new characterization - confabulation - is defined that provides a more precise understanding of the performance of complex natural language processing systems.Die vorliegende Arbeit behandelt hauptsächlich zwei Themen, die für das VerbMobil-System, ein Übersetzungssystem gesprochener Spontansprache, entwickelt wurden: das Dialogmodell und als Applikation die multilinguale Generierung von Ergebnissprotokollen. Für die Dialogmodellierung sind zwei Themen von besonderem Interesse. Das erste behandelt eine in der vorliegenden Arbeit formalisierte Default-Unifikations-Operation namens Overlay, die als fundamentale Operation für Diskursverarbeitung dient. Das zweite besteht aus einem intentionalen Modell, das Intentionen eines Dialogs auf fünf Ebenen in einer sprachunabhängigen Repräsentation darstellt. Neben dem für die Protokollgenerierung entwickelten Generierungsalgorithmus wird eine umfassende Evaluation zur Protokollgenerierungsfunktionalität vorgestellt. Zusätzlich zu "precision" und "recall" wird ein neues Maß - Konfabulation (Engl.: "confabulation") - vorgestellt, das eine präzisere Charakterisierung der Qualität eines komplexen Sprachverarbeitungssystems ermöglicht
    corecore