22,008 research outputs found

    Survey on Evaluation Methods for Dialogue Systems

    In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires; however, this tends to be very costly and time-intensive. Thus, much work has gone into finding methods that reduce the amount of human labour involved. In this survey, we present the main concepts and methods. To do so, we differentiate between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for its dialogue systems and then presenting the evaluation methods for that class.

    Learning how to learn: an adaptive dialogue agent for incrementally learning visually grounded word meanings

    We present an optimised multi-modal dialogue agent for interactive learning of visually grounded word meanings from a human tutor, trained on real human-human tutoring data. Within a life-long interactive learning period, the agent, trained using Reinforcement Learning (RL), must be able to handle natural conversations with human users and achieve good learning performance (accuracy) while minimising human effort in the learning process. We train and evaluate this system in interaction with a simulated human tutor, which is built on the BURCHAK corpus -- a human-human dialogue dataset for the visual learning task. The results show that: 1) the learned policy can coherently interact with the simulated user to achieve the goal of the task (i.e. learning visual attributes of objects, e.g. colour and shape); and 2) it finds a better trade-off between classifier accuracy and tutoring costs than hand-crafted rule-based policies, including ones with dynamic policies. Comment: 10 pages, RoboNLP Workshop at the ACL Conference.
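The accuracy-versus-cost trade-off described in this abstract can be illustrated with a small sketch. The following tabular Q-learning agent chooses, at each turn, between asking the tutor (informative but costly) and predicting on its own (free but risky). All state representations, reward values and the toy environment are illustrative assumptions, not the paper's actual setup.

```python
import random
from collections import defaultdict

random.seed(0)  # for reproducibility of this illustrative run

ACTIONS = ["ask_tutor", "predict"]
ASK_COST, CORRECT_REWARD, WRONG_PENALTY = -0.3, 1.0, -1.0

def step(action, confidence):
    """Toy environment: asking always improves the classifier's
    confidence; predicting pays off only when confidence is high."""
    if action == "ask_tutor":
        return ASK_COST, min(1.0, confidence + 0.2)  # (reward, new confidence)
    correct = random.random() < confidence
    return (CORRECT_REWARD if correct else WRONG_PENALTY), confidence

def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)  # keyed by (confidence bucket, action)
    for _ in range(episodes):
        conf = 0.2
        for _ in range(10):  # ten turns per simulated dialogue
            bucket = round(conf, 1)
            if random.random() < epsilon:
                a = random.choice(ACTIONS)           # explore
            else:
                a = max(ACTIONS, key=lambda x: Q[(bucket, x)])  # exploit
            r, new_conf = step(a, conf)
            nb = round(new_conf, 1)
            best_next = max(Q[(nb, x)] for x in ACTIONS)
            # standard Q-learning update
            Q[(bucket, a)] += alpha * (r + gamma * best_next - Q[(bucket, a)])
            conf = new_conf
    return Q

Q = train()
# With low confidence, the learned policy should prefer asking the tutor.
```

The interesting behaviour is the switch-over: once confidence is high enough, predicting dominates asking, which is the trade-off the learned policy is said to balance better than hand-crafted rules.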

    The BURCHAK corpus: a Challenge Data Set for Interactive Learning of Visually Grounded Word Meanings

    We motivate and describe a new freely available human-human dialogue dataset for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner. The data has been collected using a novel, character-by-character variant of the DiET chat tool (Healey et al., 2003; Mills and Healey, submitted) with a novel task, where a learner needs to learn invented visual attribute words (such as "burchak" for square) from a tutor. As such, the text-based interactions closely resemble face-to-face conversation and thus contain many of the linguistic phenomena encountered in natural, spontaneous dialogue. These include self- and other-correction, mid-sentence continuations, interruptions, overlaps, fillers, and hedges. We also present a generic n-gram framework for building user (i.e. tutor) simulations from this type of incremental data, which is freely available to researchers. We show that the simulations produce outputs that are similar to the original data (e.g. 78% turn match similarity). Finally, we train and evaluate a Reinforcement Learning dialogue control agent for learning visually grounded word meanings, trained from the BURCHAK corpus. The learned policy shows comparable performance to a rule-based system built previously. Comment: 10 pages, The 6th Workshop on Vision and Language (VL'17).
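The n-gram simulation framework mentioned above can be sketched in miniature: estimate P(next act | previous act) from observed tutor turns, then sample new behaviour from those counts. The toy "corpus" of dialogue-act sequences below is invented for illustration; it is not BURCHAK data, and the real framework operates over richer, incremental units.

```python
import random
from collections import Counter, defaultdict

# Invented toy corpus of tutor dialogue-act sequences (not real data).
corpus = [
    ["greet", "inform", "confirm", "inform", "bye"],
    ["greet", "inform", "correct", "inform", "bye"],
    ["greet", "inform", "inform", "confirm", "bye"],
]

def train_bigram(sequences):
    """Count bigram transitions, with <s>/</s> as start/end markers."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(["<s>"] + seq, seq + ["</s>"]):
            counts[prev][nxt] += 1
    return counts

def sample_dialogue(counts, max_len=10):
    """Sample a fresh act sequence from the learned transition counts."""
    act, out = "<s>", []
    while len(out) < max_len:
        acts, freqs = zip(*counts[act].items())
        act = random.choices(acts, weights=freqs)[0]
        if act == "</s>":
            break
        out.append(act)
    return out

model = train_bigram(corpus)
simulated = sample_dialogue(model)
```

A turn-match-style evaluation, in this miniature, would compare how often `simulated` transitions also occur in the held-out corpus.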

    Harmonised Principles for Public Participation in Quality Assurance of Integrated Water Resources Modelling

    The main purpose of public participation in integrated water resources modelling is to improve decision-making by ensuring that decisions are soundly based on shared knowledge, experience and scientific evidence. The present paper describes stakeholder involvement in the modelling process. The point of departure is the guidelines for quality assurance for 'scientific' water resources modelling developed under the EU research project HarmoniQuA, which has developed a computer-based Modelling Support Tool (MoST) to provide user-friendly guidance and a quality assurance framework that aims to enhance the credibility of river basin modelling. MoST prescribes interaction, a form of participation above consultation but below engagement, of stakeholders and the public in the early phases of the modelling cycle and during review tasks throughout the process. MoST is a flexible tool which supports different types of users and facilitates interaction between modeller, manager and stakeholders. The prospect of using MoST for engagement of stakeholders, i.e. higher-level participation throughout the modelling process as part of integrated water resource management, is evaluated.

    User Simulation in Dialogue Systems using Inverse Reinforcement Learning

    Spoken Dialogue Systems (SDS) are man-machine interfaces which use natural language as the medium of interaction. Dialogue corpora collection for the purpose of training and evaluating dialogue systems is an expensive process. User simulators aim at simulating human users in order to generate synthetic data. Existing methods for user simulation mainly focus on generating data with the same statistical consistency as in some reference dialogue corpus. This paper outlines a novel approach for user simulation based on Inverse Reinforcement Learning (IRL). The task of building the user simulator is perceived as a task of imitation learning.
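The core IRL idea behind this approach, recovering a reward that explains observed user behaviour, can be sketched via feature matching: assume the user's reward is linear in dialogue features, R(s) = w · φ(s), and adjust w until the simulator's chosen behaviour matches the expert (human) corpus statistics. The features, expert statistics and greedy "policy" below are invented for illustration and are far simpler than the paper's method.

```python
import numpy as np

# Invented expert feature expectations (e.g. inform-rate, repeat-rate, ...).
phi_expert = np.array([0.8, 0.1, 0.4])

# Invented candidate behaviours, each summarised by its feature vector.
candidates = [np.array(p) for p in
              [(0.2, 0.7, 0.1), (0.5, 0.5, 0.5), (0.8, 0.1, 0.4)]]

def policy_features(w, candidates):
    """Greedy 'policy': pick the candidate behaviour with highest reward."""
    return max(candidates, key=lambda phi: float(w @ phi))

w = np.zeros(3)
for _ in range(50):
    phi_pi = policy_features(w, candidates)
    # Move the reward weights so expert-like behaviour scores higher.
    w += 0.1 * (phi_expert - phi_pi)

recovered = policy_features(w, candidates)
# The greedy policy under the learned reward now reproduces the
# expert's feature profile.
```

Once such a reward is recovered, a simulator optimising it imitates the human user rather than merely matching surface statistics, which is the contrast the abstract draws with earlier methods.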

    Inverse Reinforcement Learning for User Simulation in Dialogue Systems

    Dialogue systems are human-machine interfaces that use natural language as the medium of interaction. User simulation aims to simulate the behaviour of a human user in order to generate dialogues artificially. This step is often essential, since collecting and annotating dialogue corpora is a costly process, yet necessary for machine learning methods (such as reinforcement learning, which can be used to learn the dialogue manager's policy). Existing user simulators essentially seek to produce user behaviour that is statistically consistent with the dialogue corpus. The contribution of this article is to use inverse reinforcement learning to build a new user simulator. This new approach is illustrated by simulating the behaviour of an (artificial) user model on a three-attribute problem for a tourist information system. The behaviour of the new user simulator is evaluated according to several metrics (from the interaction level to the dialogue level).

    Learning and Games

    Part of the Volume on the Ecology of Games: Connecting Youth, Games, and Learning. In this chapter, I argue that good video games recruit good learning and that a game's design is inherently connected to designing good learning for players. I start with a perspective on learning now common in the Learning Sciences that argues that people primarily think and learn through experiences they have had, not through abstract calculations and generalizations. People store these experiences in memory -- and human long-term memory is now viewed as nearly limitless -- and use them to run simulations in their minds to prepare for problem solving in new situations. These simulations help them to form hypotheses about how to proceed in the new situation based on past experiences. The chapter also discusses the conditions experience must meet if it is to be optimal for learning and shows how good video games can deliver such optimal learning experiences. Some of the issues covered include: identity and learning; models and model-based thinking; the control of avatars and "empathy for a complex system"; distributed intelligence and cross-functional teams for learning; motivation and ownership; emotion in learning; and situated meaning, that is, the ways in which games represent verbal meaning through images, actions, and dialogue, not just other words and definitions.

    Evaluation of a hierarchical reinforcement learning spoken dialogue system

    We describe an evaluation of spoken dialogue strategies designed using hierarchical reinforcement learning agents. The dialogue strategies were learnt in a simulated environment and tested in a laboratory setting with 32 users. These dialogues were used to evaluate three types of machine dialogue behaviour: hand-coded, fully-learnt and semi-learnt. These experiments also served to evaluate the realism of simulated dialogues using two proposed metrics contrasted with ‘Precision-Recall’. The learnt dialogue behaviours used the Semi-Markov Decision Process (SMDP) model, and we report the first evaluation of this model in a realistic conversational environment. Experimental results in the travel planning domain provide evidence to support the following claims: (a) hierarchical semi-learnt dialogue agents are a better alternative (with higher overall performance) than deterministic or fully-learnt behaviour; (b) spoken dialogue strategies learnt with highly coherent user behaviour and conservative recognition error rates (keyword error rate of 20%) can outperform a reasonable hand-coded strategy; and (c) hierarchical reinforcement learning dialogue agents are feasible and promising for the (semi-)automatic design of optimized dialogue behaviours in larger-scale systems.
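A minimal sketch of the kind of Precision-Recall comparison such evaluations contrast against: treat the dialogue acts of a real dialogue as the reference set and score a simulated dialogue against it. The act sequences below are invented examples, and real metrics for dialogue realism are considerably richer (e.g. order-sensitive).

```python
# Invented example act sequences (not from the evaluated system).
real = ["greet", "ask", "inform", "confirm", "bye"]
simulated = ["greet", "inform", "inform", "bye"]

def precision_recall(sim, ref):
    """Set-based precision/recall over dialogue-act types."""
    sim_set, ref_set = set(sim), set(ref)
    tp = len(sim_set & ref_set)            # act types the simulation got right
    return tp / len(sim_set), tp / len(ref_set)

p, r = precision_recall(simulated, real)
# Here every simulated act type occurs in the real dialogue (high precision),
# but some real act types are never produced (lower recall).
```

Because such set-based scores ignore ordering and frequency, metrics that do capture sequential realism, like those proposed in the paper, can be more informative for judging simulated dialogues.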