2,975 research outputs found

    Survey on Evaluation Methods for Dialogue Systems

    In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Dialogue systems are often evaluated by means of human evaluations and questionnaires, but this tends to be very cost- and time-intensive. Much work has therefore been put into finding methods that reduce the amount of human labour involved. In this survey, we present the main concepts and methods, differentiating between the various classes of dialogue systems (task-oriented, conversational, and question-answering dialogue systems). For each class, we introduce the main technologies developed for such systems and then present the evaluation methods for that class.

    On the Usability of Spoken Dialogue Systems


    Exploring User Satisfaction in a Tutorial Dialogue System

    User satisfaction is a common evaluation metric in task-oriented dialogue systems, whereas tutorial dialogue systems are often evaluated in terms of student learning gain. However, user satisfaction is also important for such systems, since it may predict technology acceptance. We present a detailed satisfaction questionnaire used in evaluating the BEETLE II system (REVU-NL), and explore the underlying components of user satisfaction using factor analysis. We demonstrate interesting patterns of interaction between interpretation quality, satisfaction, and the dialogue policy, highlighting the importance of more fine-grained evaluation of user satisfaction.
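
    As a rough illustration of the factor-analysis step described in this abstract, the sketch below fits a small factor model to hypothetical Likert-scale questionnaire responses. The data, the number of items, and the choice of two factors are all placeholder assumptions; this is not the REVU-NL questionnaire or the BEETLE II data.

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    # Hypothetical Likert-scale responses: rows = participants, columns = questionnaire items.
    rng = np.random.default_rng(0)
    responses = rng.integers(1, 6, size=(40, 8)).astype(float)

    # Fit a two-factor model and inspect how strongly each item loads on each factor.
    fa = FactorAnalysis(n_components=2, random_state=0)
    fa.fit(responses)
    for i, loadings in enumerate(fa.components_):
        print(f"Factor {i + 1} loadings:", np.round(loadings, 2))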

    Integrating scientific assessment of wetland areas and economic evaluation tools to develop an evaluation framework to advise wetland management

    Wetland ecosystems provide society with a range of valuable ecosystem services. However, wetlands worldwide are experiencing increasing pressure from a number of sources, caused by an interrelated combination of market failure and policy intervention failure. Whatever the cause, the result is massive degradation and loss of these ecosystems and, ultimately, loss of their services. To better manage wetlands, sufficient relevant and reliable scientific information is required, together with an assessment tool capable of providing meaningful evaluations of the consequences of management. Current assessments of wetlands are often biased towards either economic or scientific issues, with limited attempts at integration. Evaluations that neglect integration overlook the complexity of wetland ecosystems and have failed to sufficiently protect these areas. This paper reviews the literature to propose an evaluation framework which combines a scientific assessment of wetland function with cost utility analysis (CUA) to develop a meaningful trade-off matrix. A dynamic approach to wetland assessment such as the hydrogeomorphic method (HGM), developed by the US Army Corps of Engineers, offers the opportunity to consider interrelationships between ecosystem processes and functions and the resulting ecosystem services. CUA facilitates the evaluation of projects where the consequences of investment or no investment are complex and difficult to value in monetary terms. The evaluation framework described in this paper has the potential to deliver an integrated wetland management tool. However, for this potential to be realised, targeted interdisciplinary research by scientists and economists is required.

    The Science and Art of Voice Interfaces


    Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses

    Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem. Unfortunately, existing automatic evaluation metrics are biased and correlate very poorly with human judgements of response quality. Yet having an accurate automatic evaluation procedure is crucial for dialogue research, as it allows rapid prototyping and testing of new models with fewer expensive human evaluations. In response to this challenge, we formulate automatic dialogue evaluation as a learning problem. We present an evaluation model (ADEM) that learns to predict human-like scores for input responses, using a new dataset of human response scores. We show that the ADEM model's predictions correlate significantly with human judgements at both the utterance and system level, and at a level much higher than word-overlap metrics such as BLEU. We also show that ADEM can generalize to evaluating dialogue models unseen during training, an important step for automatic dialogue evaluation. (Comment: ACL 2017)
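
    As a hedged sketch of the kind of correlation analysis this abstract refers to, the snippet below compares an automatic metric's scores against human judgements of response quality using Pearson and Spearman correlation. The scores are made-up placeholders, and the snippet does not implement ADEM itself.

    from scipy.stats import pearsonr, spearmanr

    # Hypothetical per-utterance scores: human ratings (1-5) and a learned metric's output.
    human_scores = [4.0, 2.0, 5.0, 3.0, 1.0, 4.0, 2.0, 5.0]
    metric_scores = [3.6, 2.4, 4.8, 2.9, 1.5, 3.9, 2.6, 4.5]

    # Utterance-level correlation between the automatic metric and human judgements.
    pearson_r, _ = pearsonr(human_scores, metric_scores)
    spearman_r, _ = spearmanr(human_scores, metric_scores)
    print(f"Pearson:  {pearson_r:.3f}")
    print(f"Spearman: {spearman_r:.3f}")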

    Evaluating embodied conversational agents in multimodal interfaces

    Based on cross-disciplinary approaches to Embodied Conversational Agents, evaluation methods for such human-computer interfaces are structured and presented. An introductory systematisation of evaluation topics from a conversational perspective is followed by an explanation of social-psychological phenomena studied in interaction with Embodied Conversational Agents, and how these can be used for evaluation purposes. Major evaluation concepts and appropriate assessment instruments, both established and new, are presented, including questionnaires, annotations, and log files. An exemplary evaluation and guidelines provide hands-on information on planning and preparing such endeavours.