
    Survey on Evaluation Methods for Dialogue Systems

    In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Dialogue systems are often evaluated by means of human evaluations and questionnaires; however, this tends to be very costly and time-intensive. Thus, much work has been put into finding methods that reduce the involvement of human labour. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for the dialogue systems and then by presenting the evaluation methods regarding this class.

    ProDial – an annotated proactive dialogue act corpus for conversational assistants using crowdsourcing

    Proactive behaviour is an integral interaction concept of both human-human and human-computer cooperation. However, modelling proactive systems and appropriate interaction strategies is still an open question. In this work, a parameterised and annotated dialogue corpus has been created. The corpus is based on human interactions with an autonomous agent embedded in a serious game setting. For modelling proactive dialogue behaviour, the agent was capable of selecting from four different proactive actions (None, Notification, Suggestion, Intervention) in order to serve as the user's personal advisor in a sequential planning task. Data was collected online using crowdsourcing (308 participants), resulting in a total of 3696 system-user exchanges. Data was annotated with objective features as well as subjectively self-reported features to capture the interplay between proactive behaviour and situational as well as user-dependent characteristics. The corpus is intended for building a user model for developing trustworthy proactive interaction strategies.
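    As a rough illustration only (the abstract does not give the corpus schema, so every field name below is an assumption), a single annotated system-user exchange with the four proactive act types could be represented along these lines:

```python
from dataclasses import dataclass
from enum import Enum


class ProactiveAct(Enum):
    # The four proactive dialogue acts named in the abstract.
    NONE = "None"
    NOTIFICATION = "Notification"
    SUGGESTION = "Suggestion"
    INTERVENTION = "Intervention"


@dataclass
class Exchange:
    # One system-user exchange; all field names are illustrative guesses.
    participant_id: str
    turn: int
    system_act: ProactiveAct        # proactive action chosen by the agent
    system_utterance: str
    user_utterance: str
    objective_features: dict        # e.g. task state, timing
    self_reported_features: dict    # e.g. perceived helpfulness, trust


# A hypothetical record:
ex = Exchange(
    participant_id="p001",
    turn=3,
    system_act=ProactiveAct.SUGGESTION,
    system_utterance="You could schedule step B before step A.",
    user_utterance="Good idea, let's do that.",
    objective_features={"planning_step": 3},
    self_reported_features={"trust": 4},
)
```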

    An Analysis of Mixed Initiative and Collaboration in Information-Seeking Dialogues

    The ability to engage in mixed-initiative interaction is one of the core requirements for a conversational search system, yet how to achieve this is poorly understood. We propose a set of unsupervised metrics, termed ConversationShape, that highlights the role each conversation participant plays by comparing the distribution of vocabulary and utterance types. Using ConversationShape as a lens, we take a closer look at several conversational search datasets and compare them with other dialogue datasets to better understand the types of dialogue interaction they represent, whether driven by the information seeker or the assistant. We discover that deviations from the ConversationShape of a human-human dialogue of the same type are predictive of the quality of a human-machine dialogue. Comment: SIGIR 2020 short conference paper.
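    A minimal sketch of the idea behind such role-comparison metrics, assuming unigram vocabulary distributions compared with Jensen-Shannon divergence (the paper's exact definition of ConversationShape may differ):

```python
from collections import Counter
import math


def unigram_dist(utterances):
    # Relative frequency of each token across one participant's utterances.
    counts = Counter(tok for u in utterances for tok in u.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p, q):
    # Jensen-Shannon divergence between two vocabulary distributions.
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a):
        return sum(a[w] * math.log2(a[w] / m[w])
                   for w in vocab if a.get(w, 0.0) > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Hypothetical seeker vs. assistant utterances from one dialogue:
seeker = ["where can I find papers on dialogue evaluation"]
assistant = ["here are three surveys on dialogue system evaluation"]
print(js_divergence(unigram_dist(seeker), unigram_dist(assistant)))
```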

    Becoming JILDA

    The difficulty in finding useful dialogic data to train a conversational agent is an open issue even nowadays, when chatbots and spoken dialogue systems are widely used. For this reason we decided to build JILDA, a novel data collection of chat-based dialogues, produced by Italian native speakers and related to the job-offer domain. JILDA is the first dialogue collection related to this domain for the Italian language. Because of its collection modalities, we believe that JILDA can be a useful resource not only for the Italian research community, but also for the international one.

    Evaluating Conversational Recommender Systems via User Simulation

    Conversational information access is an emerging research area. Currently, human evaluation is used for end-to-end system evaluation, which is very time- and resource-intensive at scale and thus becomes a bottleneck for progress. As an alternative, we propose automated evaluation by means of simulating users. Our user simulator aims to generate the responses that a real human would give by considering both individual preferences and the general flow of interaction with the system. We evaluate our simulation approach on an item recommendation task by comparing three existing conversational recommender systems. We show that preference modeling and task-specific interaction models both contribute to more realistic simulations, and can help achieve a high correlation between automatic evaluation measures and manual human assessments. Comment: Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '20), 2020.
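    The validation step described above, correlating an automatic simulation-based measure with human assessments across systems, could be sketched as follows (system names and all scores are hypothetical placeholders, and the paper's actual measures may differ):

```python
from scipy.stats import spearmanr

systems = ["sys_A", "sys_B", "sys_C"]
simulated_success = [0.62, 0.48, 0.71]   # e.g. success rate over simulated dialogues
human_rating = [3.9, 3.1, 4.3]           # e.g. mean human assessment per system

# Rank correlation between the automatic measure and human judgments.
rho, p_value = spearmanr(simulated_success, human_rating)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```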