2,727 research outputs found
Survey on Evaluation Methods for Dialogue Systems
In this paper, we survey the methods and concepts developed for the evaluation
of dialogue systems. Evaluation is a crucial part of the development process.
Often, dialogue systems are evaluated by means of human evaluations and
questionnaires; however, this tends to be very cost- and time-intensive. Thus,
much work has gone into finding methods that reduce the need for human labour.
In this survey, we present the main concepts and methods. For this, we
differentiate between the various classes of dialogue systems (task-oriented
dialogue systems, conversational dialogue systems, and question-answering
dialogue systems). We cover each class by introducing the main technologies
developed for those dialogue systems and then presenting the evaluation methods
for that class.
ProDial – an annotated proactive dialogue act corpus for conversational assistants using crowdsourcing
Proactive behaviour is an integral interaction concept in both human-human and human-computer cooperation. However, modelling proactive systems and appropriate interaction strategies remains an open question. In this work, a parameterised and annotated dialogue corpus has been created. The corpus is based on human interactions with an autonomous agent embedded in a serious game setting. For modelling proactive dialogue behaviour, the agent was capable of selecting from four different proactive actions (None, Notification, Suggestion, Intervention) in order to serve as the user’s personal advisor in a sequential planning task. Data was collected online using crowdsourcing (308 participants), resulting in a total of 3696 system-user exchanges. Data was annotated with objective features as well as subjectively self-reported features for capturing the interplay between proactive behaviour and situational as well as user-dependent characteristics. The corpus is intended for building a user model for developing trustworthy proactive interaction strategies.
An Analysis of Mixed Initiative and Collaboration in Information-Seeking Dialogues
The ability to engage in mixed-initiative interaction is one of the core
requirements for a conversational search system. How to achieve this is poorly
understood. We propose a set of unsupervised metrics, termed ConversationShape,
that highlights the role each of the conversation participants plays by
comparing the distribution of vocabulary and utterance types. Using
ConversationShape as a lens, we take a closer look at several conversational
search datasets and compare them with other dialogue datasets to better
understand the types of dialogue interaction they represent, either driven by
the information seeker or the assistant. We discover that deviations from the
ConversationShape of a human-human dialogue of the same type are predictive of
the quality of a human-machine dialogue.
Comment: SIGIR 2020 short conference paper
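The abstract above describes ConversationShape as comparing the distribution of vocabulary between conversation participants. A minimal sketch of that idea, assuming a simple bag-of-words distribution per participant and Jensen-Shannon divergence as the comparison measure (the exact metrics in the paper may differ):

```python
from collections import Counter
import math

def vocab_distribution(utterances):
    """Normalised token-frequency distribution over one participant's utterances."""
    counts = Counter(tok for u in utterances for tok in u.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2, bounded in [0, 1]) between two distributions."""
    vocab = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}
    def kl(a, b):
        return sum(a.get(t, 0.0) * math.log2(a.get(t, 0.0) / b[t])
                   for t in vocab if a.get(t, 0.0) > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Illustrative data: the larger the divergence, the more the two roles
# differ in vocabulary -- one facet of the "shape" of a conversation.
seeker = ["where can I find cheap flights", "what about hotels nearby"]
assistant = ["here are some flight options", "these hotels are nearby"]
d = jensen_shannon(vocab_distribution(seeker), vocab_distribution(assistant))
```

Comparing such per-role divergences across datasets gives an unsupervised signal of who drives the dialogue, which is the kind of role analysis the abstract describes.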
Becoming JILDA
The difficulty in finding useful dialogic data to train a conversational agent is an open issue even nowadays, when chatbots and spoken dialogue systems are widely used. For this reason we decided to build JILDA, a novel data collection of chat-based dialogues, produced by Italian native speakers and related to the job-offer domain. JILDA is the first dialogue collection related to this domain for the Italian language. Because of its collection modalities, we believe that JILDA can be a useful resource not only for the Italian research community, but also for the international one.
Evaluating Conversational Recommender Systems via User Simulation
Conversational information access is an emerging research area. Currently,
human evaluation is used for end-to-end system evaluation, which is very
time- and resource-intensive at scale, and thus becomes a bottleneck of
progress. As an alternative, we propose automated evaluation by means of
simulating users. Our user simulator aims to generate responses that a real
human would give by considering both individual preferences and the general
flow of interaction with the system. We evaluate our simulation approach on an
item recommendation task by comparing three existing conversational recommender
systems. We show that preference modeling and task-specific interaction models
both contribute to more realistic simulations, and can help achieve high
correlation between automatic evaluation measures and manual human assessments.
Comment: Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining (KDD '20), 2020
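The abstract above combines preference modelling with an interaction model so a simulated user can answer a recommender in place of a human. A toy sketch of that idea, with all names and behaviours illustrative assumptions rather than the paper's actual simulator:

```python
import random

class PreferenceUserSimulator:
    """Toy user simulator: reacts to recommended items based on a preference
    profile (preference modelling) and a simple patience budget (a stand-in
    for an interaction model). All details here are illustrative."""

    def __init__(self, liked_genres, patience=3, seed=0):
        self.liked = set(liked_genres)
        self.patience = patience            # turns before the user gives up
        self.rng = random.Random(seed)      # seeded for reproducible runs

    def respond(self, recommended_item):
        """Return a simulated user reaction to one recommended item."""
        if self.patience <= 0:
            return "quit"                   # interaction model: user leaves
        self.patience -= 1
        if recommended_item["genre"] in self.liked:
            return "accept"                 # preference model: item matches
        return self.rng.choice(["reject", "ask for something else"])

sim = PreferenceUserSimulator(liked_genres={"sci-fi"})
r1 = sim.respond({"title": "Solaris", "genre": "sci-fi"})   # "accept"
```

Running many such simulated sessions against competing recommenders yields automatic measures (e.g. acceptance rate, turns to success) that can be correlated with human assessments, as the abstract proposes.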