5,739 research outputs found
Survey on Evaluation Methods for Dialogue Systems
In this paper we survey the methods and concepts developed for the evaluation
of dialogue systems. Evaluation is a crucial part of the development process.
Often, dialogue systems are evaluated by means of human evaluations and
questionnaires. However, this tends to be very costly and time-intensive.
Thus, much work has been put into finding methods that reduce the involvement
of human labour. In this survey, we present the main concepts and methods. For
this, we differentiate between the various classes of dialogue systems
(task-oriented dialogue systems, conversational dialogue systems, and
question-answering dialogue systems). We cover each class by introducing the
main technologies developed for these dialogue systems and then by presenting
the evaluation methods for that class.
A prosody-based vector-space model of dialog activity for information retrieval
Search in audio archives is a challenging problem. Using prosodic information to help find relevant content has been proposed as a complement to word-based retrieval, but its utility has been an open question. We propose a new way to use prosodic information in search, based on a vector-space model, where each point in time maps to a point in a vector space whose dimensions are derived from numerous prosodic features of the local context. Point pairs that are close in this vector space are frequently similar, not only in terms of the dialog activities, but also in topic. Using proximity in this space as an indicator of similarity, we built support for a query-by-example function. Searchers were happy to use this function, and it provided value on a large test set. Prosody-based retrieval did not perform as well as word-based retrieval, but the two sources of information were often non-redundant, and in combination they sometimes performed better than either separately.
We thank Martha Larson, Alejandro Vega, Steve Renals, Khiet Truong, Olac Fuentes, David Novick, Shreyas Karkhedkar, Luis F. Ramirez, Elizabeth E. Shriberg, Catharine Oertel, Louis-Philippe Morency, Tatsuya Kawahara, Mary Harper, and the anonymous reviewers. This work was supported in part by the National Science Foundation under Grants IIS-0914868 and IIS-1241434 and by the Spanish MEC under contract TIN2011-28169-C05-01.
Ward, N. G.; Werner, S. D.; García-Granada, F.; Sanchís Arnal, E. (2015). A prosody-based vector-space model of dialog activity for information retrieval. Speech Communication, 68:85-96. doi:10.1016/j.specom.2015.01.004
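The vector-space idea lends itself to a compact illustration. Below is a minimal sketch, assuming hypothetical helper names and a deliberately simple prosodic feature set (means and standard deviations over a local window of frames); the paper's actual features and retrieval pipeline are richer.

```python
import numpy as np

def prosodic_vector(frames: np.ndarray) -> np.ndarray:
    """Summarize the local prosodic context as one fixed-length vector.

    `frames` is a (time, n_features) array of low-level measurements such as
    pitch and energy per frame; means and standard deviations stand in for
    the richer feature set described in the paper.
    """
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def query_by_example(query_vec: np.ndarray, archive_vecs: np.ndarray) -> np.ndarray:
    """Rank archive time points by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    a = archive_vecs / np.linalg.norm(archive_vecs, axis=1, keepdims=True)
    return np.argsort(-(a @ q))  # indices of archive points, most similar first
```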
An Analysis of Mixed Initiative and Collaboration in Information-Seeking Dialogues
The ability to engage in mixed-initiative interaction is one of the core
requirements for a conversational search system, yet how to achieve this is
poorly understood. We propose a set of unsupervised metrics, termed ConversationShape,
that highlights the role each of the conversation participants plays by
comparing the distribution of vocabulary and utterance types. Using
ConversationShape as a lens, we take a closer look at several conversational
search datasets and compare them with other dialogue datasets to better
understand the types of dialogue interaction they represent, either driven by
the information seeker or the assistant. We discover that deviations from the
ConversationShape of a human-human dialogue of the same type are predictive of
the quality of a human-machine dialogue. Comment: SIGIR 2020 short conference paper
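As a rough illustration of comparing participants by their vocabulary distributions, here is a minimal sketch; the function names are hypothetical, and Jensen-Shannon distance is just one plausible choice of divergence, not necessarily what ConversationShape uses.

```python
from collections import Counter
from scipy.spatial.distance import jensenshannon

def token_distribution(utterances):
    """Relative frequency of vocabulary items across one participant's turns."""
    counts = Counter(tok for utt in utterances for tok in utt.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def vocabulary_divergence(seeker_utts, assistant_utts):
    """Jensen-Shannon distance between the two participants' vocabulary
    distributions: one crude way to quantify who is driving the dialogue."""
    p = token_distribution(seeker_utts)
    q = token_distribution(assistant_utts)
    vocab = sorted(set(p) | set(q))
    return jensenshannon([p.get(t, 0.0) for t in vocab],
                         [q.get(t, 0.0) for t in vocab])
```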
Continuous Interaction with a Virtual Human
Attentive Speaking and Active Listening require that a Virtual Human be capable of simultaneous perception/interpretation and production of communicative behavior. A Virtual Human should be able to signal its attitude and attention while it is listening to its interaction partner, and be able to attend to its interaction partner while it is speaking – and modify its communicative behavior on-the-fly based on what it perceives from its partner. This report presents the results of a four-week summer project that was part of eNTERFACE’10. The project resulted in progress on several aspects of continuous interaction, such as scheduling and interrupting multimodal behavior, automatic classification of listener responses, generation of response-eliciting behavior, and models for appropriate reactions to listener responses. A pilot user study was conducted with ten participants. In addition, the project yielded a number of deliverables that have been released for public access.
Establishment of Confidence Thresholds for Interactive Voice Response Systems Using ROC Analysis
An Interactive Voice Response (IVR) system is a platform for man-machine interaction. It is used for collecting and analyzing human voices so as to provide the desired response. The algorithm for collecting these utterances, analyzing them correctly, and providing the desired response to a caller has been studied extensively (Allen, 1995). Whenever one calls a large organization, the initial encounter is usually with a machine that prompts the caller for their intent. Such machines either give the caller options to choose from (Directed Dialog) or ask for open-ended input (Open Dialog). This paper focuses on Open Dialog, where the caller is free to indicate their intent. The problem is that the voice recognizer may misinterpret the caller's intent, thereby providing the caller with the wrong information. This is because the recognizer applies a confidence threshold when recognizing an utterance and then traverses the part of the call flow that corresponds to what the engine recognizes. This threshold can be calibrated for optimal performance by undertaking a statistical analysis of a random sample of utterances and, based on the result, setting the threshold that will be used to discriminate between caller utterances. The criteria used for establishing this threshold include, among others, sensitivity, accuracy, and specificity. The optimal threshold will be the one that optimizes the majority of these parameters.
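A threshold chosen this way can be sketched with standard ROC tooling. The snippet below is a minimal illustration using scikit-learn and made-up data; maximizing Youden's J (sensitivity + specificity - 1) is one common criterion, not necessarily the exact trade-off the paper settles on, and all names and values here are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import roc_curve

def choose_confidence_threshold(labels, confidences):
    """Pick the recognizer confidence threshold that maximizes Youden's J
    over a labelled sample of utterances.

    `labels` are 1 if the recognizer's hypothesis matched the caller's true
    intent, 0 otherwise; `confidences` are the engine's confidence scores.
    """
    fpr, tpr, thresholds = roc_curve(labels, confidences)
    j = tpr - fpr                      # Youden's J statistic at each cut-off
    return thresholds[np.argmax(j)]

# Made-up example: accept a recognition only if its confidence exceeds the
# chosen threshold, otherwise re-prompt the caller or escalate to an agent.
labels      = [1, 1, 0, 1, 0, 0, 1, 0]
confidences = [0.91, 0.78, 0.45, 0.83, 0.52, 0.30, 0.67, 0.60]
print(choose_confidence_threshold(labels, confidences))
```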
Automatic Article Commenting: the Task and Dataset
Comments of online articles provide extended views and improve user
engagement. Automatically making comments thus becomes a valuable functionality
for online forums, intelligent chatbots, etc. This paper proposes the new task
of automatic article commenting, and introduces a large-scale Chinese dataset
with millions of real comments and a human-annotated subset characterizing the
comments' varying quality. Incorporating the human bias of comment quality, we
further develop automatic metrics that generalize a broad set of popular
reference-based metrics and exhibit greatly improved correlations with human
evaluations. Comment: ACL 2018; with supplements; dataset link available in the paper
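As a rough illustration of how a reference-based metric can incorporate comment quality, the sketch below weights per-reference BLEU by a human quality score; the function name and the use of BLEU are assumptions for illustration, not the paper's actual metric.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def quality_weighted_score(candidate, references, quality_weights):
    """Score a generated comment against human reference comments, letting
    higher-quality references carry more weight. BLEU stands in here for any
    reference-based metric."""
    smooth = SmoothingFunction().method1
    weighted = sum(w * sentence_bleu([ref.split()], candidate.split(),
                                     smoothing_function=smooth)
                   for ref, w in zip(references, quality_weights))
    return weighted / sum(quality_weights)
```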
Conventions and mutual expectations — understanding sources for web genres
Genres can be understood in many different ways. They are often perceived as a primarily sociological construction or, alternatively, as a stylostatistically observable objective characteristic of texts. The latter view is more common in the research field of information and language technology. These two views can be quite compatible and can inform each other; the present investigation discusses knowledge sources for studying genre variation and change by observing reader and author behaviour rather than by performing analyses on the information objects themselves.
Evaluating Conversational Recommender Systems via User Simulation
Conversational information access is an emerging research area. Currently,
human evaluation is used for end-to-end system evaluation, which is both
time- and resource-intensive at scale and thus becomes a bottleneck to
progress. As an alternative, we propose automated evaluation by means of
simulating users. Our user simulator aims to generate responses that a real
human would give by considering both individual preferences and the general
flow of interaction with the system. We evaluate our simulation approach on an
item recommendation task by comparing three existing conversational recommender
systems. We show that preference modeling and task-specific interaction models
both contribute to more realistic simulations, and can help achieve high
correlation between automatic evaluation measures and manual human
assessments. Comment: Proceedings of the 26th ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (KDD '20), 2020
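To make the simulation idea concrete, here is a toy sketch of a simulated user that combines a fixed preference profile with a simple interaction policy; the class and attribute names are hypothetical, and the paper's preference and interaction models are considerably more sophisticated.

```python
class SimulatedUser:
    """Toy user simulator: answers the system from a fixed preference profile
    and accepts a recommendation only if it matches that profile."""

    def __init__(self, preferences, patience=5):
        self.preferences = preferences   # e.g. {"genre": "jazz", "era": "1990s"}
        self.patience = patience         # turns before the simulated user quits

    def respond(self, system_utterance):
        """State a preference if the system asks about a known attribute,
        otherwise ask for a recommendation or give up when patience runs out."""
        self.patience -= 1
        if self.patience <= 0:
            return "I give up, thanks anyway."
        for attribute, value in self.preferences.items():
            if attribute in system_utterance.lower():
                return f"I prefer {value}."
        return "Can you recommend something?"

    def accepts(self, item_attributes):
        """Accept a recommended item only if it matches every preference."""
        return all(item_attributes.get(a) == v
                   for a, v in self.preferences.items())
```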