PARADISE: A Framework for Evaluating Spoken Dialogue Agents
This paper presents PARADISE (PARAdigm for DIalogue System Evaluation), a
general framework for evaluating spoken dialogue agents. The framework
decouples task requirements from an agent's dialogue behaviors, supports
comparisons among dialogue strategies, enables the calculation of performance
over subdialogues and whole dialogues, specifies the relative contribution of
various factors to performance, and makes it possible to compare agents
performing different tasks by normalizing for task complexity.
Comment: 10 pages, uses aclap, psfig, lingmacros, time
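The PARADISE performance measure is a weighted linear combination of normalized task success and normalized dialogue costs. As a minimal sketch (the weights, cost measures, and numbers below are illustrative assumptions, not values from the paper):

```python
# Sketch of a PARADISE-style performance function:
#   performance_i = alpha * N(kappa_i) - sum_j w_j * N(c_ij)
# where N is z-score normalization across dialogues, kappa is a
# task-success score, and c_ij are cost measures (e.g. number of turns).
# alpha and the weights here are illustrative assumptions.
from statistics import mean, stdev

def z_norm(values):
    """Z-score normalize a list of per-dialogue measurements."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def paradise_performance(kappa, costs, alpha=1.0, weights=None):
    """kappa: per-dialogue task-success scores.
    costs: dict mapping cost name -> list of per-dialogue values."""
    weights = weights or {name: 1.0 for name in costs}
    nk = z_norm(kappa)
    ncosts = {name: z_norm(vals) for name, vals in costs.items()}
    return [
        alpha * nk[i] - sum(weights[name] * ncosts[name][i] for name in costs)
        for i in range(len(kappa))
    ]

# Hypothetical data: three dialogues, one cost measure (turn count).
scores = paradise_performance([0.9, 0.5, 0.7], {"turns": [10, 20, 15]})
```

Because both success and costs are z-normalized, dialogues (or agents) measured on different scales become comparable, which is what enables the cross-task comparison the framework claims.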
Combining Expression and Content in Domains for Dialog Managers
We present work in progress on abstracting dialog managers from their domain
in order to implement a dialog manager development tool which takes (among
other data) a domain description as input and delivers a new dialog manager for
the described domain as output. We focus on two topics: first, the
construction of domain descriptions with description logics and, second, the
interpretation of utterances in a given domain.
Comment: 5 pages, uses conference.st
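A description-logic domain description centers on a concept hierarchy (TBox) that the dialog manager can query when interpreting utterances. A minimal sketch, assuming a toy ordering domain (the concepts and hierarchy are invented for illustration, not taken from the paper):

```python
# Toy TBox: each concept names its direct subsumer (is-a parent).
# The domain and concept names are illustrative assumptions.
SUBSUMPTIONS = {
    "EspressoOrder": "DrinkOrder",
    "DrinkOrder": "Order",
    "Order": "DomainAction",
}

def subsumed_by(concept, ancestor):
    """True if `concept` is (transitively) subsumed by `ancestor`,
    i.e. every EspressoOrder is an Order."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBSUMPTIONS.get(concept)
    return False
```

A domain-independent dialog manager could use such subsumption queries to decide, for example, whether an interpreted utterance instantiates any task-relevant action in the current domain.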
Shared task proposal: Instruction giving in virtual worlds
This paper reports on the results of the working group "Virtual Environments" at the Workshop on Shared Tasks and Comparative Evaluation for NLG. This working group discussed the use of virtual environments as a platform for NLG evaluation, and more specifically the generation of instructions in virtual environments.
Robust Dialog State Tracking for Large Ontologies
The Dialog State Tracking Challenge 4 (DSTC 4) differentiates itself from the
previous three editions as follows: the number of slot-value pairs present in
the ontology is much larger, no spoken language understanding output is given,
and utterances are labeled at the subdialog level. This paper describes a novel
dialog state tracking method designed to work robustly under these conditions,
using elaborate string matching, coreference resolution tailored for dialogs
and a few other improvements. The method can correctly identify many values
that are not explicitly present in the utterance. On the final evaluation, our
method came in first among 7 competing teams and 24 entries. The F1-score
achieved by our method was 9 and 7 percentage points higher than that of the
runner-up for the utterance-level evaluation and for the subdialog-level
evaluation, respectively.
Comment: Paper accepted at IWSDS 201
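The core idea of matching ontology values against raw utterances (with no SLU output) can be sketched with fuzzy string matching. This is a simplified illustration, not the authors' actual method: the ontology entries, similarity measure (stdlib difflib), and threshold are all assumptions.

```python
# Sketch: find ontology slot-value pairs whose value fuzzily appears
# in an utterance, in the spirit of string-matching-based state tracking.
from difflib import SequenceMatcher

def match_slot_values(utterance, ontology, threshold=0.8):
    """Return (slot, value) pairs whose value approximately occurs
    in the utterance. `ontology` maps slot name -> list of values."""
    tokens = utterance.lower().split()
    matches = []
    for slot, values in ontology.items():
        for value in values:
            n = len(value.split())
            # Slide a window of the value's word length over the utterance
            # and compare each span to the value.
            for i in range(len(tokens) - n + 1):
                span = " ".join(tokens[i:i + n])
                if SequenceMatcher(None, span, value.lower()).ratio() >= threshold:
                    matches.append((slot, value))
                    break
    return matches

# Hypothetical ontology fragment for a tourist-information domain.
found = match_slot_values(
    "we could visit marina bay tomorrow",
    {"PLACE": ["Marina Bay", "Chinatown"]},
)
```

A real tracker would add coreference resolution and subdialog-level aggregation on top of this; the sketch only shows the surface-matching step.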