Survey on Evaluation Methods for Dialogue Systems
In this paper we survey the methods and concepts developed for the evaluation
of dialogue systems. Evaluation is a crucial part during the development
process. Often, dialogue systems are evaluated by means of human evaluations
and questionnaires. However, this tends to be very cost- and time-intensive.
Thus, much work has been put into finding methods that reduce the
involvement of human labour. In this survey, we present the main concepts and
methods. For this, we differentiate between the various classes of dialogue
systems (task-oriented dialogue systems, conversational dialogue systems, and
question-answering dialogue systems). We cover each class by introducing the
main technologies developed for the dialogue systems and then by presenting the
evaluation methods regarding this class.
Not All Dialogues are Created Equal: Instance Weighting for Neural Conversational Models
Neural conversational models require substantial amounts of dialogue data for
their parameter estimation and are therefore usually learned on large corpora
such as chat forums or movie subtitles. These corpora are, however, often
challenging to work with, notably due to their frequent lack of turn
segmentation and the presence of multiple references external to the dialogue
itself. This paper shows that these challenges can be mitigated by adding a
weighting model into the architecture. The weighting model, which is itself
estimated from dialogue data, associates each training example with a numerical
weight that reflects its intrinsic quality for dialogue modelling. At training
time, these sample weights are included in the empirical loss to be
minimised. Evaluation results on retrieval-based models trained on movie and TV
subtitles demonstrate that the inclusion of such a weighting model improves the
model performance on unsupervised metrics. Comment: Accepted to SIGDIAL 201
Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in
conversational speech, i.e., speech-act-like units such as Statement, Question,
Backchannel, Agreement, Disagreement, and Apology. Our model detects and
predicts dialogue acts based on lexical, collocational, and prosodic cues, as
well as on the discourse coherence of the dialogue act sequence. The dialogue
model is based on treating the discourse structure of a conversation as a
hidden Markov model and the individual dialogue acts as observations emanating
from the model states. Constraints on the likely sequence of dialogue acts are
modeled via a dialogue act n-gram. The statistical dialogue grammar is combined
with word n-grams, decision trees, and neural networks modeling the
idiosyncratic lexical and prosodic manifestations of each dialogue act. We
develop a probabilistic integration of speech recognition with dialogue
modeling, to improve both speech recognition and dialogue act classification
accuracy. Models are trained and evaluated using a large hand-labeled database
of 1,155 conversations from the Switchboard corpus of spontaneous
human-to-human telephone speech. We achieved good dialogue act labeling
accuracy (65% based on errorful, automatically recognized words and prosody,
and 71% based on word transcripts, compared to a chance baseline accuracy of
35% and human accuracy of 84%) and a small reduction in word recognition error. Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling
changed)
Evaluating Conversational Recommender Systems: A Landscape of Research
Conversational recommender systems aim to interactively support online users
in their information search and decision-making processes in an intuitive way.
With the latest advances in voice-controlled devices, natural language
processing, and AI in general, such systems have received increased attention in
recent years. Technically, conversational recommenders are usually complex
multi-component applications and often consist of multiple machine learning
models and a natural language user interface. Evaluating such a complex system
in a holistic way can therefore be challenging, as it requires (i) the
assessment of the quality of the different learning components, and (ii) the
quality perception of the system as a whole by users. Thus, a mixed methods
approach is often required, which may combine objective (computational) and
subjective (perception-oriented) evaluation techniques. In this paper, we
review common evaluation approaches for conversational recommender systems,
identify possible limitations, and outline future directions towards more
holistic evaluation practices.
Understanding Anthropological Understanding: for a merological anthropology
In this paper I argue for a merological anthropology in which ideas of ‘partiality’ and ‘practical adequacy’ provide a way out of the impasse of relativism which is implied by post-modernism and the related abandonment of a concern with ‘truth’. Ideas such as ‘aptness’ and ‘faithfulness’ enable us to re-establish empirical foundations without having to espouse a simple realism which has been rightly criticised. Ideas taken from ethnomethodology, particularly the way we bootstrap from ‘practical adequacy’ to ‘warrants for confidence’, point to a merological anthropology in which we recognize that we do not and cannot know everything, but that we can have reasons for being confident in the little we know.
Scoping a vision for formative e-assessment: a project report for JISC
Assessment is an integral part of teaching and learning. If the relationship between teaching and learning were causal, i.e. if students always mastered the intended learning outcomes of a particular sequence of instruction, assessment would be superfluous. Experience and research suggest this is not the case: what is learnt can often be quite different from what is taught. Formative assessment is motivated by a concern with the elicitation of relevant information about student understanding and/or achievement, its interpretation and an exploration of how it can lead to actions that result in better learning. In the context of a policy drive towards technology-enhanced approaches to teaching and learning, the question of the role of digital technologies is key and it is the latter on which this project particularly focuses. The project and its deliverables have been informed by recent and relevant literature, in particular recent work by Black and In this work, they put forward a framework which suggests that assessment for learning (their term for formative assessment) can be conceptualised as consisting of a number of aspects and five key strategies. The key aspects revolve around where the learner is going, where the learner is right now and how she can get there, and examine the role played by the teacher, peers and the learner. Language: English. Keywords: assessments, case studies, design patterns, e-assessmen
A Computational Theory of the Use-Mention Distinction in Natural Language
To understand the language we use, we sometimes must turn language on itself, and we do this through an understanding of the use-mention distinction. In particular, we are able to recognize mentioned language: that is, tokens (e.g., words, phrases, sentences, letters, symbols, sounds) produced to draw attention to linguistic properties that they possess. Evidence suggests that humans frequently employ the use-mention distinction, and we would be severely handicapped without it; mentioned language frequently occurs for the introduction of new words, attribution of statements, explanation of meaning, and assignment of names. Moreover, just as we benefit from mutual recognition of the use-mention distinction, the potential exists for us to benefit from language technologies that recognize it as well. With a better understanding of the use-mention distinction, applications can be built to extract valuable information from mentioned language, leading to better language learning materials, precise dictionary building tools, and highly adaptive computer dialogue systems.
This dissertation presents the first computational study of how the use-mention distinction occurs in natural language, with a focus on occurrences of mentioned language. Three specific contributions are made. The first is a framework for identifying and analyzing instances of mentioned language, in an effort to reconcile elements of previous theoretical work for practical use. Definitions for mentioned language, metalanguage, and quotation have been formulated, and a procedural rubric has been constructed for labeling instances of mentioned language. The second is a sequence of three labeled corpora of mentioned language, containing delineated instances of the phenomenon. The corpora illustrate the variety of mentioned language, and they enable analysis of how the phenomenon relates to sentence structure. Using these corpora, inter-annotator agreement studies have quantified the concurrence of human readers in labeling the phenomenon. The third contribution is a method for identifying common forms of mentioned language in text, using patterns in metalanguage and sentence structure. Although the full breadth of the phenomenon is likely to elude computational tools for the foreseeable future, some specific, common rules for detecting and delineating mentioned language have been shown to perform well.
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures adopted in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118
pages, 8 figures, 1 tabl