A Review of Evaluation Techniques for Social Dialogue Systems
In contrast with goal-oriented dialogue, social dialogue has no clear measure
of task success. Consequently, evaluation of these systems is notoriously hard.
In this paper, we review current evaluation methods, focusing on automatic
metrics. We conclude that turn-based metrics often ignore the context and do
not account for the fact that several replies are valid, while end-of-dialogue
rewards are mainly hand-crafted. Both lack grounding in human perceptions. Comment: 2 page
A study of the problems of man-computer dialogues for naive users
The success of an interactive computing facility will depend, to a large extent, upon the effectiveness of the man-computer
dialogue which it supports. Comparatively little work has been directed towards the design of effective dialogues for situations
in which the 'man' is a 'naive' user, i.e. a person without training or experience of computer procedures. Thus the aim of this project has been to produce a series of specialised guidelines for designers of dialogues for naive users.
An examination of the literature reveals that published dialogue guidelines tend to be of a general-purpose nature and therefore cannot be applied directly to specific situations. Furthermore, as each set of recommendations is based upon a limited range of experience, authors' opinions appear to contradict one another or to need further qualification.
At a practical level, a survey of computer games, intended to be self-explanatory and therefore suitable for naive users, bears out
the widely held feeling that the dialogue interface is often a poorly considered aspect of interactive program writing.
Pilot studies highlight the need for experimental work into man-computer dialogues to be carried out under conditions conforming
as closely as possible to a 'real world' environment.
The main study focuses upon the general public as users of a local information system developed and installed in Leicester's
Information Bureau. Monitoring the public's usage of and reactions to the system has enabled a series of dialogue guidelines for public information systems to be produced. A review of the literature provides supplementary recommendations.
The influence of dialogue recommendations on the software writing community is considered. Fewer than half of a sample of
application programmers are found to refer to material of this kind. Follow-up interviews indicate that the concept of a dialogue guideline is too narrow and should be broadened to cover all types of dialogue design information. This would render it more applicable to differing design situations. For designers who do not refer to published material, it is suggested that sound principles can be communicated via trained experts and the use of library subroutines supporting dialogue creation. An example is considered of a routine to process textual inputs.
A number of paths for future research are described concerning the development of experimental methodology suitable for
testing man-computer dialogues, an evaluation of the proposed strategy for communicating dialogue design principles and the application of new input/output techniques to public information systems. It is also suggested that the likely social consequences of computerised information facilities should be determined.
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures adopted in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118
pages, 8 figures, 1 tabl
Responsible research and innovation in science education: insights from evaluating the impact of using digital media and arts-based methods on RRI values
The European Commission policy approach of Responsible Research and Innovation (RRI) is gaining momentum in European research planning and development as a strategy to align scientific and technological progress with socially desirable and acceptable ends. One of the RRI agendas is science education, aiming to foster future generations' acquisition of skills and values needed to engage in society responsibly. To this end, it is argued that RRI-based science education can benefit from more interdisciplinary methods such as those based on arts and digital technologies. However, the impact of science education activities using digital media and arts-based methods on RRI values remains underexplored. This article comparatively reviews previous evidence on the evaluation of these activities, from primary to higher education, to examine whether and how RRI-related learning outcomes are evaluated and how these activities impact on students' learning. Forty academic publications were selected and their content analysed according to five RRI values: creative and critical thinking, engagement, inclusiveness, gender equality and integration of ethical issues. When evaluating the impact of digital and arts-based methods in science education activities, creative and critical thinking, engagement and partly inclusiveness are the RRI values mainly addressed. In contrast, gender equality and ethics integration are neglected. Digital-based methods seem to be more focused on students' questioning and inquiry skills, whereas those using arts often examine imagination, curiosity and autonomy. Differences in the evaluation focus between studies on digital media and those on arts partly explain differences in their impact on RRI values, but also result in non-documented outcomes and undermine their potential.
Further developments in interdisciplinary approaches to science education following the RRI policy agenda should reinforce the design of the activities as well as procedural aspects of the evaluation research.