
    A Review of Evaluation Techniques for Social Dialogue Systems

    In contrast with goal-oriented dialogue, social dialogue has no clear measure of task success. Consequently, evaluating these systems is notoriously hard. In this paper, we review current evaluation methods, focusing on automatic metrics. We conclude that turn-based metrics often ignore the context and do not account for the fact that several replies can be valid, while end-of-dialogue rewards are mainly hand-crafted. Both lack grounding in human perceptions. Comment: 2 pages
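The point about turn-based metrics can be made concrete with a toy example. The sketch below is not taken from the paper: it assumes a simple unigram-overlap F1 (a hypothetical helper, `unigram_f1`) as a stand-in for reference-based turn metrics such as BLEU, and scores two plausible replies against a single reference. The example sentences are invented for illustration.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between one candidate reply and one reference reply.

    Hypothetical, simplified stand-in for turn-based overlap metrics:
    it scores a single turn against a single reference and never looks
    at the preceding dialogue context.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Illustrative user turn: "What do you do at weekends?"
reference = "i love hiking in the mountains"
echo = "i love hiking in the mountains"          # copies the reference
alternative = "mostly i go climbing with friends"  # equally valid, different words

print(unigram_f1(echo, reference))                  # 1.0
print(round(unigram_f1(alternative, reference), 2))  # 0.17
```

Both replies are reasonable answers, yet the alternative shares only the token "i" with the reference and is scored far lower, which is the "several replies can be valid" problem the abstract describes.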

    A study of the problems of man-computer dialogues for naive users

    The success of an interactive computing facility will depend, to a large extent, upon the effectiveness of the man-computer dialogue which it supports. Comparatively little work has been directed towards the design of effective dialogues for situations in which the 'man' is a 'naive' user, i.e. a person without training or experience of computer procedures. Thus the aim of this project has been to produce a series of specialised guidelines for designers of dialogues for naive users. An examination of the literature reveals that published dialogue guidelines tend to be general-purpose and therefore cannot be applied directly to specific situations. Furthermore, as each set of recommendations is based upon a limited range of experience, authors' opinions appear to contradict one another or to need further qualification. At a practical level, a survey of computer games, which are intended to be self-explanatory and therefore suitable for naive users, bears out the widely held feeling that the dialogue interface is often a poorly considered aspect of interactive program writing. Pilot studies highlight the need for experimental work on man-computer dialogues to be carried out under conditions conforming as closely as possible to a 'real world' environment. The main study focuses upon the general public as users of a local information system developed and installed in Leicester's Information Bureau. Monitoring the public's usage of and reactions to the system has enabled a series of dialogue guidelines for public information systems to be produced. A review of the literature provides supplementary recommendations. The influence of dialogue recommendations on the software writing community is considered. Fewer than half of a sample of application programmers are found to refer to material of this kind. Follow-up interviews indicate that the concept of a dialogue guideline is too narrow and should be broadened to cover all types of dialogue design information, which would make it more applicable to differing design situations. For designers who do not refer to published material, it is suggested that sound principles can be communicated via trained experts and through library subroutines that support dialogue creation. A routine for processing textual inputs is considered as an example. A number of paths for future research are described, concerning the development of experimental methodology suitable for testing man-computer dialogues, an evaluation of the proposed strategy for communicating dialogue design principles, and the application of new input/output techniques to public information systems. It is also suggested that the likely social consequences of computerised information facilities should be determined.

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 table

    Responsible research and innovation in science education: insights from evaluating the impact of using digital media and arts-based methods on RRI values

    The European Commission policy approach of Responsible Research and Innovation (RRI) is gaining momentum in European research planning and development as a strategy to align scientific and technological progress with socially desirable and acceptable ends. One of the RRI agendas is science education, which aims to help future generations acquire the skills and values needed to engage responsibly in society. To this end, it is argued that RRI-based science education can benefit from more interdisciplinary methods, such as those based on the arts and digital technologies. However, the impact of science education activities using digital media and arts-based methods on RRI values remains underexplored. This article comparatively reviews previous evidence on the evaluation of these activities, from primary to higher education, to examine whether and how RRI-related learning outcomes are evaluated and how these activities affect students' learning. Forty academic publications were selected and their content analysed according to five RRI values: creative and critical thinking, engagement, inclusiveness, gender equality and integration of ethical issues. When evaluating the impact of digital and arts-based methods in science education activities, creative and critical thinking, engagement and, in part, inclusiveness are the RRI values mainly addressed. In contrast, gender equality and ethics integration are neglected. Digital-based methods seem to focus more on students' questioning and inquiry skills, whereas those using the arts often examine imagination, curiosity and autonomy. Differences in evaluation focus between studies on digital media and those on the arts partly explain differences in their impact on RRI values, but also leave outcomes undocumented and undermine the potential of these activities. Further developments in interdisciplinary approaches to science education following the RRI policy agenda should strengthen both the design of the activities and the procedural aspects of the evaluation research.