Physical extracurricular activities in educational child-robot interaction
In an exploratory study on educational child-robot interaction, we investigate
the effect of alternating a learning activity with an additional shared
activity. Our aim is to enhance and enrich the relationship between child and
robot by introducing "physical extracurricular activities". This enriched
relationship might ultimately influence the way the child and robot interact
with the learning material. We use qualitative measurement techniques to
evaluate the effect of the additional activity on the child-robot relationship.
We also explore how these metrics can be integrated in a highly exploratory
cumulative score for the relationship between child and robot. This cumulative
score suggests a difference in the overall child-robot relationship between
children who engage in a physical extracurricular activity with the robot, and
children who only engage in the learning activity with the robot.
Comment: 5th International Symposium on New Frontiers in Human-Robot Interaction 2016 (arXiv:1602.05456)
Wizundry: A Cooperative Wizard of Oz Platform for Simulating Future Speech-based Interfaces with Multiple Wizards
Wizard of Oz (WoZ) has been used as a prototyping method to simulate
intelligent user interfaces, particularly for speech-based systems. However,
as our societies' expectations of artificial intelligence (AI) grow,
optimistic visions of what AI can do place increasing demands on WoZ platforms
to simulate smarter systems and more complex interactions. This raises the
question of whether the typical approach of employing a single Wizard is
sufficient. Moreover, while
existing work has employed multiple Wizards in WoZ studies, a multi-Wizard
approach has not been systematically studied in terms of feasibility,
effectiveness, and challenges. We offer Wizundry, a real-time, web-based WoZ
platform that allows multiple Wizards to collaboratively operate a
speech-to-text based system remotely. We outline the design and technical
specifications of our open-source platform, which we iterated over two design
phases. We report on two studies in which participant-Wizards were tasked with
negotiating how to cooperatively simulate an interface that can handle natural
speech for dictation and text editing as well as other intelligent text
processing tasks. We offer qualitative findings on the Multi-Wizard experience
for Dyads and Triads of Wizards. Our findings reveal the promises and
challenges of the multi-Wizard approach and open up new research questions.
Comment: 34 pages
Researching interactions between humans and machines: methodological challenges
Communication scholars are increasingly concerned with interactions between humans and communicative agents. These agents, however, are considerably different from digital or social media: They are designed and perceived as life-like communication partners (i.e., as “communicative subjects”), which in turn poses distinct challenges for their empirical study. Hence, in this paper, we document, discuss, and evaluate potentials and pitfalls that typically arise for communication scholars when investigating simulated or non-simulated interactions between humans and chatbots, voice assistants, or social robots. We focus on experiments (including pre-recorded stimuli, vignettes, and the “Wizard of Oz” technique) and field studies. Overall, this paper aims to provide guidance and support for communication scholars who want to empirically study human-machine communication. To this end, we not only compile potential challenges, but also recommend specific strategies and approaches. In addition, our reflections on current methodological challenges serve as a starting point for discussions in communication science on how meaning-making between humans and machines can be investigated in the best way possible, as illustrated in the concluding section.
The Value-Sensitive Conversational Agent Co-Design Framework
Conversational agents (CAs) are gaining traction in both industry and
academia, especially with the advent of generative AI and large language
models. As these agents are used more broadly by members of the general public
and take on a number of critical use cases and social roles, it becomes
important to consider the values embedded in these systems. This consideration
includes answering questions such as 'whose values get embedded in these
agents?' and 'how do those values manifest in the agents being designed?'
Accordingly, the aim of this paper is to present the Value-Sensitive
Conversational Agent (VSCA) Framework for enabling the collaborative design
(co-design) of value-sensitive CAs with relevant stakeholders. Firstly,
requirements for co-designing value-sensitive CAs which were identified in
previous works are summarised here. Secondly, the practical framework is
presented and discussed, including its operationalisation into a design
toolkit. The framework facilitates the co-design of three artefacts that elicit
stakeholder values and have a technical utility to CA teams to guide CA
implementation, enabling the creation of value-embodied CA prototypes. Finally,
an evaluation protocol for the framework is proposed where the effects of the
framework and toolkit are explored in a design workshop setting to evaluate
both the process followed and the outcomes produced.
Comment: 23 pages, 8 figures
Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition
This paper presents a self-supervised method for visual detection of the
active speaker in a multi-person spoken interaction scenario. Active speaker
detection is a fundamental prerequisite for any artificial cognitive system
attempting to acquire language in social settings. The proposed method is
intended to complement the acoustic detection of the active speaker, thus
improving the system robustness in noisy conditions. The method can detect an
arbitrary number of possibly overlapping active speakers based exclusively on
visual information about their faces. Furthermore, the method does not rely on
external annotations, remaining compatible with a cognitive-developmental
learning scenario. Instead, the
method uses information from the auditory modality to support learning in the
visual domain. This paper reports an extensive evaluation of the proposed
method using a large multi-person face-to-face interaction dataset. The results
show good performance in a speaker-dependent setting. However, in a
speaker-independent setting the proposed method yields significantly lower
performance. We believe the proposed method represents an essential component
of any artificial cognitive system or robotic platform engaging in social
interactions.
Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
Building and Designing Expressive Speech Synthesis
We know there is something special about speech. Our voices are not just a means of communicating; they also give a deep impression of who we are and what we might know. They can betray our upbringing, our emotional state, our state of health. They can be used to persuade and convince, to calm and to excite. As speech systems enter the social domain they are required to interact with, support, and mediate our social relationships 1) with each other, 2) with digital information, and, increasingly, 3) with AI-based algorithms and processes. Socially Interactive Agents (SIAs) are at the forefront of research and innovation in this area. There is an assumption that in the future “spoken language will provide a natural conversational interface between human beings and so-called intelligent systems” [Moore 2017, p. 283]. A considerable amount of previous research has tested this assumption with mixed results. However, as has been pointed out, “voice interfaces have become notorious for fostering frustration and failure” [Nass and Brave 2005, p. 6]. It is within this context, between our exceptional and intelligent human use of speech to communicate and interact with other humans, and our desire to leverage this means of communication for artificial systems, that the technology often termed expressive speech synthesis uncomfortably falls. Uncomfortably, because it is often overshadowed by issues in interactivity and the underlying intelligence of the system, which is something that emerges from the interaction of many of the components in a SIA. This is especially true of what we might term conversational speech, where decoupling how things are spoken from when and to whom they are spoken can seem an impossible task. This is an even greater challenge in evaluation and in characterising full systems which have made use of expressive speech.
Furthermore, when designing an interaction with a SIA, we must consider not only how SIAs should speak but also how much, and whether they should speak at all. These considerations cannot be ignored. Any speech synthesis used in the context of an artificial agent will have a perceived accent, a vocal style, an underlying emotion, and an intonational model. Dimensions like accent and personality (cross-speaker parameters), as well as vocal style, emotion, and intonation during an interaction (within-speaker parameters), need to be built into the design of a synthetic voice. Even a default or neutral voice has to consider these same expressive speech synthesis components. Such design parameters have a strong influence on how effectively a system will interact, how it is perceived, and its assumed ability to perform a task or function. To ignore them is to blindly accept a set of design decisions that disregards the complex effect speech has on the user’s successful interaction with a system. Thus expressive speech synthesis is a key design component in SIAs. This chapter explores the world of expressive speech synthesis, aiming to act as a starting point for those interested in the design, building, and evaluation of such artificial speech. The debates and literature within this topic are vast and fundamentally multidisciplinary in focus, covering a wide range of disciplines such as linguistics, pragmatics, psychology, speech and language technology, robotics, and human-computer interaction (HCI), to name a few. It is not our aim to synthesise these areas but to give a scaffold and a starting point for the reader by exploring the critical dimensions and decisions they may need to consider when choosing to use expressive speech. To do this, the chapter explores the building of expressive synthesis, highlighting key decisions and parameters as well as emphasising future challenges in expressive speech research and development.
Yet, before these are expanded upon, we must first try to define what we actually mean by expressive speech.