
    Joint attention in spoken human-robot interaction

    Gaze during situated language production and comprehension is tightly coupled with the unfolding speech stream: speakers look at entities before mentioning them (Griffin, 2001; Meyer et al., 1998), while listeners look at objects as they are mentioned (Tanenhaus et al., 1995). Thus, a speaker's gaze to mentioned objects in a shared environment provides the listener with a cue to the speaker's focus of visual attention and potentially to an intended referent. The coordination of interlocutors' visual attention, in order to learn about the partner's goals and intentions, has been called joint attention (Moore and Dunham, 1995; Emery, 2000). By revealing the speaker's communicative intentions, such attentional cues complement spoken language, facilitating grounding and sometimes disambiguating references (Hanna and Brennan, 2007). Previous research has shown that people readily attribute intentional states to non-humans as well, such as animals, computers, or robots (Nass and Moon, 2000). Assuming that people indeed ascribe intentional states to a robot, joint attention may be a relevant component of human-robot interaction as well. The objective of this thesis was to investigate the hypothesis that people jointly attend to objects looked at by a speaking robot and that human listeners use this visual information to infer the robot's communicative intentions. Five eye-tracking experiments in a spoken human-robot interaction setting were conducted and provide supporting evidence for this hypothesis. In these experiments, participants' eye movements and responses were recorded while they viewed videos of a robot that described and looked at objects in a scene. The congruency and alignment of robot gaze and the spoken references were manipulated in order to establish the relevance of such gaze cues for participants' utterance comprehension. Results suggest that people follow robot gaze to objects and infer referential intentions from it, causing both facilitation and disruption of reference resolution, depending on the match or mismatch between inferred intentions and the actual utterance. Specifically, we showed in Experiments 1-3 that people assign attentional and intentional states to a robot, interpreting its gaze as a cue to intended referents. This interpretation determined how people grounded spoken references in the scene, influencing overall utterance comprehension as well as the production of verbal corrections in response to false robot utterances. In Experiments 4 and 5, we further manipulated the temporal synchronization and linear alignment of robot gaze and speech and found that substantial temporal shifts of gaze relative to speech did not affect utterance comprehension, while the order of visual and spoken referential cues did. These results show that people interpret gaze cues in the order they occur and expect the retrieved referential intentions to be realized accordingly. Thus, our findings converge on the conclusion that people establish joint attention with a robot.
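
    To make the congruency manipulation concrete, the sketch below shows the kind of fixation analysis such an eye-tracking study might use: the proportion of fixation time on the mentioned object in a window after noun onset, split by whether robot gaze was congruent with the spoken reference. The data structures, field names, and one-second window are illustrative assumptions, not the thesis' actual pipeline.

        # Minimal sketch (assumed data format, not the thesis' analysis code):
        # proportion of fixation time on the mentioned object after noun onset,
        # split by gaze-speech congruency.
        from dataclasses import dataclass, field
        from collections import defaultdict

        @dataclass
        class Fixation:
            start_ms: int      # fixation onset relative to trial start
            end_ms: int        # fixation offset
            object_id: str     # which scene object was fixated

        @dataclass
        class Trial:
            congruent: bool        # did robot gaze match the spoken referent?
            noun_onset_ms: int     # onset of the referring noun in the utterance
            referent_id: str       # object actually mentioned
            fixations: list = field(default_factory=list)

        def target_fixation_rate(trials, window_ms=1000):
            """Share of fixation time spent on the referent in the post-noun window."""
            totals = defaultdict(lambda: [0.0, 0.0])   # condition -> [on-target ms, all ms]
            for t in trials:
                w_start, w_end = t.noun_onset_ms, t.noun_onset_ms + window_ms
                key = "congruent" if t.congruent else "incongruent"
                for f in t.fixations:
                    overlap = max(0, min(f.end_ms, w_end) - max(f.start_ms, w_start))
                    totals[key][1] += overlap
                    if f.object_id == t.referent_id:
                        totals[key][0] += overlap
            return {k: on / total for k, (on, total) in totals.items() if total > 0}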

    Gesture and Speech in Interaction - 4th edition (GESPIN 4)

    The fourth edition of Gesture and Speech in Interaction (GESPIN) was held in Nantes, France. With more than 40 papers, these proceedings show just what a flourishing field of enquiry gesture studies continues to be. The keynote speeches of the conference addressed three different aspects of multimodal interaction: gesture and grammar, gesture acquisition, and gesture and social interaction. In a talk entitled Qualities of event construal in speech and gesture: Aspect and tense, Alan Cienki presented an ongoing research project on narratives in French, German and Russian, a project that focuses especially on the verbal and gestural expression of grammatical tense and aspect in narratives in the three languages. Jean-Marc Colletta's talk, entitled Gesture and Language Development: towards a unified theoretical framework, described the joint acquisition and development of speech and early conventional and representational gestures. In Grammar, deixis, and multimodality between code-manifestation and code-integration or why Kendon's Continuum should be transformed into a gestural circle, Ellen Fricke proposed a revisited grammar of noun phrases that integrates gestures as part of the semiotic and typological codes of individual languages. From a pragmatic and cognitive perspective, Judith Holler explored the use of gaze and hand gestures as means of organizing turns at talk as well as establishing common ground in a presentation entitled On the pragmatics of multi-modal face-to-face communication: Gesture, speech and gaze in the coordination of mental states and social interaction. Among the talks and posters presented at the conference, the vast majority of topics related, quite naturally, to gesture and speech in interaction - understood both in terms of mapping of units in different semiotic modes and of the use of gesture and speech in social interaction. Several presentations explored the effects of impairments (such as diseases or the natural ageing process) on gesture and speech. The communicative relevance of gesture and speech and audience design in natural interactions, as well as in more controlled settings like television debates and reports, was another topic addressed during the conference. Some participants also presented research on first and second language learning, while others discussed the relationship between gesture and intonation. While most participants presented research on gesture and speech from an observer's perspective, be it in semiotics or pragmatics, some nevertheless focused on another important aspect: the cognitive processes involved in language production and perception. Last but not least, participants also presented talks and posters on the computational analysis of gestures, whether involving external devices (e.g. mocap, Kinect) or concerning the use of specially designed computer software for the post-treatment of gestural data. Importantly, new links were made between semiotics and mocap data.

    Incrementality and flexibility in sentence production


    Accessibility of referent information influences sentence planning : An eye-tracking study

    Acknowledgments: We thank Phoebe Ye and Gouming Martens for help with data collection for Experiments 1 and 2, respectively. This research was supported by an ERC Starting Grant (206198) from the European Research Council to YC.

    Turn-Taking in Human Communicative Interaction

    The core use of language is in face-to-face conversation. This is characterized by rapid turn-taking. This turn-taking poses a number of central puzzles for the psychology of language. Consider, for example, that in large corpora the gap between turns is on the order of 100 to 300 ms, but the latencies involved in language production require minimally 600 ms (for a single word) to 1500 ms (for a simple sentence). This implies that participants in conversation are predicting the ends of the incoming turn and preparing in advance. But how is this done? What aspects of this prediction are done when? What happens when the prediction is wrong? What stops participants from coming in too early? If the system is running on prediction, why is there consistently a mode of 100 to 300 ms in response time? The timing puzzle raises further puzzles: it seems that comprehension must run in parallel with the preparation for production, but it has been presumed that there are strict cognitive limitations on more than one central process running at a time. How is this bottleneck overcome? Far from being 'easy' as some psychologists have suggested, conversation may be one of the most demanding cognitive tasks in our everyday lives. Further questions naturally arise: how do children learn to master this demanding task, and what is the developmental trajectory in this domain? Research shows that aspects of turn-taking such as its timing are remarkably stable across languages and cultures, but the word order of languages varies enormously. How then does prediction of the incoming turn work when the verb (often the informational nugget in a clause) is at the end? Conversely, how can production work fast enough in languages that have the verb at the beginning, thereby requiring early planning of the whole clause? What happens when one changes modality, as in sign languages: with the loss of channel constraints, is turn-taking much freer? And what about face-to-face communication amongst hearing individuals: do gestures, gaze, and other body behaviors facilitate turn-taking? One can also ask the phylogenetic question: how did such a system evolve? There seem to be parallels (analogies) in duetting bird species, and in a variety of monkey species, but there is little evidence of anything like this among the great apes. All this constitutes a neglected set of problems at the heart of the psychology of language and of the language sciences. This research topic welcomes contributions from right across the board, for example from psycholinguists, developmental psychologists, students of dialogue and conversation analysis, linguists interested in the use of language, phoneticians, corpus analysts and comparative ethologists or psychologists. We welcome contributions of all sorts, for example original research papers, opinion pieces, and reviews of work in subfields that may not be fully understood in other subfields.
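
    The arithmetic behind the timing argument can be spelled out in a few lines; the script below simply combines the gap and latency figures quoted above to show how far ahead of the turn end production planning would have to begin.

        # Back-of-the-envelope version of the timing argument (illustrative only):
        # if inter-turn gaps are much shorter than production latencies, planning
        # must begin before the incoming turn has ended.
        def planning_head_start(gap_ms, production_latency_ms):
            """How long before the end of the incoming turn planning must begin."""
            return production_latency_ms - gap_ms

        for gap in (100, 300):            # typical inter-turn gaps reported in corpora
            for latency in (600, 1500):   # single word vs. simple sentence
                print(f"gap {gap:>4} ms, latency {latency:>4} ms -> "
                      f"planning starts {planning_head_start(gap, latency)} ms before turn end")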

    Augmenting Situated Spoken Language Interaction with Listener Gaze

    Collaborative task solving in a shared environment requires referential success. Human speakers follow the listener’s behavior in order to monitor language comprehension (Clark, 1996). Furthermore, a natural language generation (NLG) system can exploit listener gaze to realize an effective interaction strategy by responding to it with verbal feedback in virtual environments (Garoufi, Staudte, Koller, & Crocker, 2016). We augment situated spoken language interaction with listener gaze and investigate its role in human-human and human-machine interactions. Firstly, we evaluate its impact on the prediction of reference resolution using a multimodal corpus collected in virtual environments. Secondly, we explore whether and how a human speaker uses listener gaze in an indoor guidance task, while spontaneously referring to real-world objects in a real environment. Thirdly, we consider an object identification task for assembly under system instruction. We developed a multimodal interactive system and two NLG systems that integrate listener gaze in the generation mechanisms. The NLG system “Feedback” reacts to gaze with verbal feedback, either underspecified or contrastive. The NLG system “Installments” uses gaze to incrementally refer to an object in the form of installments. Our results showed that gaze features improved the accuracy of automatic prediction of reference resolution. Further, we found that human speakers are very good at producing referring expressions, and showing listener gaze did not improve performance, but elicited more negative feedback. In contrast, we showed that an NLG system that exploits listener gaze benefits the listener’s understanding. Specifically, combining a short, ambiguous instruction with contrastive feedback resulted in faster interactions compared to underspecified feedback, and even outperformed following long, unambiguous instructions. Moreover, alternating the underspecified and contrastive responses in an interleaved manner led to better engagement with the system and efficient information uptake, and resulted in equally good performance. Somewhat surprisingly, when gaze was incorporated more indirectly in the generation procedure and used to trigger installments, the non-interactive approach that outputs an instruction all at once was more effective. However, if the spatial expression was mentioned first, referring in gaze-driven installments was as efficient as following an exhaustive instruction. In sum, we provide a proof of concept that listener gaze can effectively be used in situated human-machine interaction. An assistance system using gaze cues is more attentive and adapts to listener behavior to ensure communicative success.
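
    As a rough illustration of the decision logic of such a gaze-responsive NLG component, consider the sketch below; the function, object names, and feedback strings are hypothetical stand-ins, not the actual "Feedback" system described here.

        # Hypothetical gaze-driven feedback policy in the spirit of the system above
        # (names, strings, and structure are assumptions, not the real implementation).
        def feedback(fixated_object, target_object, distractor_feature=None):
            """Choose verbal feedback based on what the listener is currently fixating."""
            if fixated_object is None:
                return None                                      # no stable fixation yet
            if fixated_object == target_object:
                return "Yes, that one."                          # confirmatory feedback
            if distractor_feature:
                return f"No, not the {distractor_feature} one."  # contrastive feedback
            return "No, not that one."                           # underspecified feedback

        # Example: the listener fixates a red screw while the target is the blue screw.
        print(feedback("screw_red", "screw_blue", distractor_feature="red"))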

    The Neural Mechanisms Supporting Structure and Inter-Brain Connectivity In Natural Conversation

    Conversation is the height of human communication and social interaction, yet little is known about the neural mechanisms supporting it. To date, there have been no ecologically valid neuroimaging studies of conversation, and for good reason. Until recently, imaging techniques were hindered by artifact related to speech production. Now that we can circumvent this problem, I attempt to uncover the neural correlates of multiple aspects of conversation, including coordinating speaker change, the effect of conversation type (e.g. cooperative or argumentative) on inter-brain coupling, and the relationship between this coupling and social coherence. Pairs of individuals underwent simultaneous fMRI brain scans while they engaged in a series of unscripted conversations, for a total of 40 pairs (80 individuals). The first two studies in this dissertation lay a foundation by outlining brain regions supporting comprehension and production in both narrative and conversation - two aspects of discourse-level communication. The subsequent studies focus on two unique features of conversation: alternating turns-at-talk and establishing inter-brain coherence through speech. The results show that at the moment of speaker change, both people are engaging attentional and mentalizing systems - which likely support orienting toward implicit cues signaling speaker change as well as anticipating the other person's intention to either begin or end his turn. Four networks were identified that are significantly predicted by a novel measure of social coherence; they include the posterior parietal cortex, medial prefrontal cortex, and right angular gyrus. Taken together, the findings reveal that natural conversation relies on multiple cognitive networks besides language to coordinate or enhance social interaction.
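
    One simple way to quantify inter-brain coupling of the kind described here is to correlate the two participants' region-of-interest time series; the sketch below is an assumption-laden illustration, not the dissertation's actual analysis pipeline.

        # Illustrative only: inter-brain coupling for one region of interest, measured
        # as the Pearson correlation between two speakers' BOLD time series.
        import numpy as np

        def interbrain_coupling(bold_a, bold_b, lag_tr=0):
            """Correlate two ROI time series, optionally lagging speaker B's signal."""
            a = np.asarray(bold_a, dtype=float)
            b = np.asarray(bold_b, dtype=float)
            if lag_tr > 0:                 # shift speaker B forward by lag_tr volumes
                a, b = a[:-lag_tr], b[lag_tr:]
            a = (a - a.mean()) / a.std()
            b = (b - b.mean()) / b.std()
            return float(np.mean(a * b))

        # Example with simulated data: two noisy copies of a shared underlying signal.
        rng = np.random.default_rng(0)
        shared = rng.standard_normal(200)
        print(interbrain_coupling(shared + rng.standard_normal(200),
                                  shared + rng.standard_normal(200)))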

    Visual attention during conversation: an investigation using real-world stimuli.

    This research investigates how people visually attend to each other in realistic settings. In particular, I explore how people move their eyes to attend to speakers during social situations. I examine which signalling cues are crucial to social interactions and how they work in conjunction to enable successful conversation in humans. A further aim of this research is to compare eye movements made by live participants with those made by third-party observers. Using a range of techniques, the research demonstrates the benefit of combining audio and visual cues to follow a conversation, shows how viewing the speakers' eyes and their spatial location facilitates this, and examines social attention in those with traits of disorders. A key finding of the thesis is the similarity between the eye movements of live participants and those of third-party observers. Overall, the thesis offers a comprehensive account of which factors attract visual attention to speakers and facilitate conversation following.
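
    One common way to operationalise "conversation following" of this kind is the proportion of viewing time spent on whoever is currently speaking; the sketch below illustrates that measure under an assumed data format and is not the thesis' exact method.

        # Hypothetical measure of conversation following: share of viewing time spent
        # on the current speaker, given speech intervals and gaze-target intervals.
        def overlap(a_start, a_end, b_start, b_end):
            return max(0, min(a_end, b_end) - max(a_start, b_start))

        def gaze_on_speaker_proportion(speech_intervals, gaze_intervals):
            """speech_intervals: (speaker, start_ms, end_ms);
            gaze_intervals: (looked_at_person, start_ms, end_ms)."""
            on_speaker = 0
            total = sum(end - start for _, start, end in gaze_intervals)
            for speaker, s_start, s_end in speech_intervals:
                for target, g_start, g_end in gaze_intervals:
                    if target == speaker:
                        on_speaker += overlap(s_start, s_end, g_start, g_end)
            return on_speaker / total if total else 0.0

        # Example: A speaks 0-2000 ms, B speaks 2000-4000 ms; the viewer looks at A, then B.
        print(gaze_on_speaker_proportion([("A", 0, 2000), ("B", 2000, 4000)],
                                         [("A", 0, 1500), ("B", 1500, 4000)]))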

    Telops for language learning: Japanese language learners’ perceptions of authentic Japanese variety shows and implications for their use in the classroom

    Research on the use of leisure-oriented media products in foreign language learning is not a novelty. Building further on insights into the effects of audiovisual input on learners, recent studies have started to explore online learning behaviour. This research employed an exploratory design to examine the perceptions of a Japanese variety show with intralingual text, known as telops, by Japanese Language Learners (JLLs) and native Japanese speakers through a multimodal transcript, eye-tracking technology, questionnaires, and field notes. Two main objectives underlie this study: (1) to gain insights into participants’ multimodal perceptions and attitudes towards the use of such authentic material for language learning, and (2) to gain a better understanding of the distribution of participants’ visual attention between stimuli. Data from 43 JLLs and five native Japanese speakers were analysed. The JLLs were organised into a pre-exchange, exchange and post-exchange group, while the native Japanese speakers functioned as the reference group. A thematic analysis was conducted on the open-ended questionnaire responses, and Areas Of Interest (AOIs) were grouped to generate fixation data. The themes suggest that all learner groups feel that telops help them link the stimuli in the television programme, although some difficulty was experienced with the amount and pace of telops in the pre-exchange and exchange groups. The eye-tracking results show that faces and telops gather the most visual attention from all participant groups. Less clear-cut trends in visual attention are detected when AOIs on telops are grouped according to the degree to which they resemble the corresponding dialogue. This thesis concludes with suggestions as to how such authentic material can complement Japanese language learning.
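
    The AOI-based part of the analysis can be illustrated with a short sketch that aggregates fixation durations by AOI group and participant group; the field names and figures below are invented for illustration and do not reproduce the study's data.

        # Illustrative sketch (assumed input format, made-up numbers): total fixation
        # duration per AOI group, expressed as a share of each participant group's viewing.
        from collections import defaultdict

        def fixation_share_by_aoi(fixations):
            """fixations: iterable of (participant_group, aoi_group, duration_ms)."""
            totals = defaultdict(lambda: defaultdict(float))
            for participant_group, aoi_group, duration_ms in fixations:
                totals[participant_group][aoi_group] += duration_ms
            return {group: {aoi: dur / sum(per_aoi.values()) for aoi, dur in per_aoi.items()}
                    for group, per_aoi in totals.items()}

        # Example: exchange-group learners vs. the native-speaker reference group.
        data = [("exchange", "face", 4200), ("exchange", "telop", 3100), ("exchange", "other", 700),
                ("native", "face", 5200), ("native", "telop", 2300), ("native", "other", 500)]
        print(fixation_share_by_aoi(data))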