2,282 research outputs found

    Early Turn-taking Prediction with Spiking Neural Networks for Human Robot Collaboration

    Turn-taking is essential to the structure of human teamwork. Humans are typically aware of team members' intention to keep or relinquish their turn before a turn switch, where the responsibility of working on a shared task is shifted. Future co-robots are expected to provide the same competence. To that end, this paper proposes the Cognitive Turn-taking Model (CTTM), which leverages cognitive models (i.e., spiking neural networks) to achieve early turn-taking prediction. The CTTM framework can process multimodal human communication cues (both implicit and explicit) and predict human turn-taking intentions at an early stage. The proposed framework was tested on a simulated surgical procedure, in which a robotic scrub nurse predicts the surgeon's turn-taking intention. The CTTM framework was found to outperform state-of-the-art turn-taking prediction algorithms by a large margin. It also outperforms humans when presented with partial observations of communication cues (i.e., less than 40% of the full action). This early prediction capability enables robots to initiate turn-taking actions at an early stage, which facilitates collaboration and increases overall efficiency.
    Comment: Submitted to IEEE International Conference on Robotics and Automation (ICRA) 201
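    A minimal sketch of the early-prediction idea: a single leaky integrate-and-fire neuron accumulates weighted multimodal cue evidence frame by frame and "spikes" a turn-switch prediction once its membrane potential crosses a threshold, i.e., before the full action has been observed. Everything below (names, cue features, parameters) is an illustrative assumption, not the paper's CTTM.

```python
import numpy as np

def lif_early_prediction(cue_stream, weights, tau=0.9, threshold=1.0):
    """Return (decision_step, fraction_observed), or (None, 1.0) if no spike.

    cue_stream: (T, D) array of per-frame multimodal cue features.
    weights:    (D,) synaptic weights mapping cue features to input current.
    """
    v = 0.0  # membrane potential
    T = len(cue_stream)
    for t, frame in enumerate(cue_stream):
        v = tau * v + float(weights @ frame)  # leaky integration of cue evidence
        if v >= threshold:  # spike = commit early to a "turn switch" prediction
            return t, (t + 1) / T
    return None, 1.0

# Toy usage: two cue channels (e.g. gaze aversion, hand retraction) ramp up
# over a 50-frame action; the neuron fires well before the action completes.
rng = np.random.default_rng(0)
cues = np.clip(np.linspace(0, 1, 50)[:, None]
               + 0.1 * rng.standard_normal((50, 2)), 0, None)
step, frac = lif_early_prediction(cues, weights=np.array([0.15, 0.1]))
print(f"turn switch predicted at step {step}, after {frac:.0%} of the action")
```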

    Computational models of social and emotional turn-taking for embodied conversational agents: a review

    The emotional involvement of participants in a conversation shows not only in the words they speak and in the way they speak and gesture, but also in their turn-taking behavior. This paper reviews research into computational models of embodied conversational agents. We focus on models for turn-taking management and (social) emotions. We are particularly interested in how, in these models, the emotions of the agent itself and those of others influence the agent's turn-taking behavior, and, conversely, how the turn-taking behavior of the partner is perceived by the agent. The system of turn-taking rules presented by Sacks, Schegloff and Jefferson (1974) is often a starting point for computational turn-taking models of conversational agents, but emotions have their own rules besides the "one-at-a-time" paradigm of the SSJ system. It turns out that, almost without exception, computational models of turn-taking behavior that allow "continuous interaction" and "natural turn-taking" do not model the underlying psychological, affective, attentional and cognitive processes; they are restricted to rules in terms of a number of superficially observable cues. On the other hand, computational models for virtual humans that are based on a functional theory of social emotion do not contain explicit rules on how social emotions affect turn-taking behavior or how the emotional state of the agent is affected by the turn-taking behavior of its interlocutors. We conclude with some preliminary ideas on what an architecture for emotional turn-taking should look like, and we discuss the challenges in building believable emotional turn-taking agents.
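    The SSJ rule ordering mentioned above is concrete enough to sketch in code. Below is one common way the 1974 turn-allocation rules at a transition-relevance place (TRP) are operationalised: current-selects-next takes priority, then self-selection by the first starter, then the current speaker may continue. This is an illustration of the rule system, not code from any of the reviewed models.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TRPState:
    current_speaker: str
    selected_next: Optional[str] = None  # rule 1a: current selects next
    self_selectors: list[str] = field(default_factory=list)  # rule 1b starters

def next_speaker_at_trp(state: TRPState) -> str:
    """Apply the SSJ turn-allocation rules, in order, at a TRP."""
    if state.selected_next is not None:   # 1a: addressed party takes the turn
        return state.selected_next
    if state.self_selectors:              # 1b: first self-selector gets the turn
        return state.self_selectors[0]
    return state.current_speaker          # 1c: current speaker may continue

print(next_speaker_at_trp(TRPState("A", self_selectors=["B", "C"])))  # -> B
```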

    Turn-Taking in Human Communicative Interaction

    The core use of language is in face-to-face conversation, which is characterized by rapid turn-taking. This turn-taking poses a number of central puzzles for the psychology of language. Consider, for example, that in large corpora the gap between turns is on the order of 100 to 300 ms, but the latencies involved in language production require minimally between 600 ms (for a single word) and 1,500 ms (for a simple sentence). This implies that participants in conversation are predicting the end of the incoming turn and preparing their response in advance. But how is this done? What aspects of this prediction are done when? What happens when the prediction is wrong? What stops participants from coming in too early? If the system is running on prediction, why is there consistently a mode of 100 to 300 ms in response time? The timing puzzle raises further puzzles: it seems that comprehension must run in parallel with preparation for production, but it has been presumed that there are strict cognitive limitations on more than one central process running at a time. How is this bottleneck overcome? Far from being 'easy', as some psychologists have suggested, conversation may be one of the most demanding cognitive tasks in our everyday lives. Further questions naturally arise: how do children learn to master this demanding task, and what is the developmental trajectory in this domain? Research shows that aspects of turn-taking such as its timing are remarkably stable across languages and cultures, but the word order of languages varies enormously. How then does prediction of the incoming turn work when the verb (often the informational nugget in a clause) is at the end? Conversely, how can production work fast enough in languages that have the verb at the beginning, thereby requiring early planning of the whole clause? What happens when one changes modality, as in sign languages -- with the loss of channel constraints, is turn-taking much freer? And what about face-to-face communication amongst hearing individuals -- do gestures, gaze, and other body behaviors facilitate turn-taking? One can also ask the phylogenetic question: how did such a system evolve? There seem to be parallels (analogies) in duetting bird species and in a variety of monkey species, but there is little evidence of anything like this among the great apes. All this constitutes a neglected set of problems at the heart of the psychology of language and of the language sciences. This research topic welcomes contributions from right across the board, for example from psycholinguists, developmental psychologists, students of dialogue and conversation analysis, linguists interested in the use of language, phoneticians, corpus analysts, and comparative ethologists or psychologists. We welcome contributions of all sorts, for example original research papers, opinion pieces, and reviews of work in subfields that may not be fully understood in other subfields.
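    The timing arithmetic behind the puzzle can be made explicit: with the figures quoted above, preparation has to begin hundreds of milliseconds before the incoming turn ends. A back-of-the-envelope check (the numbers are the ranges from the text, not new measurements):

```python
# Typical inter-turn gap (modal value roughly 100-300 ms in large corpora).
gap_ms = 200
# Minimum production latencies quoted in the text.
latencies_ms = {"single word": 600, "simple sentence": 1500}

for label, latency in latencies_ms.items():
    lead = latency - gap_ms  # how early preparation must start, at minimum
    print(f"to respond with a {label}, preparation must begin "
          f"~{lead} ms before the incoming turn ends")
```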

    Shifting embodied participation in multiparty university student meetings

    PhD thesis. Student group work has been used in higher education as an effective means to cultivate students' work-related skills and cooperative learning. These small-group encounters are the sites where, through talk and other resources, university students get their educational tasks done as well as acquire essential workplace skills such as problem-solving, team working, decision-making and leadership. However, settings of educational talk-as-work, such as student group meetings, remain under-researched (Stokoe, Benwell, & Attenborough, 2013). The present study therefore attempts to bridge this gap by investigating the professional and academic abilities of university students to participate in multiparty group meetings, drawing upon a dataset of video- and audio-recorded meetings from the Newcastle University Corpus of Academic English (NUCASE). The dataset consists of ten hours of meetings in which a group of naval architecture undergraduate students work cooperatively on their final year project – to design and build a wind turbine. The study applies the methodological approach of conversation analysis (CA) with a multimodal perspective. It presents a fine-grained, sequential multimodal analysis of a collection of cases of speaker transitions, and reveals how meeting participants display speakership and recipiency through the coordination of verbal/vocal and bodily-visual resources. In this respect, the present study is the first to offer a systematic collection, as well as a thorough investigation, of speaker transition and turn-taking practices from a multimodal perspective, especially with a scope of analysis beyond pre-turn and turn-beginning positions. It shows how speaker transitions through 'current speaker selects next' and 'next speaker self-selects' are joint undertakings not only between the self-selecting/current speaker and the target recipient/addressed next speaker, but also among other co-present participants. In particular, by mobilising the whole set of multimodal resources, participants are able to display multiple orientations toward their co-participants; project, pursue and accomplish multiple concurrent courses of action; and intricately coordinate their mutual orientation toward the shifting and emerging participation framework during the transition, establishment and maintenance of speakership and recipiency. By presenting the data and analysis, this study extends the boundaries of existing understanding of the temporality, sequentiality and systematicity of multimodal resources in talk-and-bodies-in-interaction. The thesis also contributes to interaction research in the particular context of student group work in higher education, by providing a 'screenshot' of students' academic lives as they unfold 'in flight'. In particular, it reveals how students competently participate in multiparty group meetings (e.g., taking and allocating turns), co-construct the unfolding meeting procedures (e.g., roundtable update discussion), and jointly achieve local interactional goals (e.g., sharing work progress, reaching an agreement). Acquiring such skills is, as argued above, not only crucial for accomplishing educational tasks, but also necessary for preparing university students to meet their future workplace expectations. The study therefore further informs the practices of university students and professional practitioners in multiparty meetings, and draws out methodological implications for multimodal CA research.

    The Power of a Glance: Evaluating Embodiment and Turn-Tracking Strategies of an Active Robotic Overhearer

    Kousidis S, Schlangen D. The Power of a Glance: Evaluating Embodiment and Turn-Tracking Strategies of an Active Robotic Overhearer. In: Proceedings of AAAI Spring Symposium on Turn-taking and Coordination in Human-Machine Interaction. Palo Alto, CA, U.S.A.: Association for the Advancement of Artificial Intelligence; 2015: 36-43

    Sensing, interpreting, and anticipating human social behaviour in the real world

    Low-level nonverbal social signals like glances, utterances, facial expressions and body language are central to human communicative situations and have been shown to be connected to important high-level constructs, such as emotions, turn-taking, rapport, or leadership. A prerequisite for the creation of social machines that are able to support humans in e.g. education, psychotherapy, or human resources is the ability to automatically sense, interpret, and anticipate human nonverbal behaviour. While promising results have been shown in controlled settings, automatically analysing unconstrained situations, e.g. in daily-life settings, remains challenging. Furthermore, anticipation of nonverbal behaviour in social situations is still largely unexplored. The goal of this thesis is to move closer to the vision of social machines in the real world. It makes fundamental contributions along the three dimensions of sensing, interpreting and anticipating nonverbal behaviour in social interactions.
    First, robust recognition of low-level nonverbal behaviour lays the groundwork for all further analysis steps. Advancing human visual behaviour sensing is especially relevant, as the current state of the art is still not satisfactory in many daily-life situations. While many social interactions take place in groups, current methods for unsupervised eye contact detection can only handle dyadic interactions. We propose a novel unsupervised method for multi-person eye contact detection by exploiting the connection between gaze and speaking turns. Furthermore, we make use of mobile device engagement to address the problem of calibration drift that occurs in daily-life usage of mobile eye trackers.
    Second, we improve the interpretation of social signals in terms of higher-level social behaviours. In particular, we propose the first dataset and method for emotion recognition from the bodily expressions of freely moving, unaugmented dyads. Furthermore, we are the first to study low rapport detection in group interactions, and we investigate a cross-dataset evaluation setting for the emergent leadership detection task.
    Third, human visual behaviour is special because it functions as a social signal and also determines what a person is seeing at a given moment in time. Being able to anticipate human gaze opens up the possibility for machines to share attention with humans more seamlessly, or to intervene in a timely manner if humans are about to overlook important aspects of the environment. We are the first to propose methods for the anticipation of eye contact in dyadic conversations, as well as in the context of mobile device interactions during daily life, thereby paving the way for interfaces that are able to proactively intervene and support interacting humans.
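    A hedged sketch of the core idea behind the unsupervised multi-person eye contact method described above: listeners tend to look at whoever is speaking, so the gaze-direction cluster of an observer that co-occurs most often with person p holding the turn can be labelled "looking at p" without any manual annotation. The clustering step, array shapes and function names below are assumptions for illustration, not the thesis implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def label_gaze_clusters(gaze_dirs, speaker_ids, n_people):
    """Map each person to the observer's gaze cluster that co-occurs most
    with that person speaking. gaze_dirs: (T, 2) angles; speaker_ids: (T,)."""
    km = KMeans(n_clusters=n_people, n_init=10).fit(gaze_dirs)
    cluster_of_person = {}
    for p in range(n_people):
        mask = speaker_ids == p  # frames where person p holds the turn
        if mask.any():
            counts = np.bincount(km.labels_[mask], minlength=n_people)
            cluster_of_person[p] = int(np.argmax(counts))
    return km, cluster_of_person

def eye_contact_with_speaker(km, cluster_of_person, gaze_dirs, speaker_ids):
    """True where the observer's gaze cluster matches the current speaker."""
    labels = km.predict(gaze_dirs)
    return np.array([cluster_of_person.get(s) == l
                     for s, l in zip(speaker_ids, labels)])

# Toy usage: three people; the observer's gaze hovers near whoever speaks.
rng = np.random.default_rng(1)
speakers = rng.integers(0, 3, size=300)
centers = np.array([[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])  # gaze targets
gaze = centers[speakers] + 0.05 * rng.standard_normal((300, 2))
km, mapping = label_gaze_clusters(gaze, speakers, n_people=3)
contact = eye_contact_with_speaker(km, mapping, gaze, speakers)
print(f"{contact.mean():.0%} of frames labelled as gaze at the current speaker")
```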

    Full Issue


    Socially aware conversational agents
