
    Introduction: Multimodal interaction

    That human social interaction involves the intertwined cooperation of different modalities is uncontroversial. Researchers in several allied fields have, however, only recently begun to document the precise ways in which talk, gesture, gaze, and aspects of the material surround are brought together to form coherent courses of action. The papers in this volume are attempts to develop this line of inquiry. Although the authors draw on a range of analytic, theoretical, and methodological traditions (conversation analysis, ethnography, distributed cognition, and workplace studies), all are concerned with exploring and illuminating the inherently multimodal character of social interaction. Recent studies, including those collected in this volume, suggest that different modalities work together not only to elaborate the semantic content of talk but also to constitute coherent courses of action. In this introduction we present evidence for this position. We begin by reviewing selected literature, focusing primarily on the communicative functions and interactive organization of specific modalities, before turning to the integration of distinct modalities in interaction.

    Visible movements of the orofacial area: evidence for gestural or multimodal theories of language evolution?

    The age-old debate between the proponents of the gesture-first and speech-first positions has returned to occupy a central place in current language evolution theorizing. The gestural scenarios, suffering from the problem known as “modality transition” (why a gestural system would have changed into a predominantly spoken system), frequently appeal to the gestures of the orofacial area as a platform for this putative transition. Here, we review currently available evidence on the significance of the orofacial area in language evolution. While our review offers some support for orofacial movements as an evolutionary “bridge” between manual gesture and speech, we see the evidence as far more consistent with a multimodal approach. We also suggest that, more generally, the “gestural versus spoken” formulation is limiting and would be better expressed in terms of the relative input and interplay of the visual and vocal-auditory sensory modalities.

    Audiovisual integration of emotional signals from others' social interactions

    Audiovisual perception of emotions has typically been examined using displays of a solitary character (e.g., the face-voice and/or body-sound of one actor). However, in real life humans often face more complex multisensory social situations involving more than one person. Here we ask whether the audiovisual facilitation in emotion recognition previously found in simpler social situations extends to more complex and ecologically valid situations. Stimuli consisting of the biological motion and voices of two interacting agents were used in two experiments. In Experiment 1, participants were presented with visual, auditory, auditory filtered/noisy, and audiovisual congruent and incongruent clips. We asked participants to judge whether the two agents were interacting happily or angrily. In Experiment 2, another group of participants repeated the same task as in Experiment 1 while trying to ignore either the visual or the auditory information. The findings from both experiments indicate that when the reliability of the auditory cue was decreased, participants weighted the visual cue more heavily in their emotional judgments, which in turn translated into increased emotion recognition accuracy for the multisensory condition. Our findings thus point to a common mechanism of multisensory integration of emotional signals irrespective of social stimulus complexity.
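    The reliability-driven re-weighting described above is commonly modeled as inverse-variance (maximum-likelihood) cue combination. The sketch below illustrates that general idea only; the function name, the happy/angry response axis, and the variance values are assumptions for demonstration, not the authors' analysis.

        def combine_cues(visual_estimate, visual_var, auditory_estimate, auditory_var):
            """Reliability-weighted (inverse-variance) combination of two cues.

            Each cue's weight is proportional to its reliability (1 / variance),
            so degrading the auditory cue shifts weight toward the visual cue.
            """
            w_visual = (1.0 / visual_var) / (1.0 / visual_var + 1.0 / auditory_var)
            w_auditory = 1.0 - w_visual
            combined = w_visual * visual_estimate + w_auditory * auditory_estimate
            combined_var = 1.0 / (1.0 / visual_var + 1.0 / auditory_var)
            return combined, combined_var

        # Judgments on an arbitrary happy (+1) / angry (-1) axis (assumed values).
        # Clear auditory cue: both cues contribute equally.
        print(combine_cues(0.6, 0.2, -0.2, 0.2))   # -> (0.2, 0.1)
        # Filtered/noisy auditory cue: its variance rises, so the visual cue dominates.
        print(combine_cues(0.6, 0.2, -0.2, 1.0))   # -> (~0.47, ~0.17)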

    Virtual Meeting Rooms: From Observation to Simulation

    Virtual meeting rooms are used to simulate real meeting behavior and can show how people behave during conversations: how they gesture, move their heads and bodies, and direct their gaze. They are used to visualise models of meeting behavior and to evaluate these models. They are also used to show the effects of controlling certain parameters on behavior, and in experiments that examine how communication is affected when various channels of information (speech, gaze, gesture, posture) are switched off or manipulated in other ways. The paper presents the various stages in the development of a virtual meeting room and illustrates its uses with results from experiments testing whether human judges can infer conversational roles in a virtual meeting situation when they see only the head movements of the participants.
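    A minimal sketch of the kind of channel manipulation the abstract describes, under the assumption that each modality arrives as a separate data stream; the class and function names are hypothetical and do not reflect the simulator's actual interface.

        from dataclasses import dataclass

        @dataclass
        class ChannelConfig:
            """Which information channels are rendered for a given experimental condition.

            Field names are illustrative, not the simulator's actual API.
            """
            speech: bool = True
            gaze: bool = True
            gesture: bool = True
            posture: bool = True

        def filter_channels(participant_data: dict, config: ChannelConfig) -> dict:
            """Keep only the channels switched on for this condition."""
            enabled = {name for name, on in vars(config).items() if on}
            return {channel: data for channel, data in participant_data.items()
                    if channel in enabled}

        # Condition resembling the head-movements-only experiment described above:
        # every channel except gaze/head orientation is switched off.
        head_only = ChannelConfig(speech=False, gesture=False, posture=False)
        frame = {"speech": "...", "gaze": {"yaw": 12.0, "pitch": -3.0},
                 "gesture": "...", "posture": "..."}
        print(filter_channels(frame, head_only))   # -> {'gaze': {'yaw': 12.0, 'pitch': -3.0}}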

    Speech-driven Animation with Meaningful Behaviors

    Conversational agents (CAs) play an important role in human-computer interaction. Creating believable movements for CAs is challenging, since the movements have to be meaningful and natural, reflecting the coupling between gestures and speech. Studies in the past have mainly relied on rule-based or data-driven approaches. Rule-based methods focus on creating meaningful behaviors conveying the underlying message, but the gestures cannot be easily synchronized with speech. Data-driven approaches, especially speech-driven models, can capture the relationship between speech and gestures, but they create behaviors disregarding the meaning of the message. This study proposes to bridge the gap between these two approaches, overcoming their limitations. The approach builds a dynamic Bayesian network (DBN) in which a discrete variable is added to condition the behaviors on an underlying constraint. The study implements and evaluates the approach with two constraints: discourse functions and prototypical behaviors. By constraining on discourse functions (e.g., questions), the model learns the characteristic behaviors associated with a given discourse class, learning the rules from the data. By constraining on prototypical behaviors (e.g., head nods), the approach can be embedded in a rule-based system as a behavior realizer, creating trajectories that are synchronized in time with speech. The study proposes a DBN structure and a training approach that (1) models the cause-effect relationship between the constraint and the gestures, (2) initializes the state configuration models, increasing the range of the generated behaviors, and (3) captures the differences in the behaviors across constraints by enforcing sparse transitions between shared and exclusive states per constraint. Objective and subjective evaluations demonstrate the benefits of the proposed approach over an unconstrained model.
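    The core modeling idea, a hidden gesture-state chain whose dynamics depend on a discrete constraint variable such as a discourse function, can be sketched as follows. This is a simplified illustration with assumed state names and hand-picked transition probabilities, not the authors' DBN implementation, and it omits the speech-driven (acoustic) conditioning.

        import numpy as np

        rng = np.random.default_rng(0)

        # Hidden gesture states; names and values are illustrative only.
        STATES = ["rest", "nod", "tilt", "shake"]

        # One transition matrix per constraint value (e.g., discourse function).
        # Zero entries mimic the idea of sparse transitions and constraint-exclusive states.
        TRANSITIONS = {
            "question": np.array([[0.6, 0.3, 0.1, 0.0],
                                  [0.4, 0.5, 0.1, 0.0],
                                  [0.5, 0.3, 0.2, 0.0],
                                  [1.0, 0.0, 0.0, 0.0]]),
            "statement": np.array([[0.8, 0.1, 0.0, 0.1],
                                   [0.6, 0.3, 0.0, 0.1],
                                   [1.0, 0.0, 0.0, 0.0],
                                   [0.5, 0.1, 0.0, 0.4]]),
        }

        def sample_trajectory(constraint: str, n_frames: int, start: int = 0) -> list[str]:
            """Sample a gesture-state sequence conditioned on the constraint value."""
            matrix = TRANSITIONS[constraint]
            states = [start]
            for _ in range(n_frames - 1):
                states.append(rng.choice(len(STATES), p=matrix[states[-1]]))
            return [STATES[s] for s in states]

        # A full speech-driven model would additionally condition each step on acoustic
        # features; here only the constraint-dependent dynamics are illustrated.
        print(sample_trajectory("question", 10))
        print(sample_trajectory("statement", 10))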

    A review of theories and methods in the science of face-to-face social interaction

    For most of human history, face-to-face interactions have been the primary and most fundamental way to build social relationships, and even in the digital era they remain the basis of our closest bonds. These interactions are built on the dynamic integration and coordination of verbal and non-verbal information between multiple people. However, the psychological processes underlying face-to-face interaction remain difficult to study. In this Review, we discuss three ways the multimodal phenomena underlying face-to-face social interaction can be organized to provide a solid basis for theory development. Next, we review three types of theory of social interaction: theories that focus on the social meaning of actions, theories that explain actions in terms of simple behaviour rules, and theories that rely on rich cognitive models of the internal states of others. Finally, we address how different methods can be used to distinguish between theories, showcasing new approaches and outlining important directions for future research. Advances in how face-to-face social interaction can be studied, combined with a renewed focus on cognitive theories, could lead to a renaissance in social interaction research and advance scientific understanding of face-to-face interaction and its underlying cognitive foundations.