10 research outputs found

    A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation

    Body language (BL) refers to the non-verbal communication expressed through physical movements, gestures, facial expressions, and postures. It is a form of communication that conveys information, emotions, attitudes, and intentions without the use of spoken or written words, and it plays a crucial role in interpersonal interactions, where it can complement or even override verbal communication. Deep multi-modal learning techniques have shown promise in understanding and analyzing these diverse aspects of BL. The survey emphasizes their applications to BL generation and recognition. Several common BLs are considered, i.e., Sign Language (SL), Cued Speech (CS), Co-speech (CoS), and Talking Head (TH), and we analyze and establish the connections among these four BLs for the first time. Their generation and recognition often involve multi-modal approaches. Benchmark datasets for BL research are systematically collected and organized, along with evaluations of state-of-the-art (SOTA) methods on these datasets. The survey highlights challenges such as limited labeled data, multi-modal learning, and the need for domain adaptation to generalize models to unseen speakers or languages. Future research directions are presented, including exploring self-supervised learning techniques, integrating contextual information from other modalities, and exploiting large-scale pre-trained multi-modal models. In summary, this survey provides, for the first time, a comprehensive understanding of deep multi-modal learning for the generation and recognition of these BLs. By analyzing advancements, challenges, and future directions, it serves as a valuable resource for researchers and practitioners advancing this field. In addition, we maintain a continuously updated paper list on deep multi-modal learning for BL recognition and generation: https://github.com/wentaoL86/awesome-body-language
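
    The multi-modal recognition setting the survey covers can be pictured with a minimal late-fusion sketch. The two-stream GRU design, the layer sizes, and the class count below are illustrative assumptions, not a method taken from the survey.

```python
# Minimal sketch (assumptions, not the survey's method): late fusion of a pose
# stream and an audio stream for body-language recognition, in PyTorch.
import torch
import torch.nn as nn

class LateFusionBLRecognizer(nn.Module):
    def __init__(self, pose_dim=150, audio_dim=80, hidden=256, num_classes=20):
        super().__init__()
        # Per-modality encoders over frame sequences.
        self.pose_enc = nn.GRU(pose_dim, hidden, batch_first=True)
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        # Fusion head: concatenate the last hidden states of both streams.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, num_classes)
        )

    def forward(self, pose_seq, audio_seq):
        _, h_pose = self.pose_enc(pose_seq)     # (1, B, hidden)
        _, h_audio = self.audio_enc(audio_seq)  # (1, B, hidden)
        fused = torch.cat([h_pose[-1], h_audio[-1]], dim=-1)
        return self.classifier(fused)           # (B, num_classes) logits

# Usage with random stand-in features: 4 clips, 60 frames each.
model = LateFusionBLRecognizer()
logits = model(torch.randn(4, 60, 150), torch.randn(4, 60, 80))
print(logits.shape)  # torch.Size([4, 20])
```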

    Multimodal Computational Attention for Scene Understanding

    Robotic systems have limited computational capacities. Hence, computational attention models are important for focusing on specific stimuli and allowing complex cognitive processing. For this purpose, we developed auditory and visual attention models that enable robotic platforms to efficiently explore and analyze natural scenes. To allow for attention guidance in human-robot interaction, we use machine learning to integrate the influence of verbal and non-verbal social signals into our models.
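
    As a rough illustration of how auditory and visual saliency might be combined into a single focus of attention, here is a minimal fusion sketch; the fixed modality weights and the Gaussian toy inputs are assumptions, not the authors' learned models.

```python
# Minimal sketch (assumptions, not the authors' system): fuse visual and
# auditory saliency maps into one master attention map; the weights stand in
# for values a learned model would supply from social-signal cues.
import numpy as np

def normalize(m):
    """Scale a saliency map to [0, 1]."""
    m = m - m.min()
    return m / (m.max() + 1e-8)

def fuse_attention(visual_saliency, auditory_saliency, w_visual=0.6, w_audio=0.4):
    """Weighted per-pixel fusion of two saliency maps."""
    v = normalize(visual_saliency)
    a = normalize(auditory_saliency)
    return normalize(w_visual * v + w_audio * a)

# Toy example: a bright visual blob on the left, a sound source on the right.
h, w = 60, 80
ys, xs = np.mgrid[0:h, 0:w]
visual = np.exp(-((xs - 20) ** 2 + (ys - 30) ** 2) / 200.0)
audio = np.exp(-((xs - 60) ** 2 + (ys - 30) ** 2) / 400.0)
master = fuse_attention(visual, audio)
print(np.unravel_index(master.argmax(), master.shape))  # most salient location
```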

    Retargeting cued speech hand gestures for different talking heads and speakers

    Cued Speech is a communication system that complements lip-reading with a small set of possible handshapes placed in different positions near the face. Developing a Cued Speech capable system is a time-consuming and difficult challenge. This paper focuses on how an existing bank of reference Cued Speech gestures, exhibiting natural dynamics for hand articulation and movements, could be reused for another speaker (augmenting some video or 3D talking heads). Any Cued Speech hand gesture should be recorded or considered with the concomitant facial locations that Cued Speech specifies to resolve lip-reading ambiguities (such as lip corner, chin, cheek and throat for French). These facial target points move along with head movements and with speech articulation. The post-treatment algorithm proposed here will retarget synthesized hand gestures to another face by slightly modifying the sequence of translations and rotations of the 3D hand. This algorithm preserves the co-articulation of the reference signal (including undershooting of the trajectories, as observed in fast Cued Speech) while adapting the gestures to the geometry, articulation and movements of the target face. We will illustrate how our Cued Speech capable audiovisual synthesizer - built using simultaneously recorded hand trajectories and facial articulation of a single French Cued Speech user - can be used as a reference signal for this retargeting algorithm. For the ongoing evaluation of our algorithm, an intelligibility paradigm has been retained, using natural videos for the face. The intelligibility of video VCV sequences with composited hand gestures for Cued Speech is being measured with a panel of Cued Speech users.
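
    The retargeting step described above can be sketched as follows: hand positions stored as offsets from the cued facial landmark are re-anchored on the target speaker's landmarks and carried along with the target head pose. The scaling heuristic and all function names below are illustrative assumptions, not the paper's actual post-treatment algorithm.

```python
# Minimal sketch (an assumption-laden illustration, not the paper's algorithm):
# re-anchor a reference hand trajectory, stored as offsets from the cued facial
# target point (chin, cheek, ...), onto another speaker's face and head pose.
import numpy as np

def retarget_hand_trajectory(hand_offsets, cue_targets, ref_landmarks,
                             tgt_landmarks, tgt_head_rot, tgt_head_pos):
    """hand_offsets: (T, 3) hand position relative to the cued landmark on the
    reference face. cue_targets: length-T list of landmark names per frame.
    ref_landmarks / tgt_landmarks: dict name -> (3,) position in head frame.
    tgt_head_rot: (T, 3, 3) per-frame head rotation of the target speaker.
    tgt_head_pos: (T, 3) per-frame head translation of the target speaker."""
    out = np.zeros_like(hand_offsets)
    for t, name in enumerate(cue_targets):
        # Crude geometry adaptation: scale the offset by the ratio of face sizes.
        scale = np.linalg.norm(tgt_landmarks[name]) / (np.linalg.norm(ref_landmarks[name]) + 1e-8)
        local = tgt_landmarks[name] + scale * hand_offsets[t]
        # Follow the target speaker's head movement frame by frame.
        out[t] = tgt_head_rot[t] @ local + tgt_head_pos[t]
    return out

# Toy call: 2 frames cueing the chin, identity head rotation.
ref = {"chin": np.array([0.0, -0.10, 0.05])}
tgt = {"chin": np.array([0.0, -0.12, 0.06])}
offsets = np.array([[0.02, 0.0, 0.03], [0.01, 0.0, 0.03]])
rots = np.stack([np.eye(3)] * 2)
poss = np.zeros((2, 3))
print(retarget_hand_trajectory(offsets, ["chin", "chin"], ref, tgt, rots, poss))
```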

    Actor & Avatar: A Scientific and Artistic Catalog

    What kind of relationship do we have with artificial beings (avatars, puppets, robots, etc.)? What does it mean to mirror ourselves in them, to perform them or to play trial identity games with them? Actor & Avatar addresses these questions from artistic and scholarly angles. Contributions on the making of "technical others" and philosophical reflections on artificial alterity are flanked by neuroscientific studies on different ways of perceiving living persons and artificial counterparts. The contributors have achieved a successful artistic-scientific collaboration with extensive visual material.

    Proceedings of the 5th International Conference on Disability, Virtual Reality and Associated Technologies (ICDVRAT 2004)

    The proceedings of the conference.