
    Multimodal language processing in human communication

    Multiple layers of visual (and vocal) signals, plus their different onsets and offsets, present a significant semantic and temporal binding problem during face-to-face conversation. Despite this complex unification process, multimodal messages appear to be processed faster than unimodal messages. Multimodal gestalt recognition and multilevel prediction are proposed to play a crucial role in facilitating multimodal language processing. The processing mechanisms involved in multimodal language comprehension are hypothesized to be domain general at their basis, co-opted for communication, and refined with domain-specific characteristics. A new, situated framework for understanding human language processing is called for, one that takes into account the multilayered, multimodal nature of language and its production and comprehension in conversational interaction, where fast processing is required.

    Artificial Intelligence for Suicide Assessment using Audiovisual Cues: A Review

    Death by suicide is the seventh leading cause of death worldwide. Recent advances in Artificial Intelligence (AI), specifically AI applications in image and voice processing, have created a promising opportunity to revolutionize suicide risk assessment. Consequently, we have witnessed a fast-growing body of research that applies AI to extract audiovisual non-verbal cues for mental illness assessment. However, the majority of recent works focus on depression, despite the evident differences between the symptoms and non-verbal cues of depression and those of suicidal behavior. This paper reviews recent works that study suicide ideation and suicide behavior detection through audiovisual feature analysis, mainly the analysis of suicidal voice/speech acoustic features and suicidal visual cues. Automatic suicide assessment is a promising research direction that is still in its early stages. Accordingly, there is a lack of large datasets that can be used to train the machine learning and deep learning models proven effective in other, similar tasks. Comment: Manuscript submitted to Artificial Intelligence Reviews (2022).
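
    As a purely illustrative aside, the kind of acoustic non-verbal cues surveyed in work like this (spectral envelope, fundamental frequency, energy, voicing) can be extracted with off-the-shelf tools. The following is a minimal sketch assuming librosa; the file path, parameter values, and feature set are assumptions for illustration, not the review's pipeline.

        # Hypothetical sketch: extracting common paralinguistic acoustic features
        # (MFCCs, F0, energy, voicing ratio). Parameters are illustrative only.
        import numpy as np
        import librosa

        def acoustic_cues(wav_path: str, sr: int = 16000) -> dict:
            y, sr = librosa.load(wav_path, sr=sr)

            # Spectral envelope: 13 MFCCs summarized by their temporal mean.
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

            # Fundamental frequency via probabilistic YIN; unvoiced frames are NaN.
            f0, voiced_flag, _ = librosa.pyin(y, fmin=65.0, fmax=400.0, sr=sr)

            # Frame-level energy (root mean square).
            rms = librosa.feature.rms(y=y)[0]

            return {
                "mfcc_mean": mfcc.mean(axis=1),
                "f0_mean": float(np.nanmean(f0)),
                "f0_std": float(np.nanstd(f0)),
                "rms_mean": float(rms.mean()),
                "voiced_ratio": float(np.mean(voiced_flag)),
            }

    Such frame-level statistics would typically feed a downstream classifier; the review's point stands that large training datasets for the suicide-assessment task itself are still lacking.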

    Motion Generation during Vocalized Emotional Expressions and Evaluation in Android Robots

    Vocalized emotional expressions such as laughter and surprise often occur in natural dialogue interactions and are important factors to consider in order to achieve smooth robot-mediated communication. Miscommunication may arise if there is a mismatch between the audio and visual modalities, especially in android robots, which have a highly humanlike appearance. In this chapter, motion generation methods are introduced for laughter and vocalized surprise events, based on analyses of human behavior during dialogue interactions. The effectiveness of controlling different modalities of the face, head, and upper body (eyebrow raising, eyelid widening/narrowing, lip corner/cheek raising, eye blinking, head motion, and torso motion control) and different motion control levels is evaluated using an android robot. Subjective experiments indicate the importance of each modality for the perception of motion naturalness (humanlikeness) and the degree of emotional expression.
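
    As a purely illustrative aside (not the chapter's controller), event-driven control of several modalities can be sketched as a timed schedule of actuator commands triggered at a detected laughter onset. The actuator names, value ranges, and the send_command() interface below are hypothetical.

        # Hypothetical sketch of event-driven multimodal motion control for a
        # laughter event; actuator names and values are illustrative assumptions.
        import time

        LAUGHTER_PROFILE = [
            # (delay in seconds from laughter onset, actuator, normalized target)
            (0.00, "eyelid_narrowing", 0.6),
            (0.05, "lip_corner_raise", 0.8),
            (0.05, "cheek_raise", 0.7),
            (0.10, "head_pitch", -0.2),   # slight backward head tilt
            (0.15, "torso_pitch", 0.1),   # small torso movement
        ]

        def send_command(actuator: str, value: float) -> None:
            # Placeholder for a robot's actuator interface.
            print(f"{actuator} -> {value:+.2f}")

        def play_laughter(profile=LAUGHTER_PROFILE) -> None:
            start = time.monotonic()
            for delay, actuator, value in profile:
                # Wait until each command's scheduled onset, then issue it.
                while time.monotonic() - start < delay:
                    time.sleep(0.005)
                send_command(actuator, value)

        if __name__ == "__main__":
            play_laughter()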

    Motion-capture patterns of dynamic facial expressions in children and adolescents with and without ASD

    Research shows that neurotypical individuals struggle to interpret the emotional facial expressions of people with Autism Spectrum Disorder (ASD). The current study uses motion capture to objectively quantify differences between the movement patterns of emotional facial expressions of individuals with and without ASD. Participants volitionally mimicked emotional expressions while wearing facial markers. Recorded marker movement was grouped by expression valence and intensity. We used Growth Curve Analysis to test whether movement patterns were predictable from expression type and participant group. Results show significant interactions between expression type and group, and little effect of emotion valence on ASD expressions. Together, the results support perceptions that the expressions of individuals with ASD are different from, and more ambiguous than, those of neurotypical individuals.
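
    For readers unfamiliar with Growth Curve Analysis, the sketch below shows the general shape of such a model: a mixed-effects regression of marker displacement on orthogonal polynomial time terms and their interaction with group. It is a minimal sketch assuming pandas/statsmodels; the column names and model formula are assumptions, not the study's actual specification.

        # Hedged sketch of a Growth Curve Analysis: displacement over time with
        # orthogonal polynomial time terms and a group interaction. The data
        # layout (columns) and the formula are illustrative assumptions.
        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        def add_orthogonal_time(df: pd.DataFrame, time_col: str = "time", order: int = 2) -> pd.DataFrame:
            """Add orthogonal polynomial time terms ot1..otN via QR decomposition."""
            t = df[time_col].to_numpy(dtype=float)
            raw = np.vander(t, N=order + 1, increasing=True)  # 1, t, t^2, ...
            q, _ = np.linalg.qr(raw)                          # orthogonalized columns
            for k in range(1, order + 1):
                df[f"ot{k}"] = q[:, k]
            return df

        # df is assumed to hold one row per participant x expression x time sample,
        # with columns: displacement, time, group ("ASD"/"NT"), expression, participant.
        def fit_gca(df: pd.DataFrame):
            df = add_orthogonal_time(df)
            model = smf.mixedlm(
                "displacement ~ (ot1 + ot2) * group * expression",
                data=df,
                groups=df["participant"],  # random intercept per participant
            )
            return model.fit()

    In such a model, a significant group-by-time interaction is what licenses the claim that the temporal shape of ASD expressions differs from that of neurotypical expressions.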

    Social behavior modeling based on Incremental Discrete Hidden Markov Models

    Modeling multimodal face-to-face interaction is a crucial step in the process of building social robots or user-aware Embodied Conversational Agents (ECAs). In this context, we present a novel approach for human behavior analysis and generation based on what we call the "Incremental Discrete Hidden Markov Model" (IDHMM). Joint multimodal activities of interlocutors are first modeled by a set of DHMMs that are specific to supposed joint cognitive states of the interlocutors. Respecting a task-specific syntax, the IDHMM is then built from these DHMMs and split into i) a recognition model that determines the most likely sequence of cognitive states given the multimodal activity of the interlocutor, and ii) a generative model that computes the most likely activity of the speaker given this estimated sequence of cognitive states. Short-Term Viterbi (STV) decoding is used to incrementally recognize and generate behavior. The proposed model is applied to parallel speech and gaze data of interacting dyads.
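
    The recognition/generation split lends itself to a compact illustration. The following is a minimal sketch, not the authors' implementation: standard Viterbi decoding recovers a cognitive-state sequence from the interlocutor's discretized multimodal activity, and the most likely speaker activity is then read off per decoded state. The toy matrices and the per-state look-up simplification are assumptions; the paper's IDHMM additionally uses a task-specific syntax and Short-Term Viterbi for incremental operation.

        # Minimal sketch (not the authors' implementation) of discrete-HMM
        # recognition followed by state-conditioned generation.
        import numpy as np

        def viterbi(obs, start_p, trans_p, emit_p):
            """Most likely hidden-state sequence for a discrete observation sequence."""
            n_states, T = trans_p.shape[0], len(obs)
            log_delta = np.full((T, n_states), -np.inf)
            backptr = np.zeros((T, n_states), dtype=int)
            log_delta[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
            for t in range(1, T):
                scores = log_delta[t - 1][:, None] + np.log(trans_p)  # (from, to)
                backptr[t] = scores.argmax(axis=0)
                log_delta[t] = scores.max(axis=0) + np.log(emit_p[:, obs[t]])
            states = np.zeros(T, dtype=int)
            states[-1] = log_delta[-1].argmax()
            for t in range(T - 2, -1, -1):
                states[t] = backptr[t + 1, states[t + 1]]
            return states

        # Toy model: 2 cognitive states, 3 discrete multimodal-activity symbols.
        start = np.array([0.6, 0.4])
        trans = np.array([[0.8, 0.2],
                          [0.3, 0.7]])
        emit_interlocutor = np.array([[0.7, 0.2, 0.1],   # P(observed symbol | state)
                                      [0.1, 0.3, 0.6]])
        emit_speaker = np.array([[0.6, 0.3, 0.1],        # P(generated symbol | state)
                                 [0.2, 0.2, 0.6]])

        observed = [0, 0, 1, 2, 2]                        # interlocutor's activity
        cognitive_states = viterbi(observed, start, trans, emit_interlocutor)
        generated = emit_speaker[cognitive_states].argmax(axis=1)  # speaker's activity
        print(cognitive_states, generated)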

    Articulation in time: Some word-initial segments in Swedish

    Speech is at once dynamic and distinctive. This apparent contradiction has occupied researchers in phonetics and phonology for decades. The present dissertation assumes that articulation behaves as a function of time, and that phonological structures can be found in the dynamical systems. Electromagnetic articulography (EMA) is used to measure the mechanical movements of Swedish speakers. The results show that tonal context affects articulatory coordination. Acceleration appears to divide the movements of the jaw and lips into intervals of postures and active movements, and these intervals are affected differently by the tonal context. Furthermore, a bilabial consonant is shorter if the next consonant is also made with the lips. A hypothesis of a correlation between acoustic segment duration and acceleration is presented. The dissertation highlights the importance of time for how speech ultimately sounds; particularly significant is the combination of articulatory timing and articulatory duration.
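
    As a hedged illustration of the posture/movement distinction (not the dissertation's EMA analysis), one can mark the stretches of an articulator trajectory where the magnitude of acceleration exceeds a threshold; these acceleration and deceleration phases bound active movements and separate them from postural holds. The sampling rate, synthetic trajectory, and threshold below are assumptions.

        # Illustrative sketch: flag high-|acceleration| samples in a synthetic
        # lip-aperture trace. Trajectory, sampling rate, and threshold are made up.
        import numpy as np

        def high_acceleration_mask(position, fs, threshold):
            """True where |acceleration| of the trajectory exceeds the threshold."""
            dt = 1.0 / fs
            velocity = np.gradient(position, dt)
            acceleration = np.gradient(velocity, dt)
            return np.abs(acceleration) > threshold

        # Synthetic lip-aperture trace (mm) at 250 Hz: hold, smooth closing gesture, hold.
        fs = 250
        t = np.arange(0.0, 1.0, 1.0 / fs)
        position = 6.0 + 4.0 * np.cos(np.pi * np.clip((t - 0.4) / 0.2, 0.0, 1.0))

        active = high_acceleration_mask(position, fs, threshold=100.0)  # mm/s^2
        print(f"high-acceleration samples: {active.sum()} of {active.size}")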