
    Multi-channel Transformers for Multi-articulatory Sign Language Translation

    Sign languages use multiple asynchronous information channels (articulators), not just the hands but also the face and body, which computational approaches often ignore. In this paper we tackle the multi-articulatory sign language translation task and propose a novel multi-channel transformer architecture. The proposed architecture allows both the inter- and intra-contextual relationships between different sign articulators to be modelled within the transformer network itself, while also maintaining channel-specific information. We evaluate our approach on the RWTH-PHOENIX-Weather-2014T dataset and report competitive translation performance. Importantly, we overcome the reliance on gloss annotations that underpins other state-of-the-art approaches, thereby removing the future need for expensive curated datasets.
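    As a rough illustration of the idea (not the authors' implementation), the sketch below keeps a separate self-attention per articulator channel for intra-channel context and a cross-attention over the remaining channels for inter-channel context, while channel-specific projection weights are preserved; the channel names, dimensions and layer layout are assumptions.

    ```python
    # Minimal sketch of a multi-channel attention layer, assuming three articulator
    # channels (hands, face, body) represented as separate feature sequences.
    import torch
    import torch.nn as nn


    class MultiChannelAttention(nn.Module):
        def __init__(self, dim: int, num_heads: int, num_channels: int):
            super().__init__()
            # One self-attention per channel keeps channel-specific information.
            self.self_attn = nn.ModuleList(
                nn.MultiheadAttention(dim, num_heads, batch_first=True)
                for _ in range(num_channels)
            )
            # One cross-attention per channel models inter-channel context.
            self.cross_attn = nn.ModuleList(
                nn.MultiheadAttention(dim, num_heads, batch_first=True)
                for _ in range(num_channels)
            )
            self.norm = nn.ModuleList(nn.LayerNorm(dim) for _ in range(num_channels))

        def forward(self, channels: list[torch.Tensor]) -> list[torch.Tensor]:
            outputs = []
            for i, x in enumerate(channels):
                # Intra-channel context: attend within the articulator's own sequence.
                intra, _ = self.self_attn[i](x, x, x)
                # Inter-channel context: attend over the other articulators' frames.
                others = torch.cat([c for j, c in enumerate(channels) if j != i], dim=1)
                inter, _ = self.cross_attn[i](x, others, others)
                outputs.append(self.norm[i](x + intra + inter))
            return outputs


    # Example: hands, face and body streams of 100 frames with 256-dim features.
    hands, face, body = (torch.randn(2, 100, 256) for _ in range(3))
    layer = MultiChannelAttention(dim=256, num_heads=8, num_channels=3)
    print([o.shape for o in layer([hands, face, body])])
    ```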

    Adaptive Gesture Recognition with Variation Estimation for Interactive Systems

    This paper presents a gesture recognition/adaptation system for Human Computer Interaction applications that goes beyond activity classification and that, complementary to gesture labeling, characterizes the movement execution. We describe a template-based recognition method that simultaneously aligns the input gesture to the templates using a Sequential Monte Carlo inference technique. Contrary to standard template-based methods based on dynamic programming, such as Dynamic Time Warping, the algorithm has an adaptation process that tracks gesture variation in real-time. The method continuously updates, during execution of the gesture, the estimated parameters and recognition results, which offers key advantages for continuous human-machine interaction. The technique is evaluated in several different ways: recognition and early recognition are evaluated on 2D on-screen pen gestures; adaptation is assessed on synthetic data; and both early recognition and adaptation are evaluated in a user study involving 3D free-space gestures. The method is not only robust to noise and successfully adapts to parameter variation, but also performs recognition as well as or better than non-adapting offline template-based methods.
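    As a rough illustration of Sequential Monte Carlo template following (assumed, not the paper's implementation), the sketch below runs a particle filter whose state holds a phase (position along the template), a speed and an amplitude scale, so alignment and variation estimates are updated on every incoming frame; the 1D template, noise levels and likelihood model are illustrative assumptions.

    ```python
    # Particle-filter sketch: track phase, speed and scale of a streamed gesture
    # against a single template, frame by frame.
    import numpy as np

    rng = np.random.default_rng(0)


    def smc_follow(template, stream, n_particles=500):
        n = len(template)
        # Particle state: phase in [0, 1], relative speed, amplitude scale.
        phase = rng.uniform(0.0, 0.1, n_particles)
        speed = rng.normal(1.0, 0.2, n_particles)
        scale = rng.normal(1.0, 0.2, n_particles)
        weights = np.full(n_particles, 1.0 / n_particles)

        estimates = []
        for obs in stream:
            # Propagate: advance phase by speed, diffuse speed and scale (random walk).
            phase = np.clip(phase + speed / n + rng.normal(0, 0.5 / n, n_particles), 0, 1)
            speed += rng.normal(0, 0.05, n_particles)
            scale += rng.normal(0, 0.05, n_particles)

            # Weight: likelihood of the observation given the scaled template sample.
            idx = np.minimum((phase * (n - 1)).astype(int), n - 1)
            pred = scale * template[idx]
            weights *= np.exp(-0.5 * ((obs - pred) / 0.1) ** 2)
            weights /= weights.sum() + 1e-12

            # Resample when the effective sample size collapses.
            if 1.0 / np.sum(weights**2) < n_particles / 2:
                keep = rng.choice(n_particles, n_particles, p=weights)
                phase, speed, scale = phase[keep], speed[keep], scale[keep]
                weights = np.full(n_particles, 1.0 / n_particles)

            # Running estimates of alignment and variation parameters.
            estimates.append((weights @ phase, weights @ speed, weights @ scale))
        return estimates


    # Example: a 1D gesture performed 1.5x larger and faster than the template.
    template = np.sin(np.linspace(0, np.pi, 100))
    performed = 1.5 * np.sin(np.linspace(0, np.pi, 80))
    print(smc_follow(template, performed)[-1])  # expect phase ~1, speed >1, scale ~1.5
    ```

    In a full recognizer one such filter (or one particle subset) would run per template, with the accumulated likelihoods used for early recognition while the speed and scale estimates expose the variation.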

    Emotion recognition through multiple modalities: Face, body gesture, speech


    Multimodal Emotion Recognition in Speech-based Interaction Using Facial Expression, Body Gesture and Acoustic Analysis

    In this paper a study on multimodal automatic emotion recognition during a speech-based interaction is presented. A database was constructed consisting of people pronouncing a sentence in a scenario where they interacted with an agent using speech. Ten people pronounced a sentence corresponding to a command while making 8 different emotional expressions. Gender was equally represented, with speakers of several different native languages including French, German, Greek and Italian. Facial expression, gesture and acoustic analysis of speech were used to extract features relevant to emotion. For the automatic classification of unimodal, bimodal and multimodal data, a system based on a Bayesian classifier was used. After performing an automatic classification of each modality, the different modalities were combined using a multimodal approach. Fusion of the modalities at the feature level (before running the classifier) and at the results level (combining the results from the classifiers of each modality) were compared. Fusing the multimodal data resulted in a large increase in the recognition rates in comparison to the unimodal systems: the multimodal approach increased the recognition rate by more than
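    The two fusion strategies can be illustrated with a small sketch using a Gaussian Naive Bayes classifier as a stand-in for the paper's Bayesian classifier; the synthetic face/gesture/speech features below are placeholders, not the paper's data. Feature-level fusion concatenates the modalities before training one classifier, while results-level fusion trains one classifier per modality and combines the posteriors.

    ```python
    # Sketch: feature-level vs. results-level fusion of three modalities.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n, n_classes = 800, 8  # 8 emotional expressions
    y = rng.integers(0, n_classes, n)
    # Toy per-modality features: class-dependent means plus noise.
    face = y[:, None] + rng.normal(0, 2.0, (n, 10))
    gesture = y[:, None] + rng.normal(0, 3.0, (n, 6))
    speech = y[:, None] + rng.normal(0, 2.5, (n, 12))

    idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.3, random_state=0)

    # Feature-level fusion: concatenate modalities, train a single classifier.
    X_early = np.hstack([face, gesture, speech])
    early = GaussianNB().fit(X_early[idx_tr], y[idx_tr])
    acc_early = early.score(X_early[idx_te], y[idx_te])

    # Results-level fusion: one classifier per modality, average the posteriors.
    posteriors = []
    for X in (face, gesture, speech):
        clf = GaussianNB().fit(X[idx_tr], y[idx_tr])
        posteriors.append(clf.predict_proba(X[idx_te]))
    late_pred = np.mean(posteriors, axis=0).argmax(axis=1)
    acc_late = np.mean(late_pred == y[idx_te])

    print(f"feature-level fusion accuracy: {acc_early:.3f}")
    print(f"results-level fusion accuracy: {acc_late:.3f}")
    ```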

    User modeling via gesture and head pose expressivity features

    The current work focuses on user modeling in terms of affective analysis that could in turn be used in intelligent personalized interfaces and systems, dynamic profiling and context-aware multimedia applications. The analysis performed within this work comprises statistical processing and classification of automatically extracted gestural and head pose expressivity features. Qualitative expressive cues of body and head motion are formulated computationally, and the resulting features are processed statistically, their correlations are studied, and finally an emotion recognition attempt based on these features is presented. Significant emotion-specific patterns and interrelations between expressivity features are derived, while the emotion recognition results indicate that the gestural and head pose expressivity features could supplement and enhance a multimodal affective analysis system, incorporating an additional modality to be fused with other commonly used modalities such as facial expressions, prosodic and lexical acoustic features and physiological measurements.
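    As an illustration of how such expressivity cues can be computed (the formulations below are assumptions, not the paper's exact definitions), the sketch derives overall activation, spatial extent, fluidity and power from a sampled hand or head trajectory; these per-gesture features would then feed the statistical analysis and classifier.

    ```python
    # Sketch: common expressivity features from a motion trajectory sampled at a
    # fixed frame rate.
    import numpy as np


    def expressivity_features(traj: np.ndarray, fps: float = 25.0) -> dict:
        """traj: (T, D) array of hand (or head) positions over T frames."""
        dt = 1.0 / fps
        vel = np.diff(traj, axis=0) / dt    # frame-to-frame velocity
        acc = np.diff(vel, axis=0) / dt     # acceleration
        jerk = np.diff(acc, axis=0) / dt    # jerk, used here as a fluidity cue
        speed = np.linalg.norm(vel, axis=1)
        return {
            # Overall activation: amount of motion accumulated over the gesture.
            "overall_activation": float(np.sum(speed) * dt),
            # Spatial extent: size of the bounding volume swept by the gesture.
            "spatial_extent": float(np.prod(traj.max(axis=0) - traj.min(axis=0))),
            # Fluidity: smoother gestures have lower mean jerk magnitude.
            "fluidity": float(-np.mean(np.linalg.norm(jerk, axis=1))),
            # Power: peak acceleration magnitude reached during the gesture.
            "power": float(np.max(np.linalg.norm(acc, axis=1))),
        }


    # Example: a smooth circular hand movement in 2D.
    t = np.linspace(0, 2 * np.pi, 100)
    circle = np.stack([np.cos(t), np.sin(t)], axis=1)
    print(expressivity_features(circle))
    ```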