
    Multi-channel Transformers for Multi-articulatory Sign Language Translation

    Sign languages use multiple asynchronous information channels (articulators), not just the hands but also the face and body, which computational approaches often ignore. In this paper we tackle the multi-articulatory sign language translation task and propose a novel multi-channel transformer architecture. The proposed architecture allows both the inter- and intra-contextual relationships between different sign articulators to be modelled within the transformer network itself, while also maintaining channel-specific information. We evaluate our approach on the RWTH-PHOENIX-Weather-2014T dataset and report competitive translation performance. Importantly, we overcome the reliance on gloss annotations that underpins other state-of-the-art approaches, thereby removing the future need for expensive curated datasets.
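    As a rough illustration of the idea (not the authors' implementation), the sketch below keeps a separate self-attention per articulator channel for intra-channel context and a cross-attention over the remaining channels for inter-channel context, while channel-specific projection weights are preserved; the channel names, dimensions and layer layout are assumptions.

    ```python
    # Minimal sketch of a multi-channel attention layer, assuming three articulator
    # channels (hands, face, body) represented as separate feature sequences.
    import torch
    import torch.nn as nn


    class MultiChannelAttention(nn.Module):
        def __init__(self, dim: int, num_heads: int, num_channels: int):
            super().__init__()
            # One self-attention per channel keeps channel-specific information.
            self.self_attn = nn.ModuleList(
                nn.MultiheadAttention(dim, num_heads, batch_first=True)
                for _ in range(num_channels)
            )
            # One cross-attention per channel models inter-channel context.
            self.cross_attn = nn.ModuleList(
                nn.MultiheadAttention(dim, num_heads, batch_first=True)
                for _ in range(num_channels)
            )
            self.norm = nn.ModuleList(nn.LayerNorm(dim) for _ in range(num_channels))

        def forward(self, channels: list[torch.Tensor]) -> list[torch.Tensor]:
            outputs = []
            for i, x in enumerate(channels):
                # Intra-channel context: attend within the articulator's own sequence.
                intra, _ = self.self_attn[i](x, x, x)
                # Inter-channel context: attend over the other articulators' frames.
                others = torch.cat([c for j, c in enumerate(channels) if j != i], dim=1)
                inter, _ = self.cross_attn[i](x, others, others)
                outputs.append(self.norm[i](x + intra + inter))
            return outputs


    # Example: hands, face and body streams of 100 frames with 256-dim features.
    hands, face, body = (torch.randn(2, 100, 256) for _ in range(3))
    layer = MultiChannelAttention(dim=256, num_heads=8, num_channels=3)
    print([o.shape for o in layer([hands, face, body])])
    ```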

    Adaptive Gesture Recognition with Variation Estimation for Interactive Systems

    This paper presents a gesture recognition/adaptation system for Human Computer Interaction applications that goes beyond activity classification and that, complementary to gesture labeling, characterizes the movement execution. We describe a template-based recognition method that simultaneously aligns the input gesture to the templates using a Sequential Monte Carlo inference technique. Contrary to standard template-based methods based on dynamic programming, such as Dynamic Time Warping, the algorithm has an adaptation process that tracks gesture variation in real-time. The method continuously updates, during execution of the gesture, the estimated parameters and recognition results, which offers key advantages for continuous human-machine interaction. The technique is evaluated in several different ways: recognition and early recognition are evaluated on 2D on-screen pen gestures; adaptation is assessed on synthetic data; and both early recognition and adaptation are evaluated in a user study involving 3D free-space gestures. The method is not only robust to noise and successfully adapts to parameter variation, but also performs recognition as well as or better than non-adapting offline template-based methods.
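    As a rough illustration of Sequential Monte Carlo template following (assumed, not the paper's implementation), the sketch below runs a particle filter whose state holds a phase (position along the template), a speed and an amplitude scale, so alignment and variation estimates are updated on every incoming frame; the 1D template, noise levels and likelihood model are illustrative assumptions.

    ```python
    # Particle-filter sketch: track phase, speed and scale of a streamed gesture
    # against a single template, frame by frame.
    import numpy as np

    rng = np.random.default_rng(0)


    def smc_follow(template, stream, n_particles=500):
        n = len(template)
        # Particle state: phase in [0, 1], relative speed, amplitude scale.
        phase = rng.uniform(0.0, 0.1, n_particles)
        speed = rng.normal(1.0, 0.2, n_particles)
        scale = rng.normal(1.0, 0.2, n_particles)
        weights = np.full(n_particles, 1.0 / n_particles)

        estimates = []
        for obs in stream:
            # Propagate: advance phase by speed, diffuse speed and scale (random walk).
            phase = np.clip(phase + speed / n + rng.normal(0, 0.5 / n, n_particles), 0, 1)
            speed += rng.normal(0, 0.05, n_particles)
            scale += rng.normal(0, 0.05, n_particles)

            # Weight: likelihood of the observation given the scaled template sample.
            idx = np.minimum((phase * (n - 1)).astype(int), n - 1)
            pred = scale * template[idx]
            weights *= np.exp(-0.5 * ((obs - pred) / 0.1) ** 2)
            weights /= weights.sum() + 1e-12

            # Resample when the effective sample size collapses.
            if 1.0 / np.sum(weights**2) < n_particles / 2:
                keep = rng.choice(n_particles, n_particles, p=weights)
                phase, speed, scale = phase[keep], speed[keep], scale[keep]
                weights = np.full(n_particles, 1.0 / n_particles)

            # Running estimates of alignment and variation parameters.
            estimates.append((weights @ phase, weights @ speed, weights @ scale))
        return estimates


    # Example: a 1D gesture performed 1.5x larger and faster than the template.
    template = np.sin(np.linspace(0, np.pi, 100))
    performed = 1.5 * np.sin(np.linspace(0, np.pi, 80))
    print(smc_follow(template, performed)[-1])  # expect phase ~1, speed >1, scale ~1.5
    ```

    In a full recognizer one such filter (or one particle subset) would run per template, with the accumulated likelihoods used for early recognition while the speed and scale estimates expose the variation.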

    Emotion recognition through multiple modalities: Face, body gesture, speech


    Multimodal Emotion Recognition in Speech-based Interaction Using Facial Expression, Body Gesture and Acoustic Analysis

    In this paper a study on multimodal automatic emotion recognition during a speech-based interaction is presented. A database was constructed consisting of people pronouncing a sentence in a scenario where they interacted with an agent using speech. Ten people pronounced a sentence corresponding to a command while making 8 different emotional expressions. Gender was equally represented, with speakers of several different native languages including French, German, Greek and Italian. Facial expression, gesture and acoustic analysis of speech were used to extract features relevant to emotion. For the automatic classification of unimodal, bimodal and multimodal data, a system based on a Bayesian classifier was used. After performing an automatic classification of each modality, the different modalities were combined using a multimodal approach. Fusion of the modalities at the feature level (before running the classifier) and at the results level (combining the results from the classifiers of each modality) were compared. Fusing the multimodal data resulted in a large increase in the recognition rates in comparison to the unimodal systems: the multimodal approach increased the recognition rate by more than
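    The two fusion strategies can be illustrated with a small sketch using a Gaussian Naive Bayes classifier as a stand-in for the paper's Bayesian classifier; the synthetic face/gesture/speech features below are placeholders, not the paper's data. Feature-level fusion concatenates the modalities before training one classifier, while results-level fusion trains one classifier per modality and combines the posteriors.

    ```python
    # Sketch: feature-level vs. results-level fusion of three modalities.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n, n_classes = 800, 8  # 8 emotional expressions
    y = rng.integers(0, n_classes, n)
    # Toy per-modality features: class-dependent means plus noise.
    face = y[:, None] + rng.normal(0, 2.0, (n, 10))
    gesture = y[:, None] + rng.normal(0, 3.0, (n, 6))
    speech = y[:, None] + rng.normal(0, 2.5, (n, 12))

    idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.3, random_state=0)

    # Feature-level fusion: concatenate modalities, train a single classifier.
    X_early = np.hstack([face, gesture, speech])
    early = GaussianNB().fit(X_early[idx_tr], y[idx_tr])
    acc_early = early.score(X_early[idx_te], y[idx_te])

    # Results-level fusion: one classifier per modality, average the posteriors.
    posteriors = []
    for X in (face, gesture, speech):
        clf = GaussianNB().fit(X[idx_tr], y[idx_tr])
        posteriors.append(clf.predict_proba(X[idx_te]))
    late_pred = np.mean(posteriors, axis=0).argmax(axis=1)
    acc_late = np.mean(late_pred == y[idx_te])

    print(f"feature-level fusion accuracy: {acc_early:.3f}")
    print(f"results-level fusion accuracy: {acc_late:.3f}")
    ```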

    User modeling via gesture and head pose expressivity features

    The current work focuses on user modeling in terms of affective analysis that could in turn be used in intelligent personalized interfaces and systems, dynamic profiling and context-aware multimedia applications. The analysis performed within this work comprises statistical processing and classification of automatically extracted gestural and head pose expressivity features. Qualitative expressive cues of body and head motion are formulated computationally, and the resulting features are processed statistically, their correlations are studied, and finally an emotion recognition attempt based on these features is presented. Significant emotion-specific patterns and interrelations between expressivity features are derived, while the emotion recognition results indicate that the gestural and head pose expressivity features could supplement and enhance a multimodal affective analysis system, incorporating an additional modality to be fused with other commonly used modalities such as facial expressions, prosodic and lexical acoustic features and physiological measurements.
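    As an illustration of how such expressivity cues can be computed (the formulations below are assumptions, not the paper's exact definitions), the sketch derives overall activation, spatial extent, fluidity and power from a sampled hand or head trajectory; these per-gesture features would then feed the statistical analysis and classifier.

    ```python
    # Sketch: common expressivity features from a motion trajectory sampled at a
    # fixed frame rate.
    import numpy as np


    def expressivity_features(traj: np.ndarray, fps: float = 25.0) -> dict:
        """traj: (T, D) array of hand (or head) positions over T frames."""
        dt = 1.0 / fps
        vel = np.diff(traj, axis=0) / dt    # frame-to-frame velocity
        acc = np.diff(vel, axis=0) / dt     # acceleration
        jerk = np.diff(acc, axis=0) / dt    # jerk, used here as a fluidity cue
        speed = np.linalg.norm(vel, axis=1)
        return {
            # Overall activation: amount of motion accumulated over the gesture.
            "overall_activation": float(np.sum(speed) * dt),
            # Spatial extent: size of the bounding volume swept by the gesture.
            "spatial_extent": float(np.prod(traj.max(axis=0) - traj.min(axis=0))),
            # Fluidity: smoother gestures have lower mean jerk magnitude.
            "fluidity": float(-np.mean(np.linalg.norm(jerk, axis=1))),
            # Power: peak acceleration magnitude reached during the gesture.
            "power": float(np.max(np.linalg.norm(acc, axis=1))),
        }


    # Example: a smooth circular hand movement in 2D.
    t = np.linspace(0, 2 * np.pi, 100)
    circle = np.stack([np.cos(t), np.sin(t)], axis=1)
    print(expressivity_features(circle))
    ```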