30 research outputs found

Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition

Author: Athanassios Katsamanis
George Papandreou
Petros Maragos
Vassilis Pitsikalis
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey

Author: Krishna G. Sai
Kundrapu Supriya
Nemani Praneeth
Publication date: 14/06/2023

Speaker-independent VSR is a complex task that involves identifying spoken words or phrases from video recordings of a speaker's facial movements. Over the years, there has been a considerable amount of research in the field of VSR involving different algorithms and datasets to evaluate system performance. These efforts have resulted in significant progress in developing effective VSR models, creating new opportunities for further research in this area. This survey provides a detailed examination of the progression of VSR over the past three decades, with a particular emphasis on the transition from speaker-dependent to speaker-independent systems. We also provide a comprehensive overview of the various datasets used in VSR research and the preprocessing techniques employed to achieve speaker independence. The survey covers the works published from 1990 to 2023, thoroughly analyzing each work and comparing them on various parameters. This survey provides an in-depth analysis of the evolution of speaker-independent VSR systems from 1990 to 2023. It outlines the development of VSR systems over time and highlights the need to develop end-to-end pipelines for speaker-independent VSR. The pictorial representation offers a clear and concise overview of the techniques used in speaker-independent VSR, thereby aiding in the comprehension and analysis of the various methodologies. The survey also highlights the strengths and limitations of each technique and provides insights into developing novel approaches for analyzing visual speech cues. Overall, this comprehensive review provides insights into the current state of the art in speaker-independent VSR and highlights potential areas for future research.

arXiv.org e-Print Archive

Sign Language Recognition

Author: A. Corradini
A. Farhadi
A. Micilotta
A. Rezaei
A. Roussos
B. Bauer
B. Bauer
B. Stenger
B. Stenger
British Deaf Association
C. Valli
C. Vogler
C. Vogler
C. Vogler
C. Vogler
C. Wang
C.-L. Huang
C.-S. Lee
D. Stein
E. Efthimiou
E. Murphy-Chutorian
E.-J. Ong
E.-J. Ong
E.J. Holden
E.J. Holden
F. Gaolin
H. Cooper
H. Cooper
H. Cooper
H. Ershaed
H. Fillbrandt
H. Hienz
H.-D. Yang
I. Oikonomidis
J. Bungeroth
J. Han
J. Isaacs
J. Segen
J. Zieren
J.-S. Kim
J.B. Kim
J.L. Hernandez-Rebollar
J.W. Han
K. Bailly
K. Grobel
K. Lyons
K. Murakami
K.W. Ming
L.G. Zhang
M. Krinidis
M. Ouhyoung
M. Pahlevanzadeh
M. Zahedi
M. Zahedi
M.-H. Yang
M.B. Waldron
M.W. Kadous
N. Pugeault
O. Aran
P. Doliotis
P. Ekman
P. Goh
P. Heracleous
P. Yin
R. Bowden
R. Elliott
R. Feris
R. Grzeszcuk
R. Munoz-Salinas
R. Sutton-Spence
S. Akyol
S. Hadfield
S. Hong
S. Koelstra
S. Liwicki
S. Mitra
S.-F. Wong
S.C.W. Ong
S.K. Liddell
S.O. Ba
T. Sheerman-Chase
T. Starner
T. Starner
T. Yamaguchi
T.D. Nguyen
T.E. Jerde
U. Agris von
U. Agris von
V. Athitsos
W. Gao
W.C. Stokoe
Y. Lan
Y. Yacoob
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

This chapter covers the key aspects of sign-language recognition (SLR), starting with a brief introduction to the motivations and requirements, followed by a précis of sign linguistics and their impact on the field. The types of data available and the relative merits are explored allowing examination of the features which can be extracted. Classifying the manual aspects of sign (similar to gestures) is then discussed from a tracking and non-tracking viewpoint before summarising some of the approaches to the non-manual aspects of sign languages. Methods for combining the sign classification results into full SLR are given showing the progression towards speech recognition techniques and the further adaptations required for the sign specific case. Finally the current frontiers are discussed and the recent research presented. This covers the task of continuous sign recognition, the work towards true signer independence, how to effectively combine the different modalities of sign, making use of the current linguistic research and adapting to larger more noisy data set

Crossref

Surrey Research Insight

Using a Discrete Hidden Markov Model Kernel for lip-based biometric identification

Author: Aleksic
Alizadeh
Braga-Neto
Bui
Carlos M. Travieso
Cervantes
Chan
Chetty
Chin
Coianiz
Dargham
David
De la Cuesta
Dong
Faraj
Ferrer
Fox
Hernando
Jaakkola
Jain
Jain
Jesús B. Alonso
Jianguo Zhang
Joachims
Kumar
Lal-Raheja
Langner
Lewis
Mehra
Newman
Newman
Otsu
Paul Miller
Rabiner
Rabiner
Rao
Rohani
Salazar
Steifelhagen
Travieso
Tselios
Viola
Wang
Wark
Wark
Yaling
Yan
Yujie
Zhe
Çetingül
Publication venue: 'Elsevier BV'

Crossref

Sign Language Recognition Using Sub-units

Author: JW Han
L Breiman
MB Waldron
SK Liddell
T Starner
WC Stokoe
Y Amit
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

This chapter discusses sign language recognition using linguistic sub-units. It presents three types of sub-units for consideration: those learnt from appearance data as well as those inferred from either 2D or 3D tracking data. These sub-units are then combined using a sign level classifier; here, two options are presented. The first uses Markov Models to encode the temporal changes between sub-units. The second makes use of Sequential Pattern Boosting to apply discriminative feature selection at the same time as encoding temporal information. This approach is more robust to noise and performs well in signer independent tests, improving results from the 54% achieved by the Markov Chains to 76%.

Crossref

Surrey Research Insight

Statistical and Dynamical Modeling of Riemannian Trajectories with Application to Human Movement Analysis

Publication date: 01/01/2016

The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomena: gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e. the metric is the standard Euclidean distance governed by the L-2 norm. However, in many cases this assumption is violated, when the data lies on non-Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to represent, compare, and manipulate such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification and other applications more complicated. Exploiting statistical properties can help us understand the true space of such trajectories. In applications such as stroke rehabilitation where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective.
In this regard, we propose a generalization of the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression. Dissertation/Thesis: Doctoral Dissertation, Electrical Engineering 201

ASU Digital Repository

A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation

Author: Gao Lufei
Lei Wentao
Lin Xiaotian
Liu Li
Ma Fengji
Wang Jinting
Publication date: 17/08/2023

Body language (BL) refers to the non-verbal communication expressed through physical movements, gestures, facial expressions, and postures. It is a form of communication that conveys information, emotions, attitudes, and intentions without the use of spoken or written words. It plays a crucial role in interpersonal interactions and can complement or even override verbal communication. Deep multi-modal learning techniques have shown promise in understanding and analyzing these diverse aspects of BL. The survey emphasizes their applications to BL generation and recognition. Several common BLs are considered, i.e., Sign Language (SL), Cued Speech (CS), Co-speech (CoS), and Talking Head (TH), and we have conducted an analysis and established the connections among these four BLs for the first time. Their generation and recognition often involve multi-modal approaches. Benchmark datasets for BL research are well collected and organized, along with the evaluation of SOTA methods on these datasets. The survey highlights challenges such as limited labeled data, multi-modal learning, and the need for domain adaptation to generalize models to unseen speakers or languages. Future research directions are presented, including exploring self-supervised learning techniques, integrating contextual information from other modalities, and exploiting large-scale pre-trained multi-modal models. In summary, this survey paper provides a comprehensive understanding of deep multi-modal learning for the generation and recognition of various BLs for the first time. By analyzing advancements, challenges, and future directions, it serves as a valuable resource for researchers and practitioners in advancing this field. In addition, we maintain a continuously updated paper list for deep multi-modal learning for BL recognition and generation: https://github.com/wentaoL86/awesome-body-language

arXiv.org e-Print Archive

Face Mask Extraction in Video Sequence

Author: Luo B
Pantic M
Shen J
Wang Y
Publication date: 01/01/2018

Inspired by the recent development of deep network-based methods in semantic image segmentation, we introduce an end-to-end trainable model for face mask extraction in video sequences. Compared to landmark-based sparse face shape representation, our method can produce the segmentation masks of individual facial components, which can better reflect their detailed shape variations. By integrating the Convolutional LSTM (ConvLSTM) algorithm with Fully Convolutional Networks (FCN), our new ConvLSTM-FCN model works on a per-sequence basis and takes advantage of the temporal correlation in video clips. In addition, we also propose a novel loss function, called Segmentation Loss, to directly optimise the Intersection over Union (IoU) performance. In practice, to further increase segmentation accuracy, one primary model and two additional models were trained to focus on the face, eyes, and mouth regions, respectively. Our experiment shows the proposed method has achieved a 16.99% relative improvement (from 54.50% to 63.76% mean IoU) over the baseline FCN model on the 300 Videos in the Wild (300VW) dataset.

Spiral - Imperial College Digital Repository