A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation

Author: Gao Lufei
Lei Wentao
Lin Xiaotian
Liu Li
Ma Fengji
Wang Jinting
Publication date: 17/08/2023

Body language (BL) refers to the non-verbal communication expressed through physical movements, gestures, facial expressions, and postures. It is a form of communication that conveys information, emotions, attitudes, and intentions without the use of spoken or written words. It plays a crucial role in interpersonal interactions and can complement or even override verbal communication. Deep multi-modal learning techniques have shown promise in understanding and analyzing these diverse aspects of BL. The survey emphasizes their applications to BL generation and recognition. Several common BLs are considered, i.e., Sign Language (SL), Cued Speech (CS), Co-speech (CoS), and Talking Head (TH), and we have conducted an analysis and established the connections among these four BLs for the first time. Their generation and recognition often involve multi-modal approaches. Benchmark datasets for BL research are collected and organized, along with the evaluation of SOTA methods on these datasets. The survey highlights challenges such as limited labeled data, multi-modal learning, and the need for domain adaptation to generalize models to unseen speakers or languages. Future research directions are presented, including exploring self-supervised learning techniques, integrating contextual information from other modalities, and exploiting large-scale pre-trained multi-modal models. In summary, this survey paper provides, for the first time, a comprehensive understanding of deep multi-modal learning for the generation and recognition of various BLs. By analyzing advancements, challenges, and future directions, it serves as a valuable resource for researchers and practitioners in advancing this field. In addition, we maintain a continuously updated paper list for deep multi-modal learning for BL recognition and generation: https://github.com/wentaoL86/awesome-body-language

arXiv.org e-Print Archive

Investigating the role of phonological awareness on phonological recoding during reading in deaf children

Author: Cooley Frances Grosvenor
Publication date: 09/08/2018

This study uses eye-tracking to investigate the role of phonological awareness on phonological recoding during reading in deaf and hard-of-hearing (DHH) children who predominantly use sign language, as compared to typically hearing children. Phonological recoding is one of the earliest strategies employed in reading, in which the reader maps each grapheme directly to the corresponding speech sound of the language (Jared, Levy, Ashby, and Agauas, 2015). Many DHH children struggle with reading, and the severity of the delays in some children increases with age. Although there are a few studies examining eye-patterns during reading in DHH adults, there are considerably fewer studies examining phonological recoding and the role of phonological awareness during reading in DHH children (Belanger, Baum, and Mayberry, 2011; Belanger, Rayner, and Mayberry, 2013). This study tests the influence of the visual language signal on reading in deaf children. I compare phonological awareness skills in English, ASL, and mouthing gestures to reading fluency, measured via eye-movement patterns recorded with an eye-tracker while participants read a sequence of sentences. Sentences are manipulated to target phonological recoding during reading by altering target words embedded in the sentence in three experimental conditions: no change, homophone foil, and spelling control (Jared et al., 2015). Preliminary results indicate that deaf signers are proficient readers and seemingly rely on ASL skills to read. In addition, I suggest that deaf signers do not participate in phonological recoding.

Texas ScholarWorks

Examining the relationship between paired associate learning and reading ability in adults and children

Author: Lira Calabrich Simone
Publication date: 13/11/2023

The role of phonology in visual word recognition: evidence from Chinese

Author: Ip JKM
Lau DKY
Leung MT
Weekes BS
Publication venue: 'United States Sports Academy'
Publication date: 01/01/2010

Posters - Letter/Word Processing V: abstract no. 5024. The hypothesis of bidirectional coupling of orthography and phonology predicts that phonology plays a role in visual word recognition, as observed in the effects of feedforward and feedback spelling-to-sound consistency on lexical decision. However, because orthography and phonology are closely related in alphabetic languages (homophones in alphabetic languages are usually orthographically similar), it is difficult to exclude an influence of orthography on phonological effects in visual word recognition. Chinese languages contain many written homophones that are orthographically dissimilar, allowing a test of the claim that phonological effects can be independent of orthographic similarity. We report a study of visual word recognition in Chinese based on a mega-analysis of lexical decision performance with 500 characters. The results from multiple regression analyses, after controlling for orthographic frequency, stroke number, and radical frequency, showed main effects of feedforward and feedback consistency, as well as interactions between these variables and phonological frequency and number of homophones. Implications of these results for resonance models of visual word recognition are discussed.

HKU Scholars Hub

Interactive effects of orthography and semantics in Chinese picture naming

Author: Law SP
Su IF
Weekes BS
Publication venue: 'United States Sports Academy'
Publication date: 01/01/2010

Posters - Language Production/Writing: abstract no. 4035. Picture-naming performance in English and Dutch is enhanced by presentation of a word that is similar in form to the picture name. However, it is unclear whether facilitation has an orthographic or a phonological locus. We investigated the loci of the facilitation effect in Cantonese Chinese speakers by manipulating semantic, orthographic, and phonological similarity at three SOAs (-100, 0, and +100 msec). We identified an effect of orthographic facilitation that was independent of and larger than phonological facilitation across all SOAs. Semantic interference was also found at SOAs of -100 and 0 msec. Critically, an interaction of semantics and orthography was observed at an SOA of +100 msec. This interaction suggests that independent effects of orthographic facilitation on picture naming are located either at the level of semantic processing or at the lemma level and are not due to the activation of picture name segments at the level of phonological retrieval.

HKU Scholars Hub

Sign Language Recognition

Author: A. Corradini
A. Farhadi
A. Micilotta
A. Rezaei
A. Roussos
B. Bauer
B. Stenger
British Deaf Association
C. Valli
C. Vogler
C. Wang
C.-L. Huang
C.-S. Lee
D. Stein
E. Efthimiou
E. Murphy-Chutorian
E.-J. Ong
E.J. Holden
F. Gaolin
H. Cooper
H. Ershaed
H. Fillbrandt
H. Hienz
H.-D. Yang
I. Oikonomidis
J. Bungeroth
J. Han
J. Isaacs
J. Segen
J. Zieren
J.-S. Kim
J.B. Kim
J.L. Hernandez-Rebollar
J.W. Han
K. Bailly
K. Grobel
K. Lyons
K. Murakami
K.W. Ming
L.G. Zhang
M. Krinidis
M. Ouhyoung
M. Pahlevanzadeh
M. Zahedi
M.-H. Yang
M.B. Waldron
M.W. Kadous
N. Pugeault
O. Aran
P. Doliotis
P. Ekman
P. Goh
P. Heracleous
P. Yin
R. Bowden
R. Elliott
R. Feris
R. Grzeszcuk
R. Munoz-Salinas
R. Sutton-Spence
S. Akyol
S. Hadfield
S. Hong
S. Koelstra
S. Liwicki
S. Mitra
S.-F. Wong
S.C.W. Ong
S.K. Liddell
S.O. Ba
T. Sheerman-Chase
T. Starner
T. Yamaguchi
T.D. Nguyen
T.E. Jerde
U. von Agris
V. Athitsos
W. Gao
W.C. Stokoe
Y. Lan
Y. Yacoob
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

This chapter covers the key aspects of sign-language recognition (SLR), starting with a brief introduction to the motivations and requirements, followed by a précis of sign linguistics and its impact on the field. The types of data available and their relative merits are explored, allowing examination of the features which can be extracted. Classifying the manual aspects of sign (similar to gestures) is then discussed from a tracking and non-tracking viewpoint, before summarising some of the approaches to the non-manual aspects of sign languages. Methods for combining the sign classification results into full SLR are given, showing the progression towards speech recognition techniques and the further adaptations required for the sign-specific case. Finally, the current frontiers are discussed and recent research is presented. This covers the task of continuous sign recognition, the work towards true signer independence, how to effectively combine the different modalities of sign, making use of current linguistic research, and adapting to larger, noisier data sets.

Surrey Research Insight

Sensorimotor Modulations by Cognitive Processes During Accurate Speech Discrimination: An EEG Investigation of Dorsal Stream Processing

Author: Jenson David E.
Publication venue: UTHSC Digital Commons
Publication date: 01/05/2018

Internal models mediate the transmission of information between anterior and posterior regions of the dorsal stream in support of speech perception, though it remains unclear how this mechanism responds to cognitive processes in service of task demands. The purpose of the current study was to identify the influences of attention and working memory on sensorimotor activity across the dorsal stream during speech discrimination, with set size and signal clarity employed to modulate stimulus predictability and the time course of increased task demands, respectively. Independent Component Analysis of 64-channel EEG data identified bilateral sensorimotor mu and auditory alpha components from a cohort of 42 participants, indexing activity from anterior (mu) and posterior (auditory) aspects of the dorsal stream. Time-frequency (ERSP) analysis evaluated task-related changes in focal activation patterns, with phase coherence measures employed to track patterns of information flow across the dorsal stream. ERSP decomposition of mu clusters revealed event-related desynchronization (ERD) in beta and alpha bands, which was interpreted as evidence of forward (beta) and inverse (alpha) internal modeling across the time course of perception events. Stronger pre-stimulus mu-alpha ERD in small set discrimination tasks was interpreted as more efficient attentional allocation due to the reduced sensory search space enabled by predictable stimuli. Mu-alpha and mu-beta ERD in peri- and post-stimulus periods were interpreted within the framework of Analysis by Synthesis as evidence of working memory activity for stimulus processing and maintenance, with weaker activity in degraded conditions suggesting that covert rehearsal mechanisms are sensitive to the quality of the stimulus being retained in working memory. Similar ERSP patterns across conditions, despite the differences in stimulus predictability and clarity, suggest that subjects may have adapted to tasks.
In light of this, future studies of sensorimotor processing should consider the ecological validity of the tasks employed, as well as the larger cognitive environment in which tasks are performed. The absence of interpretable patterns of mu-auditory coherence modulation across the time course of speech discrimination highlights the need for more sensitive analyses to probe dorsal stream connectivity.

Prosodic boundary phenomena

Publication date: 01/01/2023

Synopsis: In spoken language comprehension, the hearer is faced with a more or less continuous stream of auditory information. Prosodic cues, such as pitch movement, pre-boundary lengthening, and pauses, incrementally help to organize the incoming stream of information into prosodic phrases, which often coincide with syntactic units. Prosody is hence central to spoken language comprehension, and some models assume that the speaker produces prosody in a consistent and hierarchical fashion. While there is manifold empirical evidence that prosodic boundary cues are reliably and robustly produced and effectively guide spoken sentence comprehension across different populations and languages, the underlying mechanisms and the nature of the prosody-syntax interface still have not been sufficiently identified. This is also reflected in the fact that most models of sentence processing completely lack prosodic information. This edited book volume is grounded in a workshop that was held in 2021 at the annual conference of the Deutsche Gesellschaft für Sprachwissenschaft (DGfS). The five chapters cover selected topics on the production and comprehension of prosodic cues in various populations and languages, all focusing in particular on the processing of prosody at structurally relevant prosodic boundaries. Specifically, the book comprises cross-linguistic evidence as well as evidence from non-native listeners, infants, adults, and elderly speakers, highlighting the important role of prosody in both language production and comprehension.