A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation
Body language (BL) refers to the non-verbal communication expressed through
physical movements, gestures, facial expressions, and postures. It is a form of
communication that conveys information, emotions, attitudes, and intentions
without the use of spoken or written words. It plays a crucial role in
interpersonal interactions and can complement or even override verbal
communication. Deep multi-modal learning techniques have shown promise in
understanding and analyzing these diverse aspects of BL. The survey emphasizes
their applications to BL generation and recognition. Four common BLs are
considered, namely Sign Language (SL), Cued Speech (CS), Co-speech (CoS), and
Talking Head (TH), and we analyze them and establish the
connections among these four BLs for the first time. Their generation and
recognition often involve multi-modal approaches. Benchmark datasets for BL
research are well collected and organized, along with the evaluation of SOTA
methods on these datasets. The survey highlights challenges such as limited
labeled data, multi-modal learning, and the need for domain adaptation to
generalize models to unseen speakers or languages. Future research directions
are presented, including exploring self-supervised learning techniques,
integrating contextual information from other modalities, and exploiting
large-scale pre-trained multi-modal models. In summary, this survey paper
provides a comprehensive understanding of deep multi-modal learning for the
generation and recognition of various BLs for the first time. By analyzing advancements,
challenges, and future directions, it serves as a valuable resource for
researchers and practitioners in advancing this field. In addition, we maintain
a continuously updated paper list for deep multi-modal learning for BL
recognition and generation: https://github.com/wentaoL86/awesome-body-language
Recommended from our members
Investigating the role of phonological awareness on phonological recoding during reading in deaf children
This study uses eye-tracking to investigate the role of phonological awareness on phonological recoding during reading in deaf and hard-of-hearing (DHH) children who predominantly use sign language, as compared with typically hearing children. Phonological recoding is one of the earliest strategies employed in reading, in which the reader maps each grapheme directly to the corresponding speech sound of the language (Jared, Levy, Ashby, and Agauas, 2015). Many DHH children struggle with reading, and the severity of the delays in some children increases with age. Although there are a few studies examining eye-movement patterns during reading in DHH adults, there are considerably fewer studies examining phonological recoding and the role of phonological awareness during reading in DHH children (Belanger, Baum, and Mayberry, 2011; Belanger, Rayner, and Mayberry, 2013). This study tests the influence of the visual language signal on reading in deaf children. I compare phonological awareness skills of English, ASL, and mouthing gestures to reading fluency, measured via eye-movement patterns when reading a sequence of sentences with an eye-tracker. Sentences are manipulated to target phonological recoding during reading by altering target words embedded in the sentence in three experimental conditions: no change, homophone foil, and spelling control (Jared et al., 2015). Preliminary results indicate that deaf signers are proficient readers and seemingly rely on ASL skills to read. In addition, I suggest that deaf signers do not participate in phonological recoding.
The role of phonology in visual word recognition: evidence from Chinese
Posters - Letter/Word Processing V: abstract no. 5024. The hypothesis of bidirectional coupling of orthography and phonology predicts that phonology plays a role in visual word recognition, as observed in the effects of feedforward and feedback spelling-to-sound consistency on lexical decision. However, because orthography and phonology are closely related in alphabetic languages (homophones in alphabetic languages are usually orthographically similar), it is difficult to exclude an influence of orthography on phonological effects in visual word recognition. Chinese languages contain many written homophones that are orthographically dissimilar, allowing a test of the claim that phonological effects can be independent of orthographic similarity. We report a study of visual word recognition in Chinese based on a mega-analysis of lexical decision performance with 500 characters. The results from multiple regression analyses, after controlling for orthographic frequency, stroke number, and radical frequency, showed main effects of feedforward and feedback consistency, as well as interactions between these variables and phonological frequency and number of homophones. Implications of these results for resonance models of visual word recognition are discussed.
Interactive effects of orthography and semantics in Chinese picture naming
Posters - Language Production/Writing: abstract no. 4035. Picture-naming performance in English and Dutch is enhanced by presentation of a word that is similar in form to the picture name. However, it is unclear whether facilitation has an orthographic or a phonological locus. We investigated the loci of the facilitation effect in Cantonese Chinese speakers by manipulating semantic, orthographic, and phonological similarity at three SOAs (−100, 0, and +100 msec). We identified an effect of orthographic facilitation that was independent of and larger than phonological facilitation across all SOAs. Semantic interference was also found at SOAs of −100 and 0 msec. Critically, an interaction of semantics and orthography was observed at an SOA of +100 msec. This interaction suggests that independent effects of orthographic facilitation on picture naming are located either at the level of semantic processing or at the lemma level and are not due to the activation of picture name segments at the level of phonological retrieval.
Sign Language Recognition
This chapter covers the key aspects of sign-language recognition (SLR), starting with a brief introduction to the motivations and requirements, followed by a précis of sign linguistics and their impact on the field. The types of data available and their relative merits are explored, allowing examination of the features which can be extracted. Classifying the manual aspects of sign (similar to gestures) is then discussed from a tracking and non-tracking viewpoint before summarising some of the approaches to the non-manual aspects of sign languages. Methods for combining the sign classification results into full SLR are given, showing the progression towards speech recognition techniques and the further adaptations required for the sign-specific case. Finally, the current frontiers are discussed and the recent research presented. This covers the task of continuous sign recognition, the work towards true signer independence, how to effectively combine the different modalities of sign, making use of the current linguistic research, and adapting to larger, noisier data sets.
Sensorimotor Modulations by Cognitive Processes During Accurate Speech Discrimination: An EEG Investigation of Dorsal Stream Processing
Internal models mediate the transmission of information between anterior and posterior regions of the dorsal stream in support of speech perception, though it remains unclear how this mechanism responds to cognitive processes in service of task demands. The purpose of the current study was to identify the influences of attention and working memory on sensorimotor activity across the dorsal stream during speech discrimination, with set size and signal clarity employed to modulate stimulus predictability and the time course of increased task demands, respectively. Independent Component Analysis of 64-channel EEG data identified bilateral sensorimotor mu and auditory alpha components from a cohort of 42 participants, indexing activity from anterior (mu) and posterior (auditory) aspects of the dorsal stream. Time-frequency (ERSP) analysis evaluated task-related changes in focal activation patterns, with phase coherence measures employed to track patterns of information flow across the dorsal stream. ERSP decomposition of mu clusters revealed event-related desynchronization (ERD) in beta and alpha bands, which was interpreted as evidence of forward (beta) and inverse (alpha) internal modeling across the time course of perception events. Stronger pre-stimulus mu-alpha ERD in small-set discrimination tasks was interpreted as more efficient attentional allocation due to the reduced sensory search space enabled by predictable stimuli. Mu-alpha and mu-beta ERD in peri- and post-stimulus periods were interpreted within the framework of Analysis by Synthesis as evidence of working memory activity for stimulus processing and maintenance, with weaker activity in degraded conditions suggesting that covert rehearsal mechanisms are sensitive to the quality of the stimulus being retained in working memory. Similar ERSP patterns across conditions, despite the differences in stimulus predictability and clarity, suggest that subjects may have adapted to tasks.
In light of this, future studies of sensorimotor processing should consider the ecological validity of the tasks employed, as well as the larger cognitive environment in which tasks are performed. The absence of interpretable patterns of mu-auditory coherence modulation across the time course of speech discrimination highlights the need for more sensitive analyses to probe dorsal stream connectivity.
Prosodic boundary phenomena
Synopsis:
In spoken language comprehension, the hearer is faced with a more or less continuous stream of auditory information. Prosodic cues, such as pitch movement, pre-boundary lengthening, and pauses, incrementally help to organize the incoming stream of information into prosodic phrases, which often coincide with syntactic units. Prosody is hence central to spoken language comprehension and some models assume that the speaker produces prosody in a consistent and hierarchical fashion. While there is manifold empirical evidence that prosodic boundary cues are reliably and robustly produced and effectively guide spoken sentence comprehension across different populations and languages, the underlying mechanisms and the nature of the prosody-syntax interface still have not been identified sufficiently. This is also reflected in the fact that most models on sentence processing completely lack prosodic information.
This edited book volume is grounded in a workshop that was held in 2021 at the annual conference of the Deutsche Gesellschaft für Sprachwissenschaft (DGfS). The five chapters cover selected topics on the production and comprehension of prosodic cues in various populations and languages, all focusing in particular on processing of prosody at structurally relevant prosodic boundaries. Specifically, the book comprises cross-linguistic evidence as well as evidence from non-native listeners, infants, adults, and elderly speakers, highlighting the important role of prosody in both language production and comprehension.
- …