
    Multimodal Based Audio-Visual Speech Recognition for Hard-of-Hearing: State of the Art Techniques and Challenges

    Multimodal Integration (MI) is the study of merging the knowledge acquired by the nervous system through sensory modalities such as speech, vision, touch, and gesture. Applications of MI span Audio-Visual Speech Recognition (AVSR), Sign Language Recognition (SLR), Emotion Recognition (ER), Biometric Applications (BMA), Affect Recognition (AR), Multimedia Retrieval (MR), and related areas. Fusions of modalities such as hand gesture with facial expression, or lip movement with hand position, are the combinations most often used in developing multimodal systems for the hearing impaired. This paper gives an overview of the multimodal systems in the literature on hearing-impaired studies and also discusses work on hearing-impaired acoustic analysis. Far fewer algorithms have been developed for hearing-impaired AVSR than for normal-hearing AVSR, so audio-visual speech recognition systems for the hearing impaired are in high demand, particularly for people trying to communicate in natively spoken languages. The paper also highlights state-of-the-art techniques in AVSR and the challenges researchers face in developing AVSR systems.
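
    One common way to merge the modalities this survey discusses is decision-level (late) fusion, in which independent audio and visual recognizers are trained separately and their per-word scores are combined. The sketch below illustrates the idea under stated assumptions: the fixed 0.7 audio weight and the three-word vocabulary are illustrative, not details taken from any of the surveyed systems.

    # Minimal late-fusion sketch for AVSR; the 0.7 audio weight and the
    # three-word vocabulary are assumptions for illustration.
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def late_fusion(audio_logits, visual_logits, audio_weight=0.7):
        """Weighted average of per-word probabilities from the two modalities."""
        return (audio_weight * softmax(audio_logits)
                + (1.0 - audio_weight) * softmax(visual_logits))

    audio = np.array([2.0, 0.5, -1.0])   # acoustic model scores per candidate word
    visual = np.array([0.2, 1.8, 0.0])   # lip-reading model scores per candidate word
    print(late_fusion(audio, visual).argmax())  # index of the fused word decision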

    Attention and empirical studies of grammar

    How is the generation of a grammatical sentence implemented by the human brain? A starting place for such an inquiry lies in linguistic theory. Unfortunately, linguistic theories illuminate only abstract knowledge representations and do not indicate how these representations interact with cognitive architecture to produce discourse. We examine tightly constrained empirical methods to study how grammar interacts with one part of the cognitive architecture, namely attention. Finally, we show that understanding attention as a neural network can link grammatical choice to underlying brain systems. Overall, our commentary supports a multilevel empirical approach that clarifies and expands the connections between cognitive science and linguistics, thus advancing the interdisciplinary agenda outlined by Jackendoff.

    Augmented Reality Talking Heads as a Support for Speech Perception and Production


    The Meaning of Movement: Using Motion Design to Enrich Words for Deaf and Hard of Hearing Children

    This thesis aims to address challenging areas of vocabulary for deaf and hard of hearing children by developing an open resource for students, parents, teachers, and content creators that uses motion to enhance written words. This research studies nonverbal communication, such as body-language expression and the paralinguistic qualities of prosody (i.e., tone, intonation, volume, and pitch), within the framework of graphic design through motion design. Body movement and expression are essential during face-to-face communication, but written language lacks such context clues. Additionally, hard of hearing readers may not fully detect prosody within their range of hearing. The information that would otherwise be gathered through body language and paralanguage can be replicated with animated movement, which adds greater context to otherwise static text and enhances insight into the meaning or use of a word. Seeing how a word in written form correlates with meaning enhanced through movement promotes greater understanding and retention, and thereby improved communication through graphic design. Motion design, specifically kinetic typography, offers a promising tool for language learning that merits continued exploration and development.

    Annotated Bibliography: Anticipation


    Interactivity in video-based models

    In this review we argue that interactivity can be effective in video-based models for engaging learners in relevant cognitive processes. We do not treat modeling as an isolated instructional method but adopt the social cognitive model of sequential skill acquisition, in which learners start with observation and finish with independent, self-regulated performance. Moreover, we concur with the notion that interactivity should emphasize the cognitive processes that learners engage in when they interact with the learning environment. The four-component instructional design (4C/ID) model is used to define a set of cognitive processes: elaboration and induction enable learners to construct schemas, whereas compilation and strengthening enable learners to automate these schemas. Pacing, cues, control over appearance, prediction, working in dyads, personalized task selection, and reflection prompts are identified as guidelines that might support learners in interactively constructing schemas. Personalized task selection with part-task practice helps learners to interactively automate schemas.

    Interactivity in Video-based Models


    The role of phonology in visual word recognition: evidence from Chinese

    Posters - Letter/Word Processing V: abstract no. 5024. The hypothesis of bidirectional coupling of orthography and phonology predicts that phonology plays a role in visual word recognition, as observed in the effects of feedforward and feedback spelling-to-sound consistency on lexical decision. However, because orthography and phonology are closely related in alphabetic languages (homophones in alphabetic languages are usually orthographically similar), it is difficult to exclude an influence of orthography on phonological effects in visual word recognition. Chinese languages contain many written homophones that are orthographically dissimilar, allowing a test of the claim that phonological effects can be independent of orthographic similarity. We report a study of visual word recognition in Chinese based on a mega-analysis of lexical decision performance with 500 characters. The results from multiple regression analyses, after controlling for orthographic frequency, stroke number, and radical frequency, showed main effects of feedforward and feedback consistency, as well as interactions between these variables and phonological frequency and number of homophones. Implications of these results for resonance models of visual word recognition are discussed.
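
    The regression design described above is straightforward to state in outline. Below is a hedged sketch of an item-level model with the named controls, consistency predictors, and interactions; the file name and column names are hypothetical, not taken from the study's materials.

    # Hedged sketch of the item-level multiple regression described above:
    # lexical-decision RT regressed on feedforward and feedback consistency
    # after entering the control predictors. File and column names are
    # hypothetical.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("lexical_decision_items.csv")  # one row per character

    model = smf.ols(
        "rt ~ orth_frequency + stroke_number + radical_frequency"  # controls
        " + ff_consistency * phon_frequency"   # feedforward consistency and its interaction
        " + fb_consistency * n_homophones",    # feedback consistency and its interaction
        data=df,
    ).fit()
    print(model.summary())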

    Interactive effects of orthography and semantics in Chinese picture naming

    Posters - Language Production/Writing: abstract no. 4035. Picture-naming performance in English and Dutch is enhanced by presentation of a word that is similar in form to the picture name. However, it is unclear whether facilitation has an orthographic or a phonological locus. We investigated the loci of the facilitation effect in Cantonese Chinese speakers by manipulating, at three SOAs (−100, 0, and +100 msec), semantic, orthographic, and phonological similarity. We identified an effect of orthographic facilitation that was independent of and larger than phonological facilitation across all SOAs. Semantic interference was also found at SOAs of −100 and 0 msec. Critically, an interaction of semantics and orthography was observed at an SOA of +100 msec. This interaction suggests that independent effects of orthographic facilitation on picture naming are located either at the level of semantic processing or at the lemma level and are not due to the activation of picture name segments at the level of phonological retrieval.

    Phonic Faces as a Method for Improving Decoding for Children with Persistent Decoding Deficits

    Background: Decoding is a foundational skill for reading, contributing to both reading fluency and comprehension (Lyon et al., 2003). Visual enhancements of alphabetic letters, such as shaping letters to resemble words beginning with that sound (e.g., "f" drawn as a flower) (Handler & Fierson, 2011) or associating photographs of lips producing the sounds (Lindamood & Lindamood, 1998), have been shown to improve decoding skills. This study investigated whether a more direct pictured association, faces with alphabet letters placed in the mouth to cue speech sounds, termed Phonic Faces (Norris, 2001), would enable students with persistent decoding impairment to acquire orthographic patterns in pseudowords, real words, and reading passages. Methods: A multiple-baseline single-subject design assessed the effects of Phonic Faces on learning to decode two orthographic patterns. Three participants were taught the short-vowel CVC pattern for five weeks using words and pseudowords displayed with Phonic Faces, while two long-vowel patterns (CVCe and CVVC) remained in an untrained baseline condition. In week six, a five-week intervention was introduced for the long-vowel pattern showing the lowest scores on daily pseudoword probes. Results: The results of the study were suggestive but not conclusive. The graphs of daily probe scores for all three participants showed significant gains for all three patterns using the two-standard-deviation method of analysis. However, in all three cases, one or more of the control variables changed prior to the introduction of treatment. Additionally, pre-to-posttest gains in measures of decoding and contextualized reading exceeded the standard error of measurement, indicating true gains. Discussion: Analysis of patterns of change showed generalization of learning across patterns. Once the long-vowel Phonic Faces were introduced, improvements were shown for both long-vowel patterns. Likewise, the long and short vowels were embedded in similar patterns of two- to three-letter consonant blends and digraphs, all of which scored at low levels at pretest. However, once the consonant patterns were learned in the CVC words, they generalized quickly to long-vowel words, especially for participants who scored higher on vowel knowledge at pretest. Replication with decoders exhibiting greater impairment is recommended.
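
    The two-standard-deviation criterion cited in the results has a simple procedural form: a treatment phase is flagged as a significant change when two consecutive probe scores fall outside a band of mean ± 2 SD computed from the baseline phase. Here is a minimal sketch of that rule; the probe scores are hypothetical, not data from the study.

    # Minimal sketch of the two-standard-deviation band method for
    # single-subject data: flag a change when two consecutive
    # treatment-phase probes fall outside mean ± 2 SD of baseline.
    # The probe scores below are hypothetical.
    import numpy as np

    def two_sd_change(baseline, treatment):
        base = np.asarray(baseline, dtype=float)
        upper = base.mean() + 2 * base.std(ddof=1)
        lower = base.mean() - 2 * base.std(ddof=1)
        outside = [(x > upper) or (x < lower) for x in treatment]
        # Two consecutive points beyond the band meet the criterion.
        return any(a and b for a, b in zip(outside, outside[1:]))

    print(two_sd_change([20, 25, 22, 18, 24], [30, 45, 60, 70]))  # True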