1,766 research outputs found

    A new technology on translating Indonesian spoken language into Indonesian sign language system

    People with hearing disabilities are unable to hear, which prevents them from communicating through spoken language. The solution offered in this research is a one-way translation technology that interprets spoken language into the Indonesian sign language system (SIBI). The mechanism is to capture the sentences (audio) spoken by the hearing community and convert them to text using speech recognition. The text is then run through text processing to select the input words. The next stage stems the words into prefixes, base words, and suffixes. Each word is then indexed and matched to SIBI. Afterwards, the system arranges the words into SIBI sentences following the original sentence order, so that people with hearing disabilities can access the information contained in the spoken language. The system's performance was evaluated using a confusion matrix, yielding a precision of 76%, an accuracy of 78%, and a recall of 79%. The technology was also tested at SMP-LB Karya Mulya with nine 7th-grade students; 86% of the students stated that the technology runs very well.
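
    The pipeline described above (speech recognition, text processing, affix stemming, dictionary matching) can be made concrete with a short sketch. The Python code below illustrates only the stemming-and-matching stage; the affix lists, the gloss dictionary, and the function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the stemming-and-matching stage described above.
# The affix lists and the SIBI gloss dictionary are toy assumptions,
# not the vocabulary or stemmer used by the authors.

PREFIXES = ["meng", "men", "mem", "me", "ber", "di"]  # common Indonesian prefixes
SUFFIXES = ["kan", "an", "i"]                         # common Indonesian suffixes

SIBI_GLOSSES = {                                      # hypothetical gloss index
    "ajar": "SIBI_AJAR", "baca": "SIBI_BACA", "saya": "SIBI_SAYA",
    "buku": "SIBI_BUKU", "mem": "SIBI_PREFIX_MEM", "kan": "SIBI_SUFFIX_KAN",
}

def stem(word):
    """Split a word into (prefix, root, suffix); empty strings if absent."""
    prefix = next((p for p in PREFIXES
                   if word.startswith(p) and len(word) > len(p) + 2), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES
                   if rest.endswith(s) and len(rest) > len(s) + 2), "")
    root = rest[: len(rest) - len(suffix)] if suffix else rest
    return prefix, root, suffix

def to_sibi(sentence):
    """Map each recognized word to SIBI glosses, preserving word order."""
    glosses = []
    for word in sentence.lower().split():
        for part in stem(word):
            if part and part in SIBI_GLOSSES:
                glosses.append(SIBI_GLOSSES[part])
    return glosses

print(to_sibi("saya membacakan buku"))
# ['SIBI_SAYA', 'SIBI_PREFIX_MEM', 'SIBI_BACA', 'SIBI_SUFFIX_KAN', 'SIBI_BUKU']
```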

    Exploring the Pulse of Design and Music: The Impact of Visual Rhythm in Design for Auditorily Impaired Music Students

    Auditorily impaired music students struggle to understand musical rhythms and patterns, which can lead to frustration, low self-esteem, underdeveloped musical skills, and reduced music appreciation. The needs of those with hearing challenges are often overlooked in the curriculum planning process. This research aims to show the importance of providing appropriate learning methods and materials for music students who are hearing impaired. Specifically, this study focuses on the elements of patterns, repetition of patterns, and rhythm within the visual designs used in the instruction of musical elements. The methods used in this research are based on observation, investigation, and analysis of prior research and case studies that used visual artifacts as learning tools. The results reveal the importance of adequate visual materials as supplemental learning materials for music students of all ability levels, specifically designed for those with special needs, hearing disorders, and auditory challenges.

    A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation

    Body language (BL) refers to non-verbal communication expressed through physical movements, gestures, facial expressions, and postures. It conveys information, emotions, attitudes, and intentions without the use of spoken or written words, plays a crucial role in interpersonal interactions, and can complement or even override verbal communication. Deep multi-modal learning techniques have shown promise in understanding and analyzing these diverse aspects of BL. The survey emphasizes their applications to BL generation and recognition. Several common BLs are considered, i.e., Sign Language (SL), Cued Speech (CS), Co-speech (CoS), and Talking Head (TH), and we analyze and establish the connections among these four BLs for the first time. Their generation and recognition often involve multi-modal approaches. Benchmark datasets for BL research are collected and organized, along with an evaluation of SOTA methods on these datasets. The survey highlights challenges such as limited labeled data, multi-modal learning, and the need for domain adaptation to generalize models to unseen speakers or languages. Future research directions are presented, including exploring self-supervised learning techniques, integrating contextual information from other modalities, and exploiting large-scale pre-trained multi-modal models. In summary, this survey provides, for the first time, a comprehensive understanding of deep multi-modal learning for BL generation and recognition. By analyzing advancements, challenges, and future directions, it serves as a valuable resource for researchers and practitioners in advancing this field. In addition, we maintain a continuously updated paper list for deep multi-modal learning for BL recognition and generation: https://github.com/wentaoL86/awesome-body-language

    AR Comic Chat

    Live speech transcription and captioning are important for the accessibility of deaf and hard-of-hearing individuals, especially in situations without visible ASL translators. When live captioning is available at all, it is typically rendered in the style of closed captions on a display such as a phone screen or TV, away from the real conversation. This can divide the viewer's focus and detract from the experience. This paper proposes an investigation into an alternative, Augmented Reality-driven approach to displaying these captions, using deep neural networks to compute, track, and associate deep visual and speech descriptors so that captions can be maintained as speech bubbles above the speaker.
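
    As a hedged illustration of the association step described above (deciding which tracked speaker a caption bubble should attach to), the Python sketch below matches a speech descriptor to the most similar visual descriptor among tracked faces. The random placeholder descriptors, the cosine-similarity matching, and the threshold are assumptions standing in for the paper's deep-descriptor pipeline.

```python
# Toy illustration of associating a speech segment with one of several
# tracked faces by comparing embedding similarity. The descriptors here
# are random placeholders standing in for deep visual/speech encoders.
import numpy as np

rng = np.random.default_rng(0)
face_tracks = {tid: rng.normal(size=128) for tid in ("track_a", "track_b")}
# Simulate a speech descriptor that lies close to track_b's visual descriptor.
speech_descriptor = face_tracks["track_b"] + 0.1 * rng.normal(size=128)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def assign_caption(speech_vec, tracks, threshold=0.5):
    """Return the track id whose descriptor best matches the speech
    descriptor, or None if no track is similar enough (assumed threshold)."""
    best_id, best_sim = None, threshold
    for track_id, vec in tracks.items():
        sim = cosine(speech_vec, vec)
        if sim > best_sim:
            best_id, best_sim = track_id, sim
    return best_id

print(assign_caption(speech_descriptor, face_tracks))  # -> 'track_b'
```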

    LOUDER THAN WORDS: VOICING, SOUNDING, AND LISTENING TO DEAFNESS IN A QUIET PLACE


    Word Importance Modeling to Enhance Captions Generated by Automatic Speech Recognition for Deaf and Hard of Hearing Users

    People who are deaf or hard-of-hearing (DHH) benefit from sign-language interpreting or live captioning (with a human transcriptionist) to access spoken information. However, such services are not legally required, affordable, or available in many settings, e.g., impromptu small-group meetings in the workplace or online video content that has not been professionally captioned. As Automatic Speech Recognition (ASR) systems improve in accuracy and speed, it is natural to investigate the use of these systems to assist DHH users in a variety of tasks. But ASR systems are still not perfect, especially in realistic conversational settings, raising issues of trust and acceptance of these systems in the DHH community. To overcome these challenges, our work focuses on: (1) building metrics for accurately evaluating the quality of automatic captioning systems, and (2) designing interventions for improving the usability of captions for DHH users. The first part of this dissertation describes our research on methods for identifying words that are important for understanding the meaning of a conversational turn within transcripts of spoken dialogue. Such knowledge about the relative importance of words in spoken messages can be used in evaluating ASR systems (in part 2 of this dissertation) or creating new applications for DHH users of captioned video (in part 3 of this dissertation). We found that models which consider both the acoustic properties of spoken words and text-based features (e.g., pre-trained word embeddings) are more effective at predicting the semantic importance of a word than models that use only one of these feature types. The second part of this dissertation describes studies to understand DHH users' perception of the quality of ASR-generated captions; the goal of this work was to validate the design of automatic metrics for evaluating captions in real-time applications for these users. Such a metric could facilitate comparison of various ASR systems and help determine the suitability of specific ASR systems for supporting communication for DHH users. We designed experimental studies to elicit feedback on the quality of captions from DHH users, and we developed and evaluated automatic metrics for predicting the usability of automatically generated captions for these users. We found that metrics that consider the importance of each word in a text are more effective at predicting the usability of imperfect text captions than the traditional Word Error Rate (WER) metric. The final part of this dissertation describes research on importance-based highlighting of words in captions as a way to enhance the usability of captions for DHH users. Similar to highlighting in static texts (e.g., textbooks or electronic documents), highlighting in captions involves changing the appearance of some text so that readers can quickly attend to the most important information. Despite the known benefits of highlighting in static texts, the usefulness of highlighting in captions for DHH users is largely unexplored. For this reason, we conducted experimental studies with DHH participants to understand the benefits of importance-based highlighting in captions and their preferences among different design configurations for such highlighting. We found that DHH users subjectively preferred highlighting in captions, and they reported higher readability and understandability scores and lower task-load scores when viewing videos with captions containing highlighting compared to videos without highlighting. Further, in partial contrast to recommendations in prior research on highlighting in static texts (which had not been based on experimental studies with DHH users), we found that DHH participants preferred boldface, word-level, non-repeating highlighting in captions.
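
    The dissertation's finding that importance-aware metrics outperform plain Word Error Rate can be made concrete with a small sketch. The Python code below computes an edit-distance-based WER in which each error is scaled by the importance of the affected reference word; this particular weighting scheme is a simplifying assumption for illustration, not the dissertation's exact metric. With uniform weights it reduces to standard WER.

```python
# Minimal sketch contrasting plain Word Error Rate with an importance-
# weighted variant. The weighting scheme (errors scaled by the importance
# of the affected reference word) is a simplifying assumption.

def weighted_wer(reference, hypothesis, importance=None):
    """Edit-distance-based WER. If `importance` gives a weight per
    reference word, substitutions/deletions cost that weight and
    insertions cost the mean weight; otherwise all costs are 1."""
    ref, hyp = reference.split(), hypothesis.split()
    w = importance or [1.0] * len(ref)
    ins_cost = sum(w) / len(w)
    # d[i][j] = cheapest way to turn ref[:i] into hyp[:j]
    d = [[0.0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        d[i][0] = d[i - 1][0] + w[i - 1]           # delete ref word i-1
    for j in range(1, len(hyp) + 1):
        d[0][j] = d[0][j - 1] + ins_cost           # insert hyp word j-1
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0.0 if ref[i - 1] == hyp[j - 1] else w[i - 1]
            d[i][j] = min(d[i - 1][j - 1] + sub,   # match / substitute
                          d[i - 1][j] + w[i - 1],  # delete
                          d[i][j - 1] + ins_cost)  # insert
    return d[-1][-1] / sum(w)

ref = "the meeting moved to tuesday"
hyp = "the meeting moved to choose day"
print(weighted_wer(ref, hyp))                             # 0.4 (plain WER)
print(weighted_wer(ref, hyp, [0.1, 0.9, 0.8, 0.1, 1.0]))  # ~0.54: the error
# falls on a high-importance keyword, so the weighted metric penalizes
# this caption more than uniform WER does.
```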

    Space and time in the human brain


    Audiovisual speech perception in cochlear implant patients

    Hearing with a cochlear implant (CI) is very different from a normal-hearing (NH) experience, as the CI can only provide limited auditory input. Nevertheless, the central auditory system is capable of learning to interpret such limited input, extracting meaningful information within a few months after implant switch-on. The capacity of the auditory cortex to adapt to new auditory stimuli is an example of intra-modal plasticity: changes within a sensory cortical region as a result of altered statistics of the respective sensory input. However, hearing deprivation before implantation and restoration of hearing capacities after implantation can also induce cross-modal plasticity: changes within a sensory cortical region as a result of altered statistics of a different sensory input. Thereby, a preserved cortical region can, for example, support a deprived one, as in CI users, who have been shown to exhibit cross-modal visual-cortex activation for purely auditory stimuli. Before implantation, during the period of hearing deprivation, CI users typically rely on additional visual cues such as lip movements for understanding speech. It has therefore been suggested that CI users show a pronounced binding of the auditory and visual systems, which may allow them to integrate auditory and visual speech information more efficiently. The projects included in this thesis investigate auditory, and particularly audiovisual, speech processing in CI users. Four event-related potential (ERP) studies approach the matter from different perspectives, each with a distinct focus. The first project investigates how audiovisually presented syllables are processed by CI users with bilateral hearing loss compared to NH controls. Previous ERP studies employing non-linguistic stimuli, and studies using other neuroimaging techniques, found distinct audiovisual interactions in CI users. However, the precise time course of cross-modal visual-cortex recruitment and enhanced audiovisual interaction for speech-related stimuli is unknown. Our ERP study fills this gap, presenting differences in the time course of audiovisual interactions as well as in cortical source configurations between CI users and NH controls. The second study focuses on auditory processing in single-sided deaf (SSD) CI users. SSD CI patients experience a maximally asymmetric hearing condition, having a CI on one ear and a contralateral NH ear. Despite the intact ear, several behavioural studies have demonstrated a variety of beneficial effects of restoring binaural hearing, but only a few ERP studies have investigated auditory processing in SSD CI users. Our study investigates whether the side of implantation affects auditory processing and whether auditory processing via the NH ear of SSD CI users works similarly to that of NH controls. Given the distinct hearing conditions of SSD CI users, the question arises whether there are quantifiable differences between CI users with unilateral and bilateral hearing loss. In general, ERP studies on SSD CI users are rather scarce, and there is no study on audiovisual processing in particular; nor are there reports on the lip-reading abilities of SSD CI users. To this end, the third project extends the first study by including SSD CI users as a third experimental group. The study discusses both differences and similarities between CI users with bilateral hearing loss, CI users with unilateral hearing loss, and NH controls, and provides, for the first time, insights into audiovisual interactions in SSD CI users. The fourth project investigates the influence of background noise on audiovisual interactions in CI users and whether a noise-reduction algorithm can modulate these interactions. It is known that in environments with competing background noise, listeners generally rely more strongly on visual cues for understanding speech, and that such situations are particularly difficult for CI users. As shown in previous auditory behavioural studies, the recently introduced noise-reduction algorithm "ForwardFocus" can be a useful aid in such cases. However, whether the algorithm is also beneficial in audiovisual conditions, and whether it has a measurable effect on cortical processing, had not yet been investigated. In this ERP study, we address these questions with an auditory and audiovisual syllable discrimination task. Taken together, the projects included in this thesis contribute to a better understanding of auditory and especially audiovisual speech processing in CI users, revealing distinct processing strategies employed to overcome the limited input provided by a CI. The results have clinical implications: they suggest that clinical hearing assessments, which are currently purely auditory, should be extended to audiovisual assessments, and that rehabilitation including audiovisual training methods may benefit all CI user groups in quickly achieving the most effective implantation outcome.

    Aging and spatial abilities: age-related impact on users of a sign language

    Introduction. Across the adult lifespan, cognitive abilities change: some tend to decline with age whereas others are maintained. Previous studies have shown that performance on tasks of spatial perception, spatial visualization, mental rotation, and perspective taking is poorer in older adults than in younger adults. Sociodemographic and behavioral factors may also influence the cognitive aging trajectories of older adults. For example, language experience, such as bilingualism, may be a neuroprotective factor contributing to cognitive reserve. The impact of language experience in another modality, as is the case for a visual-spatial language, on spatial cognition has generated much interest among sign language researchers. To date, however, no research has addressed the potential effect of long-term sign language use on the spatial cognition of older signers. Aim. The aim of this thesis is to investigate whether there are differences in spatial abilities among signers (deaf and hearing) and non-signers of different age groups. More specifically, this thesis examined i) whether performance on spatial-ability tasks differs according to age (younger/older adults) and linguistic experience (deaf signers/hearing signers/hearing non-signers) and ii) whether performance differs according to the spatial-ability subcomponent targeted (spatial perception; spatial visualization; mental rotation; perspective taking). Methods. To examine the effect of age and linguistic experience on spatial abilities, data were collected from 120 participants: 60 older adults aged 65 to 80 (20 deaf signers, 20 hearing signers, 20 hearing non-signers) and 60 younger adults aged 18 to 35 (20 deaf signers, 20 hearing signers, 20 hearing non-signers). Prior to the experiment, participants were screened for visual and hearing acuity, language proficiency (Quebec Sign Language and French), cognitive health, and intelligence, and the linguistic-experience groups were matched on educational level and intelligence. The four subcomponents of spatial abilities were tested using a battery of seven psychometric tests. Results. Consistent with previously published data on the effect of age on spatial abilities, accuracy results revealed that younger deaf signers consistently performed better than older deaf signers on all tasks. Results also highlighted a specific accuracy advantage of hearing signers over hearing non-signers on mental rotation and perspective-taking tasks, regardless of age. A general advantage of older signers (deaf and hearing) over older non-signers was observed on spatial visualization tasks only. These results suggest that age-related cognitive changes affect the processing of spatial information regardless of the linguistic modality used, and that the effect of sign language use on spatial processes may differ between deaf signers and hearing signers. Discussion. This cross-sectional research made it possible to investigate, for the first time, the impact of aging on the spatial abilities of sign language users, and to explore the potentially mitigating effect of sign language use on age-related differences in spatial-ability task performance. Based on the results, it is proposed that the effect of sign language use on spatial cognition is specific to spatial-ability subdomains (spatial perception; spatial visualization; mental rotation; perspective taking), and that language experience such as bimodal bilingualism is a factor of interest in the relation between sign language use and spatial processing. Conclusion. The results reported in this thesis will be helpful to future researchers interested in investigating cognition in older signers. Future research should pursue this direction to clarify the impact of bimodal bilingualism on spatial cognition in light of what is known about the protective effects of unimodal bilingualism against aging. In addition, future research should consider broadening the scope of this area by examining in detail the interaction between cognitive skills and linguistic modality; such studies could address the cause of the distinction observed between deaf signers and hearing signers in spatial processing and investigate links between spatial processing and sign language production and comprehension.
    • 

    corecore