
    Engaging the articulators enhances perception of concordant visible speech movements

    PURPOSE This study aimed to test whether (and how) somatosensory feedback signals from the vocal tract affect concurrent unimodal visual speech perception. METHOD Participants discriminated pairs of silent visual utterances of vowels under 3 experimental conditions: (a) normal (baseline) and while holding either (b) a bite block or (c) a lip tube in their mouths. To test the specificity of somatosensory-visual interactions during perception, we assessed discrimination of vowel contrasts optically distinguished based on their mandibular (English /ɛ/-/æ/) or labial (English /u/-French /u/) postures. In addition, we assessed perception of each contrast using dynamically articulating videos and static (single-frame) images of each gesture (at vowel midpoint). RESULTS Engaging the jaw selectively facilitated perception of the dynamic gestures optically distinct in terms of jaw height, whereas engaging the lips selectively facilitated perception of the dynamic gestures optically distinct in terms of their degree of lip compression and protrusion. Thus, participants perceived visible speech movements in relation to the configuration and shape of their own vocal tract (and possibly their ability to produce covert vowel production-like movements). In contrast, engaging the articulators had no effect when the speaking faces did not move, suggesting that the somatosensory inputs affected perception of time-varying kinematic information rather than changes in target (movement end point) mouth shapes. CONCLUSIONS These findings suggest that orofacial somatosensory inputs associated with speech production prime premotor and somatosensory brain regions involved in the sensorimotor control of speech, thereby facilitating perception of concordant visible speech movements. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.9911846. Funding: NIH/NIDCD R01 DC002852.
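
    The abstract does not report the analysis used, but performance in a paired discrimination task like this one is often summarized with a sensitivity index (d'). The sketch below is a hypothetical illustration, assuming a same-different design; all counts are invented and the function name is ours, not the authors'.

```python
# Hypothetical illustration: sensitivity (d') for a same-different visual
# vowel discrimination task. All counts below are invented examples.
from scipy.stats import norm

def dprime(hits, misses, false_alarms, correct_rejections):
    """d' from hit and false-alarm rates, with a +0.5 correction so that
    proportions of 0 or 1 do not produce infinite z-scores."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# e.g., /ɛ/-/æ/ discrimination in the baseline vs. bite-block conditions
print(dprime(hits=38, misses=10, false_alarms=14, correct_rejections=34))
print(dprime(hits=43, misses=5, false_alarms=9, correct_rejections=39))
```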

    Development of audiovisual comprehension skills in prelingually deaf children with cochlear implants

    Objective: The present study investigated the development of audiovisual comprehension skills in prelingually deaf children who received cochlear implants. Design: We analyzed results obtained with the Common Phrases (Robbins et al., 1995) test of sentence comprehension from 80 prelingually deaf children with cochlear implants who were enrolled in a longitudinal study, from pre-implantation to 5 years after implantation. Results: The results revealed that prelingually deaf children with cochlear implants performed better under audiovisual (AV) presentation than under auditory-alone (A-alone) or visual-alone (V-alone) conditions. AV sentence comprehension skills were found to be strongly correlated with several clinical outcome measures of speech perception, speech intelligibility, and language. Finally, pre-implantation V-alone performance on the Common Phrases test was strongly correlated with 3-year postimplantation performance on clinical outcome measures of speech perception, speech intelligibility, and language skills. Conclusions: The results suggest that lipreading skills and AV speech perception reflect a common source of variance associated with the development of phonological processing skills that is shared among a wide range of speech and language outcome measures.
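
    The statistics themselves are in the paper rather than this summary, but the key relationship reported (pre-implant visual-alone scores predicting later outcomes) is a simple bivariate correlation. A minimal sketch with invented scores:

```python
# Hypothetical sketch: correlating pre-implant visual-alone Common Phrases
# scores with a 3-year post-implant outcome measure. Numbers are invented.
import numpy as np
from scipy.stats import pearsonr

v_alone_pre = np.array([10, 25, 5, 40, 30, 15, 35, 20])   # % correct, pre-implant
outcome_3yr = np.array([45, 60, 35, 80, 70, 50, 75, 55])  # % correct, 3 years post

r, p = pearsonr(v_alone_pre, outcome_3yr)
print(f"r = {r:.2f}, p = {p:.3f}")
```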

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.
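
    The review is a survey rather than an implementation, but one family of modifications it covers, redistributing energy toward higher frequencies while holding overall level constant, can be sketched in a few lines. This is an illustrative example under our own assumptions; the filter coefficient, sample rate, and stand-in signal are not taken from the article.

```python
# Illustrative sketch: a "spectral tilt" style modification -- boost high
# frequencies with a first-order pre-emphasis filter, then rescale so the
# overall RMS energy matches the unmodified signal.
import numpy as np
from scipy.signal import lfilter

fs = 16000
x = np.random.randn(fs)  # stand-in for one second of speech samples (mono)

y = lfilter([1.0, -0.95], [1.0], x)          # y[n] = x[n] - 0.95 * x[n-1]
y *= np.sqrt(np.mean(x**2) / np.mean(y**2))  # equal-energy constraint
```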

    L2 Speech Learning: perception, production & training

    Adult L2 learners have difficulties in perceiving and producing L2 speech sounds. In analyzing learners’ L2 speech learning problems, this study provides research data from a series of studies on L2 speech perception, production, and training. Section 1 investigates how the L1 sound system influences L2 speech perception. A recent study shows that phonetic differences and distances between English and Mandarin consonants predicted the perceptual problems of Mandarin consonants by native English learners of Chinese. Section 2 explores the relationship between L2 speech perception and production and reports a subsequent study on Mandarin consonants showing that English learners of Chinese performed better in perception than in production on Mandarin retroflex sounds, but the reverse for palatal sounds. The lack of alignment between perception and production suggests that the relationship between L2 speech perception and production is not straightforward. In Section 3, two training experiments are reported and compared to explore the effects of phonetic training on the learning of English vowel and Mandarin tone contrasts.

    The effect of listening tasks and motor responding on activation in the auditory cortex

    Previous human functional magnetic resonance imaging (fMRI) research has shown that activation in the auditory cortex (AC) is strongly modulated by motor influences. Other fMRI studies have indicated that the AC is also modulated by attention-engaging listening tasks. How these motor- and task-related activation modulations relate to each other has, however, not been previously studied. The current understanding of the functional organization of the human AC is strongly based on primate models. However, some authors have recently questioned the correspondence between the monkey and human cognitive systems, and whether the monkey AC can be used as a model for the human AC. Further, it is unknown whether active listening modulates activations similarly in the human and nonhuman primate AC. Thus, non-human primate fMRI studies are important. Yet, such fMRI studies have been previously impeded by the difficulty in teaching tasks to non-human primates. The present thesis consists of three studies in which fMRI was used both to investigate the relationship between the effects related to active listening and motor responding in the human AC and to investigate task-related activation modulations in the monkey AC. Study I investigated the effect of manual responding on activation in the human AC during auditory and visual tasks, whereas Study II focused on the question whether auditory-motor effects interact with those related to active listening tasks in the AC and adjacent regions. In Study III, a novel paradigm was developed and used during fMRI to investigate auditory task-dependent modulations in the monkey AC. The results of Study I showed that activation in the AC in humans is strongly suppressed when subjects respond to targets using precision or power grips during both visual and auditory tasks. AC activation was also modulated by grip type during the auditory task but not during the visual task (with identical stimuli and motor responses). These manual-motor effects were distinct from general attention-related modulations revealed by comparing activation during auditory and visual tasks. Study II showed that activation in widespread regions in the AC and inferior parietal lobule (IPL) depends on whether subjects respond to target vowel pairs using vocal or manual responses. Furthermore, activation in the posterior AC and the IPL depends on whether subjects respond by overtly repeating the last vowel of a target pair or by producing a given response vowel. Discrimination tasks activated superior temporal gyrus (STG) regions more strongly than 2-back tasks, while the IPL was activated more strongly by 2-back tasks. These task-related (discrimination vs. 2-back) modulations were distinct from the response type effects in the AC. However, task and motor-response-type effects interacted in the IPL. Together the results of Studies I and II support the view that operations in the AC are shaped by its connections with motor cortical regions and that regions in the posterior AC are important in auditory-motor integration. Furthermore, these studies also suggest that the task, motor-response-type and vocal-response-type effects are caused by independent mechanisms in the AC. In Study III, a novel reward-cue paradigm was developed to teach macaque monkeys to perform an auditory task. Using this paradigm monkeys learned to perform an auditory task in a few weeks, whereas in previous studies auditory task training has required months or years of training. 
This new paradigm was then used during fMRI to measure activation in the monkey AC during active auditory task performance. The results showed that activation in the monkey AC is modulated during this task in a similar way as previously seen in human auditory attention studies. The findings of Study III provide an important step in bridging the gap between human and animal studies of the AC.
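
    The summary above does not spell out the statistics, but contrasts of the kind described (e.g., discrimination vs. 2-back tasks, or different grip types) are often evaluated as paired comparisons of ROI-averaged activation across subjects. A minimal sketch, with invented values standing in for per-subject GLM estimates:

```python
# Hypothetical sketch: paired contrast of ROI-averaged auditory cortex
# activation between two task conditions. Values are invented; a real
# analysis would use per-subject GLM beta estimates.
import numpy as np
from scipy.stats import ttest_rel

betas_discrimination = np.array([0.82, 0.75, 0.91, 0.68, 0.77, 0.85])
betas_2back = np.array([0.64, 0.70, 0.73, 0.60, 0.69, 0.71])

t, p = ttest_rel(betas_discrimination, betas_2back)
print(f"paired t = {t:.2f}, p = {p:.3f}")
```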

    The Salience and Perceptual Weight of Secondary Acoustic Cues for Fricative Identification in Normal Hearing Adults

    The primary cue used by normal hearing individuals for identification of the fricatives /s/ and /ʃ/ is the most prominent spectrum of frication, which is discrete for this fricative contrast. Secondary cues that influence the identification and discrimination of these fricatives are context dependent. Specifically, the secondary cues that have been found to most significantly impact fricative perception include (a) the second formant transition onset and offset frequencies of a fricative-vowel pair, and (b) the amplitude of the spectral peak in the 2500 Hz region of frication relative to an adjacent vowel’s peak amplitude in the same frequency region. However, the perceptual weight placed on each of these secondary cues remains unclear. Some research suggests that normal hearing individuals place equal weight on these secondary cues, while other work posits that individuals differ in their cue preferences. In addition, the salience of these secondary cues, which depends on the encoding of audibility, has yet to be assessed objectively in previous studies. The current study assessed the perceptual weight of these two secondary acoustic cues for the place of articulation fricative contrast /s/ vs. /ʃ/ while also objectively indexing the salience of each cue in normal hearing adults, using a behavioral trading relations paradigm and an electrophysiological measure of acoustic change, respectively. Normal hearing adults were found to rely more heavily on the relative amplitude cue than on the formant frequency transition cue. Electrophysiological responses to the secondary cues suggested that, for the most part, salience drives the dominance of the amplitude cue.
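
    The abstract does not detail how the trading-relations data were modelled, but relative perceptual weights for two competing cues are commonly estimated by regressing listeners' /s/ vs. /ʃ/ responses on the (standardized) cue values. The sketch below uses simulated responses; the simulated weighting and the use of logistic regression are our assumptions, not the study's method.

```python
# Illustrative sketch: relative perceptual weights of two acoustic cues
# (F2-transition frequency vs. relative amplitude) estimated by logistic
# regression over simulated /s/-/ʃ/ identification responses.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
transition = rng.uniform(-1, 1, n)   # standardized F2-transition cue
rel_amp = rng.uniform(-1, 1, n)      # standardized relative-amplitude cue

# Simulated listener who weights relative amplitude about 3x more heavily
p_s = 1 / (1 + np.exp(-(0.8 * transition + 2.4 * rel_amp)))
responses = rng.binomial(1, p_s)     # 1 = "/s/" response

X = StandardScaler().fit_transform(np.column_stack([transition, rel_amp]))
weights = np.abs(LogisticRegression().fit(X, responses).coef_[0])
print("relative cue weights:", weights / weights.sum())  # roughly [0.25, 0.75]
```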

    Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation

    We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: the IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics together in a global discriminability score reveals that the greater separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. We discuss the implications of these findings for the view that the functional role of IDS is to improve language learnability.
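
    The paper's exact acoustic and phonological metrics are not reproduced here, but the underlying notion of acoustic discriminability, how well word tokens can be told apart given their variable realizations, can be illustrated with a leave-one-out nearest-neighbour score over token feature vectors. The features below are random stand-ins (e.g., for MFCC summaries), so this is a sketch of the idea rather than the study's pipeline.

```python
# Illustrative sketch: acoustic discriminability of word forms measured as
# leave-one-out nearest-neighbour classification accuracy over tokens.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)
n_words, tokens_per_word, dim = 20, 10, 13
labels = np.repeat(np.arange(n_words), tokens_per_word)
centers = rng.normal(0, 1, (n_words, dim))                          # one "target" per word
tokens = centers[labels] + rng.normal(0, 0.8, (labels.size, dim))   # noisy realizations

score = cross_val_score(KNeighborsClassifier(n_neighbors=1),
                        tokens, labels, cv=LeaveOneOut()).mean()
print(f"token discriminability: {score:.2f}")  # higher = easier to tell words apart
```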

    A Visionary Approach to Listening: Determining The Role Of Vision In Auditory Scene Analysis

    To recognize and understand the auditory environment, the listener must first separate sounds that arise from different sources and capture each event. This process is known as auditory scene analysis. The aim of this thesis is to investigate whether and how visual information can influence auditory scene analysis. The thesis consists of four chapters. First, I reviewed the literature to provide a framework for how visual information can shape the analysis of complex acoustic environments. In Chapter II, I examined psychophysically whether temporal coherence between auditory and visual stimuli was sufficient to promote auditory stream segregation in a mixture. I found that listeners were better able to report brief deviants in an amplitude-modulated target stream when a visual stimulus changed in size in a temporally coherent manner than when the visual stream was coherent with the non-target auditory stream. This work demonstrates that temporal coherence between auditory and visual features can influence the way people analyse an auditory scene. In Chapter III, the integration of auditory and visual features in auditory cortex was examined by recording neuronal responses in awake and anaesthetised ferret auditory cortex in response to the modified stimuli used in Chapter II. I demonstrated that temporal coherence between auditory and visual stimuli enhances the neural representation of a sound and influences which sound a neuron represents in a sound mixture. Visual stimuli elicited reliable changes in the phase of the local field potential, which provides mechanistic insight into this finding. Together these findings provide evidence that early cross-modal integration underlies the behavioural effects in Chapter II. Finally, in Chapter IV, I investigated whether training can influence the ability of listeners to utilize visual cues for auditory stream analysis, and showed that this ability improved after listeners were trained to detect auditory-visual temporal coherence.
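
    The stimulus parameters live in the thesis rather than this summary, but the core manipulation, a visual stimulus whose size changes coherently with the amplitude envelope of one of two competing auditory streams, can be sketched directly. All rates, frequencies, and durations below are invented placeholders.

```python
# Illustrative sketch: two amplitude-modulated tones plus a visual "size"
# signal that is temporally coherent with either the target or the
# non-target stream. All parameter values are invented.
import numpy as np

fs, dur = 44100, 2.0
t = np.arange(int(fs * dur)) / fs
env_target = 0.5 * (1 + np.sin(2 * np.pi * 7 * t))        # 7 Hz modulator
env_other = 0.5 * (1 + np.sin(2 * np.pi * 11 * t + 1.0))  # independent 11 Hz modulator

target_stream = env_target * np.sin(2 * np.pi * 440 * t)  # AM tone, 440 Hz carrier
masker_stream = env_other * np.sin(2 * np.pi * 880 * t)   # AM tone, 880 Hz carrier
mixture = target_stream + masker_stream

visual_size_coherent = 1.0 + 0.5 * env_target   # disc radius tracks target envelope
visual_size_incoherent = 1.0 + 0.5 * env_other  # tracks the non-target stream instead
```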