
    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Adjusting for speaking rate when perceiving speech in background noise.

    Acoustic context effects such as temporal contrast effects (TCEs) shape everyday speech perception. For instance, after a faster context sentence, listeners tend to perceive the following target sound as slower and more like the /t/ in “tier”; after a slower context sentence, they tend to perceive it as faster and more like the /d/ in “deer”. Recent work by Bosker et al. (2020) concluded that selective attention (directing attention to a specific stimulus while ignoring surrounding stimuli) had no effect on TCEs, suggesting they are automatic and low-level. However, their paradigm was not an ideal test: the voices heard were different talkers, one presented to each ear, making them easy to perceptually separate. Here, the paradigm was designed to eliminate talker variability (acoustic variability among talkers) by using the same male talker speaking one sentence to both ears, two sentences simultaneously to both ears (diotically), or one sentence to each ear (dichotically). Two experiments tested these effects of presentation mode on TCEs. In each experiment, TCE magnitudes were similar across presentation modes. These results are consistent with Bosker et al.’s (2020) claim that TCEs are automatic and low-level. Potential neural mechanisms contributing to TCEs are discussed.

    Windows into Sensory Integration and Rates in Language Processing: Insights from Signed and Spoken Languages

    This dissertation explores the hypothesis that language processing proceeds in "windows" that correspond to representational units, where sensory signals are integrated according to time-scales that correspond to the rate of the input. To investigate universal mechanisms, a comparison of signed and spoken languages is necessary. Underlying the seemingly effortless process of language comprehension is the perceiver's knowledge about the rate at which linguistic form and meaning unfold in time and the ability to adapt to variations in the input. The vast body of work in this area has focused on speech perception, where the goal is to determine how linguistic information is recovered from acoustic signals. Testing some of these theories in the visual processing of American Sign Language (ASL) provides a unique opportunity to better understand how sign languages are processed and which aspects of speech perception models are in fact about language perception across modalities. The first part of the dissertation presents three psychophysical experiments investigating temporal integration windows in sign language perception by testing the intelligibility of locally time-reversed sentences. The findings demonstrate the contribution of modality for the time-scales of these windows, where signing is successively integrated over longer durations (~ 250-300 ms) than in speech (~ 50-60 ms), while also pointing to modality-independent mechanisms, where integration occurs in durations that correspond to the size of linguistic units. The second part of the dissertation focuses on production rates in sentences taken from natural conversations of English, Korean, and ASL. Data from word, sign, morpheme, and syllable rates suggest that while the rate of words and signs can vary from language to language, the relationship between the rate of syllables and morphemes is relatively consistent among these typologically diverse languages. 
The results from rates in ASL also complement the findings of the perception experiments by confirming that the time-scales at which phonological units fluctuate in production match the temporal integration windows in perception. These results are consistent with the hypothesis that there are modality-independent time pressures on language processing; the discussion synthesizes converging findings from other domains of research and proposes ideas for future investigation.
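The local time-reversal manipulation used in the perception experiments above can be sketched as follows; the function name and the toy window duration are illustrative, not taken from the dissertation:

```python
import numpy as np

def locally_time_reverse(signal, fs, window_ms):
    """Reverse successive fixed-length windows of a signal
    (local time reversal) while leaving global order intact.

    signal: 1-D array of samples; fs: sampling rate in Hz;
    window_ms: duration of each reversal window in milliseconds.
    """
    win = max(1, int(fs * window_ms / 1000))  # samples per window
    out = signal.copy()
    for start in range(0, len(out), win):
        out[start:start + win] = out[start:start + win][::-1]
    return out

# Toy example: 8 samples, 4 ms windows at 1 kHz -> 4-sample windows,
# so each half of the sequence is reversed independently.
y = locally_time_reverse(np.arange(8.0), fs=1000, window_ms=4)
```

Intelligibility is then measured as a function of `window_ms`: short windows barely disturb the signal, while windows longer than the relevant integration time-scale render it unintelligible.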

    Cortical auditory processing of informational masking effects by target-masker similarity and stimulus uncertainty

    Purpose: Understanding speech in a background of other people talking is one of the most difficult listening challenges for hearing-impaired individuals, and even for those with normal hearing. Speech-on-speech masking is known to contribute to increased perceptual difficulty over non-speech background noise because of the informational masking it provides over and above the energetic masking effect. While informational masking research has identified factors of similarity and uncertainty between target and masker that contribute to reduced behavioral performance in speech background noise, critical gaps in knowledge remain, including the underlying neural-perceptual processes. By systematically manipulating aspects of similarity and uncertainty in the same auditory paradigm, the current study examined the time course of these informational masking effects and objectively quantified them at both early and late stages of auditory processing using auditory evoked potentials (AEPs) in a two-factor repeated-measures paradigm. Method: Thirty participants were included in this cross-sectional repeated-measures design. Target-masker similarity was manipulated by varying the linguistic/phonetic similarity (i.e., language) of the talkers in the noise maskers. Specifically, four levels representing hypothesized increasing levels of informational masking were implemented: (1) no masker (quiet), (2) Mandarin (linguistically and phonetically dissimilar), (3) Dutch (linguistically dissimilar but phonetically similar), and (4) English (linguistically and phonetically similar). Stimulus uncertainty was manipulated by task complexity, specifically the target-to-target interval (TTI) of the auditory paradigm.
Participants discriminated between English word stimuli (/bæt/ and /pæt/) presented in an oddball paradigm in each masker condition at +3 dB SNR by pressing buttons to either the target or the standard stimulus (pseudo-randomized between /bæt/ and /pæt/ across participants). Responses were recorded simultaneously for P1-N1-P2 (standard waveform) and P3 (target waveform). This design allowed simultaneous recording of multiple AEP peaks, including analysis of amplitude, area, and latency characteristics, as well as accuracy, reaction time, and d’ behavioral discrimination for button-press responses. Finally, AEP measures were compared to performance on a behavioral word recognition task (NU-6 25-word lists) in the proposed language maskers at multiple signal-to-noise ratios (SNRs) to further explore whether AEP amplitude/area and latency components correlate with behavioral outcomes across maskers. Results: Several trends in AEP and behavioral outcomes were consistent with the hypothesized hierarchy of increasing linguistic/phonetic similarity from Mandarin to Dutch to English, but not all differences were significant. The most supported findings for this factor were that all babble maskers significantly affected outcomes compared to quiet, and that the native-language English masker had the largest effect on outcomes in the AEP paradigm, including N1 amplitude, P3 amplitude and area, as well as decreased reaction time, accuracy, and d’ behavioral discrimination for target word responses. AEP outcomes for the Mandarin and Dutch maskers, however, were not significantly different across all measured components. AEP latencies for both N1 and P3 also supported an effect of stimulus uncertainty, consistent with a hypothesized increase in processing time related to increased task complexity when target stimulus timing was randomized.
In addition, this effect was stronger at the P3 level of auditory processing than at the N1, as evidenced by larger effect sizes. An unanticipated result was the absence of the expected additive effect between linguistic/phonetic similarity and stimulus uncertainty. Finally, trends in behavioral word recognition performance were generally consistent with those observed for AEP component measures: no differences between the Dutch and Mandarin maskers were found, but the English masker yielded the lowest percent-correct scores. Furthermore, correlations between behavioral word recognition and AEP component measures were at most moderate, and no single AEP component accounted for a majority of the variance in behavioral word recognition. Conclusions: The results of this study add to our understanding of auditory perception under informational masking in four ways. First, observable effects of both similarity and uncertainty were evident at both early and late levels of auditory cortical processing. This supports the use of AEPs to better understand the informational masking deficit by providing a window into the auditory pathway. Second, stronger effects were found for the P3 response, an active, top-down level of auditory processing, suggesting that while informational masking degradation begins at lower levels, higher-level active auditory processing is more sensitive to informational masking deficits. Third, the lack of an interaction between the main effects supports a linear interpretation, with similarity and uncertainty contributing equally across listening conditions. Fourth, although correlations with behavioral word recognition were few and only moderate, behavioral performance followed the same trends as AEP measures across similarity levels. Across both auditory neural and behavioral testing, language maskers degraded AEPs and reduced word recognition, particularly the native-language masker.
The behavioral and objective results from this study provide a foundation for further investigation of how the linguistic content of target and masker, and task difficulty, contribute to difficulty understanding speech in noise.
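The d’ discrimination index reported above is conventionally computed as the difference between the z-transformed hit and false-alarm rates. A minimal sketch using only the standard library; the counts and the log-linear correction are illustrative, not the study's actual data or analysis:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Adding 0.5 to each cell (log-linear correction) keeps the
    z-transform finite when a rate would otherwise be 0 or 1.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return z(hit_rate) - z(fa_rate)

# Illustrative counts: 45 hits / 5 misses to targets,
# 10 false alarms / 40 correct rejections to standards.
dp = d_prime(45, 5, 10, 40)
```

Higher d’ means better separation of target from standard responses; chance performance gives d’ near zero.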

    A computer based analysis of the effects of rhythm modification on the intelligibility of the speech of hearing and deaf subjects

    The speech of profoundly deaf persons often exhibits acquired unnatural rhythms, or a random pattern of rhythms. Inappropriate pause-time and speech-time durations are common in their speech. Specific rhythm deficiencies include an abnormal rate of syllable utterance, improper grouping, poor timing and phrasing of syllables, and unnatural stress for accent and emphasis. Assuming that temporal features are fundamental to the naturalness of spoken language, these abnormal timing patterns often detract from naturalness and may even be important factors in the decreased intelligibility of the speech. This thesis explores the significance of temporal cues in the rhythmic patterns of speech. An analysis-synthesis approach was employed, based on the encoding and decoding of speech by a tandem chain of digital computer operations. Rhythm as a factor in the speech intelligibility of deaf and normal-hearing subjects was investigated. The results of this study support the general hypothesis that rhythm and rhythmic intuition are important to the perception of speech.

    Communication Biophysics

    Contains reports on six research projects. National Institutes of Health (Grant 5 PO1 NS13126); National Institutes of Health (Grant 5 RO1 NS18682); National Institutes of Health (Grant 5 RO1 NS20322); National Institutes of Health (Grant 5 R01 NS20269); National Institutes of Health (Grant 5 T32 NS07047); Symbion, Inc.; National Science Foundation (Grant BNS 83-19874); National Science Foundation (Grant BNS 83-19887); National Institutes of Health (Grant 6 RO1 NS12846); National Institutes of Health (Grant 1 RO1 NS21322)

    EFFECTS OF AGING ON VOICE-PITCH PROCESSING: THE ROLE OF SPECTRAL AND TEMPORAL CUES

    Declines in auditory temporal processing are a common consequence of natural aging. Interactions between aging and spectro-temporal pitch processing have yet to be thoroughly investigated in humans, though recent neurophysiologic and electrophysiologic data support the notion that periodicity coding using only unresolved harmonics (i.e., those available via the temporal envelope) is negatively affected as a consequence of age. Individuals with cochlear implants (CIs) must rely on the temporal envelope of speech to glean information about voice pitch [coded through the fundamental frequency (f0)], as spectral f0 cues are not available. While cochlear implants have been shown to be efficacious in older adults, it is hypothesized that older adults would experience difficulty perceiving spectrally degraded voice-pitch information. The current experiments were aimed at quantifying the ability of younger and older listeners to utilize spectro-temporal cues to obtain voice-pitch information when performing simple and complex auditory tasks. Experiment 1 measured the ability of younger and older NH listeners to perceive a difference in the modulation frequency of amplitude-modulated broadband noise, thereby exploiting only temporal envelope cues to perform the task. Experiment 2 measured age-related differences in f0 difference limens as the degree of spectral degradation was manipulated to approximate CI processing. Results from Experiments 1 and 2 demonstrated that spectro-temporal processing of f0 information in non-speech stimuli is affected in older adults. Experiment 3 showed that the age-related differences observed in Experiments 1 and 2 translated to voice gender identification using a natural speech stimulus. Experiment 4 estimated how younger and older NH listeners are able to utilize differences in voice-pitch information in everyday listening environments (i.e., speech in noise) and how such abilities are affected by spectral degradation.
Comprehensive results provide further insight into pitch coding in both normal and impaired auditory systems, and demonstrate that spectro-temporal pitch processing depends on the age of the listener. The results could have important implications for elderly cochlear implant recipients.
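A stimulus of the kind used in Experiment 1, broadband noise whose pitch information lives only in the temporal envelope, can be approximated by sinusoidally amplitude-modulating Gaussian noise. The sampling rate, duration, and modulation rate below are illustrative choices, not the study's parameters:

```python
import numpy as np

def am_noise(fs=16000, dur=0.5, fm=100.0, depth=1.0, seed=0):
    """Sinusoidally amplitude-modulated Gaussian broadband noise.

    fm (Hz) sets the envelope periodicity, which is the only pitch
    cue the stimulus carries: the noise carrier itself has a flat
    long-term spectrum and no resolvable harmonics.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur)) / fs
    carrier = rng.standard_normal(t.size)              # broadband carrier
    envelope = 1.0 + depth * np.sin(2 * np.pi * fm * t)
    return envelope * carrier

stim = am_noise()
```

Discrimination thresholds are then obtained by asking listeners to detect a change in `fm` between two such stimuli generated with independent noise carriers.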

    Real-Time Contrast Enhancement to Improve Speech Recognition

    An algorithm that operates in real time to enhance the salient features of speech is described and its efficacy evaluated. The Contrast Enhancement (CE) algorithm implements dynamic compressive gain and lateral inhibitory sidebands across channels in a modified winner-take-all circuit, which together produce a form of suppression that sharpens the dynamic spectrum. Normal-hearing listeners identified spectrally smeared consonants (VCVs) and vowels (hVds) in quiet and in noise. Consonant and vowel identification, especially in noise, improved with the processing. The amount of improvement did not depend on the degree of spectral smearing or talker characteristics. For consonants, when results were analyzed by phonetic feature, the most consistent improvement was for place of articulation. This is encouraging for hearing aid applications because confusions between consonants differing in place are a persistent problem for listeners with sensorineural hearing loss.
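A much-simplified illustration of the across-channel lateral inhibition idea (not the authors' CE algorithm, which adds dynamic compressive gain and a winner-take-all circuit): subtract a fraction of each channel's neighbours and clip at zero, which sharpens spectral peaks relative to their skirts.

```python
import numpy as np

def lateral_inhibition(spectrum, strength=0.5):
    """Sharpen a spectral frame via inhibitory sidebands.

    Each channel is reduced by `strength` times the mean of its two
    neighbours (circularly indexed); negative values are clipped to
    zero, so flat regions are suppressed more than peaks.
    """
    s = np.asarray(spectrum, dtype=float)
    neighbours = 0.5 * (np.roll(s, 1) + np.roll(s, -1))
    return np.clip(s - strength * neighbours, 0.0, None)

# A broad spectral bump: the peak survives while its skirts shrink,
# raising the peak-to-skirt ratio from 4:1 to 12:1.
sharp = lateral_inhibition([1.0, 2.0, 4.0, 2.0, 1.0])
```

Applied frame by frame to a channel-energy representation, this kind of suppression partially undoes the peak broadening caused by spectral smearing.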

    Speech Communication

    Contains reports on eight research projects. C.J. LeBel Fellowship; Systems Development Foundation; National Institutes of Health (Grant 5 T32 NS07040); National Institutes of Health (Grant 5 R01 NS04332); National Science Foundation (Grant IST 80-17599); U.S. Navy - Office of Naval Research (Contract N00014-82-K-0727)