
    Dissociable Mechanisms of Concurrent Speech Identification in Noise at Cortical and Subcortical Levels

    When two vowels with different fundamental frequencies (F0s) are presented concurrently, listeners often hear two voices producing different vowels on different pitches. Parsing of this simultaneous speech can also be affected by the signal-to-noise ratio (SNR) in the auditory scene. The extraction and interaction of F0 and SNR cues may occur at multiple levels of the auditory system. The major aims of this dissertation are to elucidate the neural mechanisms and time course of concurrent speech perception in clean and degraded listening conditions and its behavioral correlates. In two complementary experiments, electrical brain activity (EEG) was recorded at cortical (EEG Study #1) and subcortical (FFR Study #2) levels while participants heard double-vowel stimuli whose fundamental frequencies (F0s) differed by zero or four semitones (STs), presented in either clean or noise-degraded (+5 dB SNR) conditions. Behaviorally, listeners were more accurate in identifying both vowels for larger F0 separations (i.e., 4 ST, with pitch cues), and this F0 benefit was more pronounced at more favorable SNRs. Time-frequency analysis of cortical EEG oscillations (i.e., brain rhythms) revealed a dynamic time course for concurrent speech processing that depended on both extrinsic (SNR) and intrinsic (pitch) acoustic factors. Early high-frequency activity reflected pre-perceptual encoding of acoustic features (~200 ms) and the quality (i.e., SNR) of the speech signal (~250-350 ms), whereas later-evolving low-frequency rhythms (~400-500 ms) reflected post-perceptual, cognitive operations that covaried with listening effort and task demands. Analysis of subcortical responses indicated that FFRs provided a high-fidelity representation of the double-vowel stimuli and the spectro-temporal nonlinear properties of the peripheral auditory system; however, FFR activity largely reflected the neural encoding of stimulus features (exogenous coding) rather than perceptual outcomes, although timbre (F1) cues did predict response speed in noise conditions. Taken together, the results of this dissertation suggest that subcortical auditory processing reflects mostly exogenous (acoustic) feature encoding, in stark contrast to cortical activity, which reflects perceptual and cognitive aspects of concurrent speech perception. By studying multiple brain indices underlying an identical task, these studies provide a more comprehensive window into the hierarchy of brain mechanisms and the time course of concurrent speech processing.
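
    As a rough illustration of the stimulus manipulation described above (not the dissertation's actual synthesis code), the Python sketch below builds a crude double vowel from two harmonic complexes whose F0s differ by four semitones and embeds it in noise at +5 dB SNR; the sampling rate, duration, base F0, and the use of plain harmonic complexes in place of real vowels are all assumptions.

```python
# Illustrative sketch only: a 4-semitone F0 separation and +5 dB SNR mixing.
import numpy as np

FS = 16000          # sampling rate (Hz), assumed
DUR = 0.2           # stimulus duration (s), assumed
F0_BASE = 100.0     # base fundamental frequency (Hz), assumed

def harmonic_complex(f0, dur=DUR, fs=FS, n_harmonics=20):
    """Crude stand-in for a synthetic vowel: a sum of harmonics of f0."""
    t = np.arange(int(dur * fs)) / fs
    return sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, n_harmonics + 1))

def add_noise_at_snr(signal, snr_db):
    """Add white noise so that the signal-to-noise ratio equals snr_db."""
    noise = np.random.randn(len(signal))
    p_sig = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    noise *= np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return signal + noise

# 4-semitone F0 separation: each semitone is a factor of 2**(1/12).
f0_low = F0_BASE
f0_high = F0_BASE * 2 ** (4 / 12)        # ~126 Hz for a 100 Hz base

double_vowel = harmonic_complex(f0_low) + harmonic_complex(f0_high)
degraded = add_noise_at_snr(double_vowel, snr_db=5.0)   # +5 dB SNR condition
```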

    Adaptation by normal listeners to upward spectral shifts of speech: Implications for cochlear implants

    Multi-channel cochlear implants typically present spectral information to the wrong "place" in the auditory nerve array, because electrodes can only be inserted partway into the cochlea. Although such spectral shifts are known to cause large immediate decrements in performance in simulations, the extent to which listeners can adapt to such shifts has yet to be investigated. Here, the effects of a four-channel implant in normal listeners have been simulated, and performance tested with unshifted spectral information and with the equivalent of a 6.5-mm basalward shift on the basilar membrane (1.3-2.9 octaves, depending on frequency). As expected, the unshifted simulation led to relatively high levels of mean performance (e.g., 64% of words in sentences correctly identified), whereas the shifted simulation led to very poor results (e.g., 1% of words). However, after just nine 20-min sessions of connected discourse tracking with the shifted simulation, performance improved significantly for the identification of intervocalic consonants, medial vowels in monosyllables, and words in sentences (30% of words). Also, listeners were able to track connected discourse of shifted signals without lipreading at rates up to 40 words per minute. Although we do not know if complete adaptation to the shifted signals is possible, it is clear that short-term experiments seriously exaggerate the long-term consequences of such spectral shifts. (C) 1999 Acoustical Society of America. [S0001-4966(99)02012-3]
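
    The 6.5-mm basalward shift and its frequency-dependent size in octaves follow from a cochlear frequency-position map. The sketch below uses Greenwood's (1990) function for the human cochlea to compute where hypothetical analysis-band edges land after a 6.5-mm shift toward the base; the band edges themselves are illustrative, not those of the four-channel simulation.

```python
# Minimal sketch: Greenwood frequency-position map and a 6.5 mm basalward shift.
import numpy as np

A, a, k = 165.4, 0.06, 0.88   # Greenwood constants for humans (x in mm from apex)

def place_from_freq(f_hz):
    """Cochlear place (mm from apex) for a given frequency."""
    return np.log10(f_hz / A + k) / a

def freq_from_place(x_mm):
    """Characteristic frequency at a given cochlear place."""
    return A * (10 ** (a * x_mm) - k)

def basalward_shift(f_hz, shift_mm=6.5):
    """Frequency reached after shifting the place of f_hz toward the base."""
    return freq_from_place(place_from_freq(f_hz) + shift_mm)

analysis_edges = np.array([100, 500, 1500, 2500, 4000])   # hypothetical band edges
shifted_edges = basalward_shift(analysis_edges)
octave_shift = np.log2(shifted_edges / analysis_edges)    # larger at low frequencies
```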

    Auditory-Visual Integration of Sine-Wave Speech

    It has long been known that observers use visual information from a talker's face to supplement auditory input to understand speech in situations where the auditory signal is compromised in some way, such as in a noisy environment. However, researchers have demonstrated that even when the auditory signal is perfect, a paired visual stimulus will give rise to a different percept from that without the visual stimulus. This was demonstrated by McGurk and MacDonald (1976), who discovered that when a person is presented with an auditory CV combination (e.g., /ba/) and a visual speech stimulus (e.g., /ga/), the resulting perception is often a fusion (e.g., /da/) of the two. This phenomenon can be observed with both degraded and non-degraded speech stimuli, suggesting that the integration is not a function of having a poor auditory stimulus. However, other studies have shown that the normal acoustic speech stimulus is highly redundant in the sense that the signal contains more information than necessary for sound identification. This redundancy may play an important role in auditory-visual integration. Shannon et al. (1995) reduced the spectral information in speech to one, two, three, and four bands of noise, each modulated by the envelope extracted from the corresponding band of the original speech. The results showed very high intelligibility even for reductions to three or four bands, suggesting that there are tremendous amounts of redundancy in the normal speech signal. Furthermore, Remez et al. (1981) reduced the speech signal to three time-varying sinusoids that matched the center frequencies and amplitudes of the first three formants of the natural speech signal. Again, the results showed high intelligibility (when the subjects were told that the sounds were, in fact, reduced human speech). A remaining question is whether reducing the redundancy in the auditory signal changes the auditory-visual integration process in either quantitative or qualitative ways. The present study addressed this issue by using, like Remez et al., sine-wave reductions of the auditory stimuli, with the addition of visual stimuli. A total of 10 normal-hearing adult listeners were asked to identify speech syllables produced by five talkers, in which the auditory portions of the signals were degraded using sine-wave reduction. Participants were tested with four different sine-wave reductions: F0, F1, F2, and F0+F1+F2. Stimuli were presented under auditory-only, visual-only, and auditory-plus-visual conditions. Preliminary analysis of the results showed very low levels of performance under auditory-only presentation for all of the sine-wave reductions, even F0+F1+F2. Visual-only performance was approximately 30%, consistent with previous studies. Little evidence of improvement in the auditory-plus-visual condition was observed, suggesting that this level of reduction in the auditory stimulus removes so much auditory information that listeners are unable to use the stimulus to achieve any meaningful audiovisual speech integration. These results have implications for the design of processors for assistive devices such as cochlear implants. This thesis was supported by an ASC Undergraduate Scholarship and an SBS Undergraduate Research Scholarship.
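
    To make the sine-wave reduction concrete, the sketch below synthesizes a signal from three time-varying sinusoids that follow formant-like frequency and amplitude tracks, in the spirit of Remez et al. (1981); the tracks here are invented placeholders rather than the output of a formant tracker run on natural speech.

```python
# Illustrative sketch of sine-wave speech synthesis from formant-like tracks.
import numpy as np

FS = 16000
N = FS                      # one second of "speech"
t = np.arange(N) / FS

def synth_track(freq_track, amp_track, fs=FS):
    """Synthesize one sinusoid whose frequency and amplitude vary over time."""
    phase = 2 * np.pi * np.cumsum(freq_track) / fs   # integrate frequency to phase
    return amp_track * np.sin(phase)

# Placeholder formant tracks (Hz) drifting over time, with fixed amplitudes.
f1 = 500 + 100 * np.sin(2 * np.pi * 2.0 * t)
f2 = 1500 + 300 * np.sin(2 * np.pi * 1.5 * t)
f3 = 2500 + 200 * np.sin(2 * np.pi * 1.0 * t)
a1, a2, a3 = 1.0, 0.6, 0.3

sws = synth_track(f1, a1) + synth_track(f2, a2) + synth_track(f3, a3)
sws /= np.max(np.abs(sws))   # normalize for playback
```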

    The use of acoustic cues in phonetic perception: Effects of spectral degradation, limited bandwidth and background noise

    Hearing impairment, cochlear implantation, background noise and other auditory degradations result in the loss or distortion of sound information thought to be critical to speech perception. In many cases, listeners can still identify speech sounds despite degradations, but understanding of how this is accomplished is incomplete. Experiments presented here tested the hypothesis that listeners would utilize acoustic-phonetic cues differently if one or more cues were degraded by hearing impairment or simulated hearing impairment. Results supported this hypothesis for various listening conditions that are directly relevant for clinical populations. Analysis included mixed-effects logistic modeling of the contributions of individual acoustic cues for various contrasts. Listeners with cochlear implants (CIs) or normal-hearing (NH) listeners in CI simulations showed increased use of acoustic cues in the temporal domain and decreased use of cues in the spectral domain for the tense/lax vowel contrast and the word-final fricative voicing contrast. For the word-initial stop voicing contrast, NH listeners made less use of voice-onset time and greater use of voice pitch in conditions that simulated high-frequency hearing impairment and/or masking noise; the influence of these cues was further modulated by consonant place of articulation. A pair of experiments measured phonetic context effects for the "s/sh" contrast, replicating previously observed effects for NH listeners and generalizing them to CI listeners as well, despite known deficiencies in spectral resolution for CI listeners. For NH listeners in CI simulations, these context effects were absent or negligible. Audio-visual delivery of this experiment revealed enhanced influence of visual lip-rounding cues for CI listeners and NH listeners in CI simulations. Additionally, CI listeners demonstrated that visual cues to gender influence phonetic perception in a manner consistent with gender-related voice acoustics. All of these results suggest that listeners are able to accommodate challenging listening situations by capitalizing on the natural (multimodal) covariance in speech signals. Additionally, these results imply that there are potential differences in speech perception by NH listeners and listeners with hearing impairment that would be overlooked by traditional word recognition or consonant confusion matrix analysis.
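
    The cue-weighting analysis above rests on logistic models in which each acoustic cue enters as a predictor of the phonetic response. The sketch below shows a simplified, fixed-effects version for a hypothetical word-initial stop voicing contrast using statsmodels; the study's mixed-effects models would additionally include random effects per listener, and the data here are synthetic.

```python
# Simplified sketch: logistic modeling of acoustic-cue contributions (synthetic data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
vot = rng.uniform(0, 80, n)          # voice-onset time cue (ms), synthetic
f0_onset = rng.uniform(100, 200, n)  # onset pitch cue (Hz), synthetic

# Synthetic responses: probability of a "voiced" percept falls with VOT and onset F0.
p_voiced = 1 / (1 + np.exp(0.15 * (vot - 30) + 0.02 * (f0_onset - 150)))
resp = rng.binomial(1, p_voiced)

trials = pd.DataFrame({"voiced_response": resp, "vot_ms": vot, "f0_onset_hz": f0_onset})

# Coefficient magnitudes index how strongly each cue drives the voicing decision;
# a mixed-effects version would add random intercepts (and slopes) per listener.
model = smf.logit("voiced_response ~ vot_ms + f0_onset_hz", data=trials).fit()
print(model.params)
```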

    Reconstruction of Phonated Speech from Whispers Using Formant-Derived Plausible Pitch Modulation

    Whispering is a natural, unphonated, secondary aspect of speech communications for most people. However, it is the primary mechanism of communications for some speakers who have impaired voice production mechanisms, such as partial laryngectomees, as well as for those prescribed voice rest, which often follows surgery or damage to the larynx. Unlike most people, who choose when to whisper and when not to, these speakers may have little choice but to rely on whispers for much of their daily vocal interaction. Even though most speakers will whisper at times, and some speakers can only whisper, the majority of today's computational speech technology systems assume or require phonated speech. This article considers conversion of whispers into natural-sounding phonated speech as a noninvasive prosthetic aid for people with voice impairments who can only whisper. As a by-product, the technique is also useful for unimpaired speakers who choose to whisper. Speech reconstruction systems can be classified into those that require training and those that do not. Among the latter, a recent parametric reconstruction framework is explored and then enhanced through a refined estimation of plausible pitch from weighted formant differences. The improved reconstruction framework, with the proposed formant-derived artificial pitch modulation, is validated through subjective and objective comparison tests alongside state-of-the-art alternatives.
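
    The reconstruction idea hinges on deriving a plausible pitch contour from the whisper's formant trajectories. The sketch below shows one way such a formant-derived modulation could be computed; the baseline F0, weights, and smoothing are illustrative assumptions, not the weighting actually proposed in the article.

```python
# Illustrative sketch only: mapping weighted formant deviations onto an F0 contour.
import numpy as np

def plausible_pitch(f1_track, f2_track, f0_base=120.0, w1=0.6, w2=0.4, smooth=25):
    """Derive a plausible F0 contour from weighted formant differences (assumed scheme)."""
    f1 = np.asarray(f1_track, dtype=float)
    f2 = np.asarray(f2_track, dtype=float)
    # Normalized deviation of each formant from its own mean.
    d1 = (f1 - f1.mean()) / f1.mean()
    d2 = (f2 - f2.mean()) / f2.mean()
    modulation = w1 * d1 + w2 * d2          # weighted formant differences
    kernel = np.ones(smooth) / smooth       # simple moving-average smoothing
    modulation = np.convolve(modulation, kernel, mode="same")
    return f0_base * (1.0 + modulation)     # F0 contour in Hz

# Hypothetical per-frame formant tracks (Hz) extracted from a whispered utterance.
f1_frames = 450 + 80 * np.sin(np.linspace(0, 3, 300))
f2_frames = 1600 + 250 * np.cos(np.linspace(0, 2, 300))
f0_contour = plausible_pitch(f1_frames, f2_frames)
```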

    Effects of Hearing Aid Amplification on Robust Neural Coding of Speech

    Hearing aids are able to restore some hearing abilities for people with auditory impairments, but background noise remains a significant problem. Unfortunately, we know very little about how speech is encoded in the auditory system, particularly in impaired systems with prosthetic amplifiers. There is growing evidence that relative timing in the neural signals (known as spatiotemporal coding) is important for speech perception, but there is little research that relates spatiotemporal coding and hearing aid amplification. This research uses a combination of computational modeling and physiological experiments to characterize how hearing aids affect vowel coding in noise at the level of the auditory nerve. The results indicate that sensorineural hearing impairment degrades the temporal cues transmitted from the ear to the brain. Two hearing aid strategies (linear gain and wide dynamic-range compression) were used to amplify the acoustic signal. Although appropriate gain was shown to improve temporal coding for individual auditory nerve fibers, neither strategy improved spatiotemporal cues. Previous work has attempted to correct the relative timing by adding frequency-dependent delays to the acoustic signal (e.g., within a hearing aid). We show that, although this strategy can affect the timing of auditory nerve responses, it is unlikely to improve the relative timing as intended. We have shown that existing hearing aid technologies do not improve some of the neural cues that we think are important for perception, and it is important to understand these limitations. Our hope is that this knowledge can be used to develop new technologies to improve auditory perception in difficult acoustic environments.
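
    The two amplification strategies compared above differ in their input-output behavior: linear gain applies the same gain at every input level, whereas wide dynamic-range compression (WDRC) reduces gain above a compression threshold. The sketch below contrasts the two with assumed gain, threshold, and compression-ratio values, not the prescriptions used in the experiments.

```python
# Minimal sketch: linear gain vs. a simple WDRC input-output rule (assumed parameters).
import numpy as np

def linear_gain_db(input_db, gain_db=20.0):
    """Linear amplification: the same gain at every input level."""
    return np.asarray(input_db, dtype=float) + gain_db

def wdrc_output_db(input_db, gain_db=20.0, threshold_db=50.0, ratio=3.0):
    """Above the compression threshold, output grows by 1/ratio dB per input dB."""
    input_db = np.asarray(input_db, dtype=float)
    below = input_db + gain_db
    above = threshold_db + gain_db + (input_db - threshold_db) / ratio
    return np.where(input_db <= threshold_db, below, above)

levels = np.arange(20, 101, 10)          # input levels in dB SPL
print(linear_gain_db(levels))
print(wdrc_output_db(levels))
```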

    Towards a silent speech interface for Portuguese: Surface electromyography and the nasality challenge

    A Silent Speech Interface (SSI) aims at performing Automatic Speech Recognition (ASR) in the absence of an intelligible acoustic signal. It can be used as a human-computer interaction modality in high-background-noise environments, such as living rooms, or in aiding speech-impaired individuals, a group whose prevalence increases with ageing. If this interaction modality is made available for users' own native language, with adequate performance, and since it does not rely on acoustic information, it will be less susceptible to problems related to environmental noise, privacy, information disclosure and exclusion of speech-impaired persons. To contribute to the existence of this promising modality for Portuguese, for which no SSI implementation is known, we are exploring and evaluating the potential of state-of-the-art approaches. One of the major challenges we face in SSI for European Portuguese is recognition of nasality, a core characteristic of the phonetics and phonology of this language. In this paper, a silent speech recognition experiment based on Surface Electromyography is presented. Results confirmed recognition problems between minimal pairs of words that differ only in the nasality of one of the phones, causing 50% of the total error and evidencing degradation in accuracy, which correlates well with the existing knowledge.
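
    As a rough sketch of the kind of pipeline involved in sEMG-based silent speech recognition (the specific features, window sizes, and classifier here are assumptions, not the paper's setup), the example below extracts frame-based time-domain features from a surface-EMG channel and trains a classifier to separate an oral/nasal minimal pair on synthetic data.

```python
# Hedged sketch: frame-based sEMG features plus a generic classifier (synthetic data).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def frame_features(emg, fs=1000, win_ms=200, hop_ms=100):
    """Per-frame time-domain features: mean absolute value, RMS, zero crossings."""
    win, hop = int(fs * win_ms / 1000), int(fs * hop_ms / 1000)
    feats = []
    for start in range(0, len(emg) - win + 1, hop):
        frame = emg[start:start + win]
        feats.append([
            np.mean(np.abs(frame)),                  # mean absolute value
            np.sqrt(np.mean(frame ** 2)),            # RMS energy
            np.sum(np.diff(np.sign(frame)) != 0),    # zero-crossing count
        ])
    return np.array(feats).flatten()

# Hypothetical data: one sEMG channel per utterance, labels for a minimal pair
# differing only in nasality (0 = oral, 1 = nasal).
rng = np.random.default_rng(1)
X = np.array([frame_features(rng.standard_normal(1000)) for _ in range(40)])
y = np.array([0, 1] * 20)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
```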

    Speech Modeling and Robust Estimation for Diagnosis of Parkinson’s Disease
