
    Effect of Prolonged Non-Traumatic Noise Exposure on Unvoiced Speech Recognition

    Animal models in the past decade have shown that noise exposure may affect temporal envelope processing at supra-threshold levels while the absolute hearing threshold remains in the normal range. However, human studies have failed to find such effects consistently, owing to poor control of participants' noise-exposure history and limited measurement sensitivity. The current study operationally defined non-traumatic noise exposure (NTNE) as noise exposure at dental schools because of its distinctive high-pass spectral profile, non-traumatic nature, and systematic exposure schedule across dental students of different years. Temporal envelope processing was examined through recognition of unvoiced speech interrupted by noise or by silence. The results showed that people who had systematic exposure to dental noise performed more poorly on tasks of temporal envelope processing than unexposed people. The effect of high-frequency NTNE on temporal envelope processing was more robust inside than outside the spectral band of dental noise and was more obvious in conditions requiring finer temporal resolution (e.g., faster noise modulation rates) than in those requiring less fine temporal resolution (e.g., slower noise modulation rates). Furthermore, there was a significant performance difference between the exposed and unexposed groups on tasks of spectral envelope processing at low frequency, while the two groups performed similarly on tasks near threshold. Additional analyses showed that factors such as age, years of musical training, non-dental noise-exposure history, and peripheral auditory function could not explain the variance in performance on tasks of temporal or spectral envelope processing. The findings support the general assumption from animal models of NTNE that temporal and spectral envelope processing deficits related to NTNE likely arise at retro-cochlear sites, at supra-threshold levels, and could easily be overlooked by routine clinical audiologic screening.
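
    For concreteness, interrupted-speech stimuli of the kind described above can be sketched as a square-wave gate applied to the speech waveform, with the gaps either left silent or filled with noise. The function below is a minimal illustration under assumed parameters (a 50% duty cycle and simple RMS-based SNR scaling), not the study's exact protocol.

    import numpy as np

    def interrupt_speech(speech, fs, rate_hz, filler="silence", snr_db=0.0):
        """Gate `speech` on/off at `rate_hz`; fill the gaps with silence or noise."""
        t = np.arange(len(speech)) / fs
        gate = (np.mod(t * rate_hz, 1.0) < 0.5).astype(float)  # 50% duty cycle
        out = speech * gate
        if filler == "noise":
            noise = np.random.randn(len(speech))
            # scale so the speech-to-filler power ratio equals snr_db
            noise *= np.sqrt(np.mean(speech**2) / np.mean(noise**2) / 10**(snr_db / 10))
            out += noise * (1.0 - gate)
        return out

    # Faster gating rates demand finer temporal resolution from the listener:
    # fast = interrupt_speech(speech, fs, rate_hz=16, filler="noise")
    # slow = interrupt_speech(speech, fs, rate_hz=2, filler="noise")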

    Relationship between speech-evoked neural responses and perception of speech in noise in older adults

    Speech-in-noise (SPIN) perception involves neural encoding of temporal acoustic cues. These cues include the temporal fine structure (TFS) and envelopes that modulate at the syllable rate (Slow-rate ENV) and at the fundamental frequency (F0-rate ENV). Here, the relationship between speech-evoked neural responses to these cues and SPIN perception was investigated in older adults. Theta-band phase-locking values (PLV), which reflect cortical sensitivity to Slow-rate ENV, and peripheral/brainstem frequency-following responses phase-locked to F0-rate ENV (FFR_ENV_F0) and to TFS (FFR_TFS) were measured from scalp-EEG responses to a repeated speech syllable in steady-state speech-shaped noise (SpN) and 16-speaker babble noise (BbN). The results showed that: 1) SPIN performance and PLV were significantly higher under SpN than under BbN, implying that differential cortical encoding may serve as the neural mechanism by which SPIN performance varies as a function of noise type; 2) PLV and FFR_TFS at resolved harmonics were significantly related to good SPIN performance, supporting the importance of phase-locked neural encoding of Slow-rate ENV and of the TFS of resolved harmonics during SPIN perception; 3) FFR_ENV_F0 was not associated with SPIN performance until audiometric threshold was controlled for, indicating that hearing loss should be carefully controlled when studying the role of neural encoding of F0-rate ENV. Implications are drawn with respect to the fitting of auditory prostheses.
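
    For reference, the theta-band PLV used here is the magnitude of the across-trial average of unit phasors of instantaneous phase. Below is a minimal sketch, assuming `epochs` is an (n_trials, n_samples) array of EEG responses to the repeated syllable and the conventional 4-8 Hz theta band; both are assumptions, not the study's exact parameters.

    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def theta_plv(epochs, fs, band=(4.0, 8.0)):
        b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
        filtered = filtfilt(b, a, epochs, axis=1)    # theta-band signal per trial
        phase = np.angle(hilbert(filtered, axis=1))  # instantaneous phase
        # PLV: magnitude of the mean unit phasor across trials, per time point
        return np.abs(np.mean(np.exp(1j * phase), axis=0))

    # plv = theta_plv(epochs, fs=1000)  # values near 1 indicate strong phase-locking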

    Speech Enhancement Exploiting the Source-Filter Model

    Imagining everyday life without mobile telephony is hardly possible nowadays. Calls are made in every conceivable situation and environment, so the microphone picks up not only the user's speech but also sound from the surroundings, which is likely to impede the conversational partner's understanding. Modern speech enhancement systems are able to mitigate such effects, and most users are not even aware of their existence. This thesis presents the development of a modern single-channel speech enhancement approach that uses the divide-and-conquer principle to combat environmental noise in microphone signals. Though initially motivated by mobile telephony applications, the approach can be applied whenever speech is to be retrieved from a corrupted signal. It uses the so-called source-filter model to divide the problem into two subproblems, which are then conquered by enhancing the source (the excitation signal) and the filter (the spectral envelope) separately. Both enhanced signals are then used to denoise the corrupted signal. The estimation of spectral envelopes has quite some history, and some approaches already exist for speech enhancement; however, they typically neglect the excitation signal, which leaves them unable to enhance the fine structure properly. Both individual enhancement approaches exploit benefits of the cepstral domain, which offers, e.g., advantageous mathematical properties and straightforward synthesis of excitation-like signals. We investigate traditional model-based schemes like Gaussian mixture models (GMMs), classical signal-processing-based approaches, as well as modern deep neural network (DNN)-based approaches. The enhanced signals are not used directly to enhance the corrupted signal (e.g., to synthesize a clean speech signal) but serve as a so-called a priori signal-to-noise ratio (SNR) estimate in a traditional statistical speech enhancement system. Such a system consists of a noise power estimator, an a priori SNR estimator, and a spectral weighting rule that is driven by the results of the aforementioned estimators and subsequently employed to retrieve the clean speech estimate from the noisy observation. As a result, the new approach attains significantly higher noise attenuation than current state-of-the-art systems while maintaining comparable speech-component quality and speech intelligibility. In consequence, the overall quality of the enhanced speech signal is superior to that of state-of-the-art speech enhancement approaches.
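
    The divide step lends itself to a short sketch: in the cepstral domain, low quefrencies capture the spectral envelope (the filter) and high quefrencies the excitation fine structure (the source). The liftering cutoff of 20 coefficients below is a common heuristic assumed for illustration, not a value taken from the thesis.

    import numpy as np

    def cepstral_split(frame, n_env=20):
        """Split one analysis frame into envelope and excitation cepstra."""
        log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)
        cepstrum = np.fft.irfft(log_mag, n=len(frame))  # real cepstrum
        lifter = np.zeros_like(cepstrum)
        lifter[:n_env] = 1.0
        lifter[-(n_env - 1):] = 1.0                 # symmetric low-quefrency part
        envelope_cep = cepstrum * lifter            # filter: spectral envelope
        excitation_cep = cepstrum * (1.0 - lifter)  # source: fine structure
        return envelope_cep, excitation_cep

    In the overall system, the separately enhanced envelope and excitation would then be combined into an a priori SNR estimate that drives the spectral weighting rule.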

    Digital Signal Processing

    Contains research objectives and summary of research on seven research projects.

    U.S. Navy Office of Naval Research (Contract N00014-75-C-0951)
    National Science Foundation (Grant ENG71-02319-A02)

    The Natural Statistics of Audiovisual Speech

    Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time-varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain, where it can guide the selection of appropriate actions. To simplify this process, it has been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input-space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept, versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both the area of the mouth opening and the voice envelope are temporally modulated in the 2–7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver.
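
    The first and fourth findings rest on two simple computations: the zero-lag correlation between the mouth-area and envelope time series, and the lag at which their cross-correlation peaks. Below is a minimal sketch, assuming `mouth_area` and `audio_env` have already been extracted and resampled to a common rate `fs`; names and preprocessing are illustrative.

    import numpy as np

    def envelope_mouth_stats(mouth_area, audio_env, fs):
        m = (mouth_area - mouth_area.mean()) / mouth_area.std()
        a = (audio_env - audio_env.mean()) / audio_env.std()
        r = np.mean(m * a)                           # zero-lag Pearson correlation
        xcorr = np.correlate(a, m, mode="full") / len(m)
        lags = np.arange(-len(m) + 1, len(m)) / fs   # lag in seconds
        best_lag = lags[np.argmax(xcorr)]            # positive: voice follows mouth
        return r, best_lag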

    Cortical and subcortical speech-evoked responses in young and older adults: Effects of background noise, arousal states, and neural excitability

    This thesis investigated how the brain processes speech signals in the sensory auditory system of human adults across a wide age range, using electroencephalography (EEG). Two types of speech-evoked phase-locked responses were studied: (i) cortical responses (theta-band phase-locked responses) that reflect processing of the low-frequency, slowly varying envelope of speech; and (ii) subcortical/peripheral responses (frequency-following responses; FFRs) that reflect encoding of speech periodicity and temporal fine structure information. The aim was to elucidate, through three studies, how these neural activities are affected by internal (aging, hearing loss, level of arousal, and neural excitability) and external (background noise) factors in daily life. Study 1 investigated theta-band phase-locking and FFRs in young and older adults, examining how aging and hearing loss affect these activities in quiet and noisy environments and how the activities are associated with speech-in-noise perception. The results showed that aging and hearing loss affect speech-evoked phase-locked responses through different mechanisms, and that the effects of aging on cortical and subcortical activities play different roles in speech-in-noise perception. Study 2 investigated how the level of arousal, or consciousness, affects phase-locked responses in young and older adults. The results showed that both theta-band phase-locking and FFRs decrease as the level of arousal decreases. It was further found that the neuro-regulatory role of sleep spindles on theta-band phase-locking differs between young and older adults, indicating that the mechanisms of neuro-regulation for phase-locked responses in different arousal states are age-dependent. Study 3 established a causal relationship between auditory cortical excitability and FFRs using transcranial direct current stimulation (tDCS) combined with EEG. FFRs were measured before and after tDCS was applied over the auditory cortices. The results showed that changes in neural excitability of the right auditory cortex can alter FFR magnitudes along the contralateral pathway, a finding with important theoretical and clinical implications that causally links functions of the auditory cortex with neural encoding of speech periodicity. Taken together, the findings of this thesis advance our understanding of how speech signals are processed via neural phase-locking in everyday life across the lifespan.
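
    As an illustration of how FFR strength can be quantified before and after stimulation, the sketch below reads off the spectral amplitude at the stimulus F0 from a trial-averaged response. The Hann windowing and the example F0 of 100 Hz are assumptions for illustration, not details from the thesis.

    import numpy as np

    def ffr_magnitude(response, fs, f0):
        """Spectral amplitude of a trial-averaged response at the bin nearest f0."""
        spectrum = np.abs(np.fft.rfft(response * np.hanning(len(response))))
        freqs = np.fft.rfftfreq(len(response), d=1.0 / fs)
        return spectrum[np.argmin(np.abs(freqs - f0))]

    # Comparing magnitudes before and after tDCS indexes the excitability-driven
    # change examined in Study 3:
    # delta = ffr_magnitude(post, fs, f0=100) - ffr_magnitude(pre, fs, f0=100)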

    Speech Synthesis Based on Hidden Markov Models


    Auditory sustained field responses to periodic noise

    Background: Auditory sustained responses have recently been suggested to reflect neural processing of speech sounds in the auditory cortex. As periodic fluctuations below the pitch range are important for speech perception, it is necessary to investigate how low-frequency periodic sounds are processed in the human auditory cortex. Auditory sustained responses have been shown to be sensitive to temporal regularity, but the relationship between the amplitudes of auditory evoked sustained responses and the repetition rates of auditory inputs remains elusive. As the temporal and spectral features of sounds enhance different components of sustained responses, previous studies with click trains and vowel stimuli presented diverging results. In order to investigate the effect of repetition rate on cortical responses, we analyzed the auditory sustained fields evoked by periodic and aperiodic noises using magnetoencephalography.

    Results: Sustained fields were elicited by white noise and by repeating frozen noise stimuli with repetition rates of 5, 10, 50, 200, and 500 Hz. The sustained field amplitudes were significantly larger for all the periodic stimuli than for white noise. Although the sustained field amplitudes showed a rising and falling pattern within the repetition rate range, the response amplitudes at the 5 Hz repetition rate were significantly larger than at 500 Hz.

    Conclusions: The enhanced sustained field responses to periodic noises show that cortical sensitivity to periodic sounds is maintained over a wide range of repetition rates. Persistence of periodicity sensitivity below the pitch range suggests that, in addition to processing the fundamental frequency of the voice, sustained field generators can also resolve low-frequency temporal modulations in the speech envelope.
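
    A repeating frozen noise stimulus of the kind used here is one noise segment tiled so that the waveform repeats at the desired rate. Below is a minimal sketch; the sampling rate and duration are illustrative, not the study's values.

    import numpy as np

    def frozen_noise(fs, repetition_hz, duration_s, rng=None):
        rng = rng if rng is not None else np.random.default_rng()
        period = rng.standard_normal(int(round(fs / repetition_hz)))  # one frozen segment
        n_rep = int(np.ceil(duration_s * fs / len(period)))
        return np.tile(period, n_rep)[: int(duration_s * fs)]

    # e.g. the 5 Hz condition: stim = frozen_noise(fs=44100, repetition_hz=5, duration_s=2)
    # aperiodic control: white = np.random.default_rng().standard_normal(2 * 44100)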