
    Coding Strategies for Cochlear Implants Under Adverse Environments

    Cochlear implants are electronic prosthetic devices that restore partial hearing in patients with severe to profound hearing loss. Although most coding strategies have significantly improved the perception of speech in quiet listening conditions, speech perception remains limited in adverse environments such as background noise, reverberation, and band-limited channels. We propose strategies that improve the intelligibility of speech transmitted over telephone networks, reverberant speech, and speech in the presence of background noise. For telephone-processed speech, we examine the effects of adding low-frequency and high-frequency information to the band-limited telephone speech. Four listening conditions were designed to simulate the receiving frequency characteristics of telephone handsets. Results indicated improvement in cochlear implant and bimodal listening when telephone speech was augmented with high-frequency information, and this study therefore supports the design of algorithms that extend the bandwidth toward higher frequencies. The results also indicated added benefit from hearing aids for bimodal listeners in all four listening conditions. Speech understanding in acoustically reverberant environments is a difficult task for hearing-impaired listeners. Reverberant sound consists of the direct sound, early reflections, and late reflections; late reflections are known to be detrimental to speech intelligibility. In this study, we propose a reverberation suppression strategy based on spectral subtraction to suppress the reverberant energy from late reflections. Results from listening tests in two reverberant conditions (RT60 = 0.3 s and 1.0 s) indicated significant improvement when stimuli were processed with the spectral subtraction (SS) strategy. The proposed strategy operates with little to no prior information on the signal or the room characteristics and can therefore potentially be implemented in real-time CI speech processors. For speech in background noise, we propose a mechanism underlying the contribution of harmonics to the benefit of electroacoustic stimulation in cochlear implants. The proposed strategy is based on harmonic modeling and uses a synthesis-driven approach to synthesize the harmonics in voiced segments of speech. Based on objective measures, results indicated improvement in speech quality. This study warrants further work on the development of algorithms to regenerate harmonics of voiced segments in the presence of noise.
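    The abstract does not give the exact formulation of the SS dereverberation strategy; the sketch below only illustrates the general idea, assuming STFT-domain processing and a simple model in which the late-reverberant power is approximated by a delayed, exponentially attenuated copy of the observed power (the parameters rt60, late_delay_ms, and floor are illustrative, not taken from the study).

```python
# Minimal sketch of spectral subtraction for late-reverberation suppression.
# Assumptions (not from the study): the late-reverberant power is modeled as
# a delayed, exponentially attenuated copy of the observed power spectrum;
# rt60, late_delay_ms, and floor are illustrative values.
import numpy as np
from scipy.signal import stft, istft

def suppress_late_reverb(x, fs, rt60=1.0, late_delay_ms=50.0, floor=0.1):
    nperseg = 512
    hop = nperseg // 2
    f, t, X = stft(x, fs, nperseg=nperseg, noverlap=nperseg - hop)
    power = np.abs(X) ** 2

    # Number of frames until the late reflections are assumed to begin.
    delay_frames = max(1, int(round(late_delay_ms / 1000 * fs / hop)))
    # Energy decay per frame implied by RT60 (60 dB decay over rt60 seconds).
    decay = 10 ** (-6 * (hop / fs) / rt60)

    # Estimate the late-reverberant power as a delayed, attenuated copy of the
    # observed power, then subtract it with a spectral floor.
    late = np.zeros_like(power)
    late[:, delay_frames:] = power[:, :-delay_frames] * decay ** delay_frames
    clean_power = np.maximum(power - late, floor * power)

    # Keep the reverberant phase; only the magnitude is rescaled.
    Y = np.sqrt(clean_power) * np.exp(1j * np.angle(X))
    _, y = istft(Y, fs, nperseg=nperseg, noverlap=nperseg - hop)
    return y[: len(x)]
```

    Keeping the reverberant phase and applying only a per-bin magnitude gain is a common simplification in spectral-subtraction-style enhancement and keeps the processing cheap enough for real-time use.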

    Objective Assessment of Machine Learning Algorithms for Speech Enhancement in Hearing Aids

    Speech enhancement in assistive hearing devices has been an area of research for many decades. Noise reduction is particularly challenging because of the wide variety of noise sources and the non-stationarity of speech and noise. Digital signal processing (DSP) algorithms deployed in modern hearing aids for noise reduction rely on certain assumptions about the statistical properties of undesired signals. These assumptions can hinder accurate estimation of different noise types, which in turn leads to suboptimal noise reduction. In this research, a relatively unexplored technique based on deep learning, a Recurrent Neural Network (RNN), is used to perform noise reduction and dereverberation for assisting hearing-impaired listeners. For noise reduction, the performance of the deep learning model was evaluated objectively and compared with that of open Master Hearing Aid (openMHA), a conventional signal-processing framework, and a Deep Neural Network (DNN) based model. It was found that the RNN model can suppress noise and improve speech understanding better than the conventional hearing aid noise reduction algorithm and the DNN model. The same RNN model was shown to reduce reverberation components with proper training. A real-time implementation of the deep learning model is also discussed.
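    The abstract does not specify the RNN architecture or training target; the following is a minimal sketch of a recurrent mask-estimation network of the kind commonly used for single-channel noise reduction, assuming log-magnitude STFT input features and a ratio-mask-style target (the class name, layer sizes, and MSE criterion are illustrative assumptions, not details from the study).

```python
# Minimal sketch of an RNN mask estimator for single-channel noise reduction.
# Assumptions (not from the study): log-magnitude STFT input features,
# a ratio-mask-style target, and illustrative layer sizes.
import torch
import torch.nn as nn

class RNNDenoiser(nn.Module):
    def __init__(self, n_bins=257, hidden=256, layers=2):
        super().__init__()
        self.rnn = nn.GRU(n_bins, hidden, num_layers=layers, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_bins), nn.Sigmoid())

    def forward(self, log_mag):            # (batch, frames, bins)
        h, _ = self.rnn(log_mag)
        return self.mask(h)                # per-bin gain in [0, 1]

def loss_fn(model, noisy_mag, clean_mag):
    # Training objective (sketch): MSE between the masked noisy magnitude
    # and the clean magnitude.
    mask = model(torch.log(noisy_mag + 1e-8))
    return torch.mean((mask * noisy_mag - clean_mag) ** 2)
```

    At inference time the estimated per-bin gains are applied to the noisy magnitude spectrum and the noisy phase is reused for resynthesis, which is the usual low-latency choice for hearing-aid-style processing.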

    Communicating in Social Networks: Effects of Reverberation on Acoustic Information Transfer in Three Species of Birds

    In socially and acoustically complex environments, the auditory system processes sounds that are distorted, attenuated, and masked by biotic and abiotic noise. As a result, spectral and temporal alterations of the sounds may affect the transfer of information between signalers and receivers in networks of conspecifics, increasing detection thresholds and interfering with the discrimination and recognition of sound sources. To date, much concern has been directed toward anthropogenic noise sources and whether they affect animals' natural territorial and reproductive behavior and ultimately harm the survival of the species. Not much is known, however, about the potentially synergistic effects of environmentally induced sound degradation and masking from noise and competing sound signals, or about the implications these interactions have for vocally mediated exchanges in animals. This dissertation describes a series of comparative psychophysical experiments under controlled laboratory conditions that investigate the impact of reverberation on the perception of a range of artificial sounds and natural vocalizations in the budgerigar, canary, and zebra finch. Results suggest that even small reverberation effects could be used to gauge different acoustic environments and to locate a sound source, but they limit the vocally mediated transfer of important information in social settings, especially when reverberation is paired with noise. Discrimination of similar vocalizations from different individuals is significantly impaired when both reverberation and abiotic noise levels are high, whereas this ability is hardly affected by either factor alone. Similarly, high levels of reverberation combined with biotic noise from signaling conspecifics limit the auditory system's ability to parse a complex acoustic scene by segregating signals from multiple individuals. Important interaction effects like these, caused by the characteristics of the habitat and by species differences in auditory sensitivity, can therefore predict whether a given acoustic environment limits communication range or interferes with the detection, discrimination, and recognition of biologically important sounds.

    Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks

    In this paper, we propose the utterance-level Permutation Invariant Training (uPIT) technique. uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker-independent multi-talker speech separation. Specifically, uPIT extends the recently proposed Permutation Invariant Training (PIT) technique with an utterance-level cost function, eliminating the need to solve an additional permutation problem during inference, which frame-level PIT otherwise requires. We achieve this using Recurrent Neural Networks (RNNs) that, during training, minimize the utterance-level separation error, forcing separated frames belonging to the same speaker to be aligned to the same output stream. In practice, this allows RNNs trained with uPIT to separate multi-talker mixed speech without any prior knowledge of signal duration, number of speakers, speaker identity, or gender. We evaluated uPIT on the WSJ0 and Danish two- and three-talker mixed-speech separation tasks and found that uPIT outperforms techniques based on Non-negative Matrix Factorization (NMF) and Computational Auditory Scene Analysis (CASA), and compares favorably with Deep Clustering (DPCL) and the Deep Attractor Network (DANet). Furthermore, we found that models trained with uPIT generalize well to unseen speakers and languages. Finally, we found that a single model trained with uPIT can handle both two-speaker and three-speaker speech mixtures.
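    As a rough illustration of the utterance-level cost described above, the sketch below computes a permutation-invariant loss over whole utterances, assuming the separator outputs one magnitude estimate per source and using MSE as the per-utterance criterion (both are simplifying assumptions for illustration, not details taken from the paper).

```python
# Minimal sketch of an utterance-level permutation-invariant (uPIT-style) loss.
# Assumptions: the separator outputs one magnitude estimate per source over the
# whole utterance, and MSE is used as the per-utterance criterion.
from itertools import permutations
import torch

def upit_loss(estimates, targets):
    """estimates, targets: tensors of shape (batch, n_src, frames, bins)."""
    n_src = estimates.shape[1]
    losses = []
    for perm in permutations(range(n_src)):
        # Utterance-level error for this speaker-to-output assignment.
        err = torch.mean(
            (estimates - targets[:, list(perm)]) ** 2, dim=(1, 2, 3)
        )
        losses.append(err)
    # Pick, per utterance, the permutation with the smallest total error, so
    # each output stream stays tied to one speaker for the whole utterance.
    return torch.min(torch.stack(losses, dim=1), dim=1).values.mean()
```

    Evaluating the assignment over the whole utterance, rather than per frame, is what removes the extra permutation problem at inference time: the chosen output stream is consistent for the duration of the signal.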