154 research outputs found

    Informed algorithms for sound source separation in enclosed reverberant environments

    While humans can separate a sound of interest amidst a cacophony of contending sounds in an echoic environment, machine-based methods lag behind in solving this task. This thesis thus aims at improving the performance of audio separation algorithms when they are informed, i.e., have access to source location information. These locations are assumed to be known a priori in this work, for example by video processing. Initially, a multi-microphone array based method combined with binary time-frequency masking is proposed. A robust least squares frequency invariant data independent beamformer designed with the location information is utilized to estimate the sources. To further enhance the estimated sources, binary time-frequency masking based post-processing is used, but cepstral domain smoothing is required to mitigate musical noise. To tackle the under-determined case and further improve separation performance at higher reverberation times, a two-microphone based method which is inspired by human auditory processing and generates soft time-frequency masks is described. In this approach, interaural level difference, interaural phase difference and mixing vectors are probabilistically modeled in the time-frequency domain and the model parameters are learned through the expectation-maximization (EM) algorithm. A direction vector is estimated for each source, using the location information, and is used as the mean parameter of the mixing vector model. Soft time-frequency masks are used to reconstruct the sources. A spatial covariance model is then integrated into the probabilistic model framework; it encodes the spatial characteristics of the enclosure and further improves the separation performance in challenging scenarios, i.e., when sources are in close proximity and when the level of reverberation is high. Finally, new dereverberation based pre-processing is proposed, based on a cascade of three dereverberation stages, each of which enhances the two-microphone reverberant mixture. The dereverberation stages are based on amplitude spectral subtraction, where the late reverberation is estimated and suppressed. The combination of such dereverberation based pre-processing and soft mask separation yields the best separation performance. All methods are evaluated with real and synthetic mixtures formed, for example, from speech signals from the TIMIT database and measured room impulse responses.
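The difference between the binary and soft time-frequency masks mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's method: there, the soft masks come from a probabilistic model fitted with EM, whereas here the per-bin source magnitudes (`target`, `interferer`) are hypothetical toy values assumed to be known, and the function names are illustrative.

```python
import numpy as np

def binary_mask(target_mag, interferer_mag):
    """Return 1 where the target dominates a time-frequency bin, else 0."""
    return (target_mag > interferer_mag).astype(float)

def soft_mask(target_mag, interferer_mag, eps=1e-12):
    """Return a power-ratio mask with values in [0, 1]."""
    target_pow = target_mag ** 2
    interferer_pow = interferer_mag ** 2
    return target_pow / (target_pow + interferer_pow + eps)

# Toy magnitudes, shape (frequency bins, frames); assumed values for illustration.
target = np.array([[1.0, 0.2], [0.5, 0.5]])
interferer = np.array([[0.1, 0.8], [0.5, 1.5]])
mixture = target + interferer  # simplification: magnitudes add, phase is ignored

hard = binary_mask(target, interferer)
soft = soft_mask(target, interferer)
separated = soft * mixture  # masked mixture magnitude for the target source
```

A binary mask makes an all-or-nothing decision per bin, which is one source of the musical noise the abstract mentions; a soft mask scales each bin continuously instead.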

    Effective Binaural Multi-Channel Processing Algorithm for Improved Environmental Presence

    Binaural noise-reduction algorithms based on the multi-channel Wiener filter (MWF) are promising techniques for use in binaural assistive listening devices. The real-time implementation of the existing binaural MWF methods, however, faces the challenge of increasing the amount of noise reduction without imposing speech distortion while at the same time preserving the binaural cues of both speech and noise components. Although significant efforts have been made in the literature, most methods developed so far have focused on only the former or the latter problem. This paper proposes an alternative binaural MWF algorithm that incorporates the non-stationarity of the signal components into the framework. The main objective is to design an algorithm that is able to select the sources that are present in the environment. To achieve this, a modified speech presence probability (SPP) and a single-channel speech enhancement algorithm are utilized in the formulation. The resulting optimal filter also avoids the poor estimation of the second-order clean speech statistics, which is normally done by simple subtraction. Theoretical analysis and performance evaluation using realistic recorded data show the advantage of the proposed method over the reference MWF solution in terms of binaural cue preservation, as well as noise reduction and speech distortion.
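For context, the reference MWF that this abstract compares against can be sketched for a single frequency bin as follows. This is a textbook-style sketch, not the paper's proposed algorithm: it estimates the clean speech covariance by the "simple subtraction" the abstract refers to. The data are synthetic, and `steering`, `mu`, and the function names are assumptions made for illustration.

```python
import numpy as np

def covariance(frames):
    """Sample covariance of frames with shape (channels, n_frames)."""
    return frames @ frames.conj().T / frames.shape[1]

def mwf_weights(R_noisy, R_noise, ref=0, mu=1.0):
    """Reference MWF weights for one frequency bin.

    The clean speech covariance is estimated by simple subtraction,
    R_speech = R_noisy - R_noise, and the weights solve
    (R_speech + mu * R_noise) w = R_speech[:, ref].
    """
    R_speech = R_noisy - R_noise
    return np.linalg.solve(R_speech + mu * R_noise, R_speech[:, ref])

def snr_db(speech_part, noise_part):
    """Ratio of mean powers in dB."""
    return 10 * np.log10(np.mean(np.abs(speech_part) ** 2)
                         / np.mean(np.abs(noise_part) ** 2))

rng = np.random.default_rng(0)
n_ch, n_frames, ref = 4, 20000, 0
steering = np.array([1.0, 0.8, 0.6, 0.4])  # hypothetical transfer vector
speech = steering[:, None] * rng.standard_normal(n_frames)
noise = 0.5 * rng.standard_normal((n_ch, n_frames))
noisy = speech + noise

# Oracle noise statistics; in practice they come from noise-only frames.
w = mwf_weights(covariance(noisy), covariance(noise), ref=ref)
estimate = w.conj() @ noisy  # speech estimate at the reference microphone

snr_in_db = snr_db(speech[ref], noise[ref])
snr_out_db = snr_db(w.conj() @ speech, w.conj() @ noise)
```

When the noise covariance is poorly estimated, the subtraction can yield an inaccurate speech covariance estimate, which is the weakness the abstract says the proposed filter avoids.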

    User-Symbiotic Speech Enhancement for Hearing Aids

    Speech enhancement in binaural hearing protection devices

    The capability of people to operate safely and effectively under extreme noise conditions depends on their access to adequate voice communication while using hearing protection. This thesis develops speech enhancement algorithms that can be implemented in binaural hearing protection devices to improve communication and situation awareness in the workplace. The developed algorithms, which emphasize low computational complexity, are able to suppress noise while enhancing speech.

    Data-driven Speech Intelligibility Enhancement and Prediction for Hearing Aids

    Hearing impairment is a widespread problem around the world. It is estimated that one in six people are living with some degree of hearing loss. Moderate and severe hearing impairment has been recognised as one of the major causes of disability, and is associated with declines in quality of life, mental illness and dementia. However, investigation shows that only 10-20% of older people with significant hearing impairment wear hearing aids. One of the main factors causing the low uptake is that current devices struggle to help hearing aid users understand speech in noisy environments. To compensate for the elevated hearing thresholds and the dysfunction of source separation processing caused by the impaired auditory system, amplification and denoising have been the major focuses of current hearing aid studies aimed at improving the intelligibility of speech in noise. It is also important to derive a metric that can fairly predict speech intelligibility to support the development of hearing aid techniques. This thesis aims to enhance the speech intelligibility of hearing impaired listeners. Motivated by the success of data-driven approaches in many speech processing applications, this work proposes the differentiable hearing aid speech processing (DHASP) framework to optimise both the amplification and denoising modules within a hearing aid processor. This is accomplished by setting an intelligibility-based optimisation objective and taking advantage of large-scale speech databases to train the hearing aid processor to maximise intelligibility for the listeners. The first set of experiments is conducted on both clean and noisy speech databases, and the results from objective evaluation suggest that the amplification fittings optimised within the DHASP framework can outperform a widely used and well-recognised fitting. The second set of experiments is conducted on a large-scale database with simulated domestic noisy scenes. The results from both objective and subjective evaluations show that the DHASP-optimised hearing aid processor incorporating a deep neural network-based denoising module can achieve competitive performance in terms of intelligibility enhancement. A precise intelligibility predictor can provide reliable evaluation results and save the cost of expensive and time-consuming subjective evaluation. Inspired by the finding that automatic speech recognition (ASR) models show recognition results similar to those of humans in some experiments, this work exploits ASR models for intelligibility prediction. An intrusive approach using ASR hidden representations and a non-intrusive approach using ASR uncertainty are proposed and explained in the third and fourth experimental chapters. Experiments are conducted on two databases, one with monaural speech in speech-spectrum-shaped noise with normal hearing listeners, and the other with processed binaural speech in domestic noise with hearing impaired listeners. Results suggest that both the intrusive and non-intrusive approaches can achieve top performance and outperform a number of widely used intelligibility prediction approaches. In conclusion, this thesis covers both the enhancement and prediction of speech intelligibility for hearing aids. The hearing aid processor optimised within the proposed DHASP framework can significantly improve the intelligibility of speech in noise for hearing impaired listeners. It is also shown that the proposed ASR-based intelligibility prediction approaches can achieve state-of-the-art performance against a number of widely used intelligibility predictors.

    Localization of sound sources: a systematic review

    Sound localization is a vast field of research that is used in many applications, including communication, radar, medical aids, and speech enhancement, to name but a few. Many different methods have been presented in this field in recent times. Various types of microphone arrays serve the purpose of sensing the incoming sound. This paper presents an overview of the importance of sound localization in different applications, along with the use and limitations of ad-hoc microphones compared with other microphones. Certain approaches for overcoming these limitations are also presented. A detailed explanation is given of some of the existing methods used for sound localization with microphone arrays in the recent literature. Existing methods are studied in a comparative fashion, along with the factors that influence the choice of one method over the others. This review is done in order to form a basis for choosing the best-fit method for our use.

    Multi-channel dereverberation for speech intelligibility improvement in hearing aid applications

    The Effect of a Voice Activity Detector on the Speech Enhancement

    A multimicrophone speech enhancement algorithm for binaural hearing aids that preserves interaural time delays was proposed recently. The algorithm is based on multichannel Wiener filtering and relies on a voice activity detector (VAD) for estimation of second-order statistics. Here, the effect of a VAD on the speech enhancement of this algorithm was evaluated using an envelope-based VAD, and the performance was compared to that achieved using an ideal error-free VAD. The performance was considered for stationary directional noise and nonstationary diffuse noise interferers at input SNRs from −10 to +5 dB. Intelligibility-weighted SNR improvements of about 20 dB and 6 dB were found for the directional and diffuse noise, respectively. No large degradations (<1 dB) due to the use of the envelope-based VAD were found down to an input SNR of 0 dB for the directional noise and −5 dB for the diffuse noise. At lower input SNRs, the improvement decreased gradually to 15 dB for the directional noise and 3 dB for the diffuse noise.
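To make the role of the VAD concrete, the sketch below shows a generic energy-envelope detector that labels frames as speech or noise-only. It is not the envelope-based VAD evaluated in the paper, whose details the abstract does not give. The frame length, the 6 dB margin, the 10th-percentile noise floor, and the synthetic test signal are all assumptions.

```python
import numpy as np

def envelope_vad(signal, frame_len=256, margin_db=6.0):
    """Flag frames whose energy exceeds the estimated noise floor by margin_db."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    level_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    # Assumes at least roughly 10% of the frames contain noise only.
    noise_floor_db = np.percentile(level_db, 10)
    return level_db > noise_floor_db + margin_db

rng = np.random.default_rng(1)
fs = 16000
noise = 0.01 * rng.standard_normal(fs)
speech_like = np.zeros(fs)
t = np.arange(fs // 2) / fs
speech_like[fs // 2:] = 0.5 * np.sin(2 * np.pi * 200 * t)  # tone stands in for speech

decisions = envelope_vad(noise + speech_like)  # True marks speech-active frames
```

In a VAD-driven Wiener filter, frames flagged `False` would feed the noise statistics and the remaining frames the speech-plus-noise statistics, so detection errors leak speech into the noise estimate. That leakage is the degradation the study quantifies at low input SNRs.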