    Effects of noise suppression and envelope dynamic range compression on the intelligibility of vocoded sentences for a tonal language

    Vocoder simulation studies have suggested that the type of carrier signal affects the intelligibility of vocoded speech. The present work further assessed how carrier type interacts with additional signal processing, namely single-channel noise suppression and envelope dynamic range compression, in determining the intelligibility of vocoder simulations. In Experiment 1, Mandarin sentences corrupted by speech-spectrum-shaped noise (SSN) or two-talker babble (2TB) were processed by one of four single-channel noise-suppression algorithms before tone-vocoded (TV) or noise-vocoded (NV) processing. In Experiment 2, the dynamic ranges of multiband envelope waveforms were compressed by scaling the mean-removed envelope waveforms by a compression factor before TV or NV processing. With normal-hearing (NH) listeners, TV Mandarin sentences yielded higher intelligibility scores than NV sentences. The intelligibility advantage of noise-suppressed vocoded speech depended on the masker type (SSN vs. 2TB). Envelope dynamic range compression degraded NV speech more than TV speech. These findings suggest an interaction between the carrier signal type used in vocoding and the envelope distortion caused by signal processing.
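    The compression step in Experiment 2 can be written as e_c(t) = ē + α(e(t) − ē), where ē is the mean of one band's envelope and α is the compression factor. The Python sketch below applies that formula band by band; the function name, the (bands, samples) array layout and the restriction 0 ≤ α ≤ 1 are assumptions for illustration, and envelope extraction and the TV/NV carriers are not shown.

```python
import numpy as np


def compress_envelope(env: np.ndarray, alpha: float) -> np.ndarray:
    """Compress envelope dynamic range by scaling the mean-removed envelope.

    env: envelopes with time on the last axis, e.g. shape (n_bands, n_samples).
    alpha: compression factor; 1 leaves the envelope unchanged, 0 flattens it
    to its mean (the [0, 1] range is an assumption of this sketch).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    mean = env.mean(axis=-1, keepdims=True)  # per-band mean envelope level
    return mean + alpha * (env - mean)       # shrink fluctuations around the mean


rng = np.random.default_rng(0)
envelopes = np.abs(rng.standard_normal((4, 1000)))  # stand-in for 4 band envelopes
compressed = compress_envelope(envelopes, 0.5)
print(envelopes.std(axis=-1), compressed.std(axis=-1))  # std is halved per band
```

Because the result equals (1 − α)ē + α·e(t), a non-negative envelope stays non-negative for any α in [0, 1], so no clipping is needed before modulating the carrier.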

    An evaluation of intrusive instrumental intelligibility metrics

    Instrumental intelligibility metrics are commonly used as an alternative to listening tests. This paper evaluates 12 monaural intrusive intelligibility metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and sEPSM^corr. In addition, it investigates how well intelligibility metrics generalize to new types of distortion and analyzes why the top-performing metrics perform well. The intelligibility data were obtained from 11 listening tests described in the literature. The stimuli included Dutch, Danish, and English speech distorted by additive noise, reverberation, competing talkers, pre-processing enhancement, and post-processing enhancement. SIIB and HASPI performed best, achieving average correlations with listening test scores of ρ = 0.92 and ρ = 0.89, respectively. The high performance of SIIB may, in part, result from SIIB's developers having had access to all of the intelligibility data considered in the evaluation. The results show that intelligibility metrics tend to perform poorly on data sets that were not used during their development. Modifying the original implementations of SIIB and STOI demonstrates the advantage of reducing statistical dependencies between input features. Additionally, the paper presents a new version of SIIB, called SIIB^Gauss, which performs similarly to SIIB and HASPI but is two orders of magnitude faster to compute. Comment: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 201
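    Metric performance above is summarized as a correlation ρ between metric outputs and listening-test scores, averaged over data sets. The sketch below assumes ρ is a plain Pearson correlation computed per data set and then averaged; the paper may apply a mapping function or a different correlation measure first, and both function names are hypothetical.

```python
import numpy as np


def pearson_rho(metric_scores, listener_scores) -> float:
    """Pearson correlation between metric predictions and measured intelligibility."""
    x = np.asarray(metric_scores, dtype=float)
    y = np.asarray(listener_scores, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    # Undefined (division by zero) if either input is constant.
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def mean_rho_over_datasets(datasets) -> float:
    """Average the per-data-set correlations; datasets is a list of (metric, listener) pairs."""
    return float(np.mean([pearson_rho(m, s) for m, s in datasets]))


# Hypothetical scores for one listening test (metric output vs. words correct)
print(pearson_rho([0.2, 0.4, 0.6, 0.8], [10.0, 35.0, 60.0, 90.0]))
```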

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where adverse conditions may compromise intelligibility, at least for some listeners. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings on human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be evaluated perceptually or algorithmically. The review thus provides a roadmap for future work on improving the robustness of speech output.

    Studies in Signal Processing Techniques for Speech Enhancement: A comparative study

    Speech enhancement is essential for suppressing background noise, increasing speech intelligibility, and reducing listening fatigue. Speech enhancement algorithms range from simple methods such as spectral subtraction to complex ones such as Bayesian magnitude estimators based on the minimum mean square error (MMSE) criterion and their variants. Research is ongoing, and new algorithms continue to emerge for enhancing speech recorded in noisy environments such as industrial sites, vehicles, and aircraft cockpits. In aviation, speech enhancement plays a vital role in recovering crucial information from pilots' conversations after an incident or accident by suppressing engine and cockpit-instrument noise. This work proposes a new approach to speech enhancement using the harmonic wavelet transform and Bayesian estimators. The performance indicators, SNR and listening tests, confirm that the modified algorithms using the harmonic wavelet transform give better results than existing methods. Furthermore, the harmonic wavelet transform is computationally efficient and simpler to implement than a filter-bank approach to realizing sub-bands, owing to its built-in decimation and interpolation operations.
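    As a point of reference for the "simple" end of that range, the sketch below implements basic power spectral subtraction with NumPy only. It is the generic baseline named in the abstract, not the authors' harmonic-wavelet method; the frame length, the spectral floor, and the assumption that the first few frames contain noise only are illustrative choices.

```python
import numpy as np


def spectral_subtraction(noisy, frame_len=256, noise_frames=8, floor=0.01):
    """Basic power spectral subtraction with the noisy phase kept.

    Assumes len(noisy) >= frame_len and that the first `noise_frames` frames
    contain noise only (an assumption of this sketch).
    """
    noisy = np.asarray(noisy, dtype=float)
    hop = frame_len // 2
    n = np.arange(frame_len)
    # Periodic Hann window: overlapping copies at 50% hop sum to 1, so plain
    # overlap-add reconstructs the input when no bins are modified.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack([noisy[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    power = np.abs(spec) ** 2
    noise_power = power[:noise_frames].mean(axis=0)           # noise estimate per bin
    clean_power = np.maximum(power - noise_power, floor * power)  # subtract, then floor
    gain = np.sqrt(clean_power / np.maximum(power, 1e-12))    # real gain <= 1 per bin
    enhanced = np.fft.irfft(gain * spec, n=frame_len, axis=1)
    out = np.zeros(len(noisy))
    for i in range(n_frames):                                 # overlap-add synthesis
        out[i * hop:i * hop + frame_len] += enhanced[i]
    return out
```

The spectral floor keeps a small fraction of each bin instead of zeroing it, a common way to limit the "musical noise" that plain subtraction produces.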

    Assessing the effect of noise-reduction to the intelligibility of low-pass filtered speech

    Because most hearing-impaired listeners have residual hearing at low frequencies, the present work assessed whether commonly used single-channel noise-reduction (NR) algorithms improve the intelligibility of low-pass filtered speech, which simulates understanding speech with the low-frequency residual hearing of hearing-impaired patients. In addition, the study used Mandarin speech, in which the information carried by (low-frequency-dominated) vowels contributes significantly to intelligibility. Mandarin sentences were corrupted by steady-state speech-shaped noise and processed by four types (i.e., subspace, statistical-modeling, spectral-subtractive, and Wiener-filtering) of single-channel NR algorithms. The processed sentences were played to normal-hearing listeners for recognition. Experimental results showed that existing single-channel NR algorithms were unable to improve the intelligibility of low-pass filtered Mandarin sentences. Among the four types of single-channel NR algorithms examined, Wiener filtering had the least negative influence on the intelligibility of low-pass filtered speech.
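    To make the Wiener-filtering category concrete, the sketch below computes a per-bin Wiener gain G = ξ/(1 + ξ) from the a priori SNR ξ. Here ξ is approximated by the maximum-likelihood estimate max(γ − 1, 0), where γ is the a posteriori SNR; practical implementations, possibly including the one evaluated in the study, often use a smoothed (decision-directed) estimate instead, and the function name and floor parameter are hypothetical.

```python
import numpy as np


def wiener_gain(noisy_power, noise_power, gain_floor=0.0):
    """Per-bin Wiener gain from noisy and noise power spectra.

    Uses the ML estimate of the a priori SNR (an assumption of this sketch).
    """
    post_snr = np.asarray(noisy_power, float) / np.maximum(noise_power, 1e-12)
    prior_snr = np.maximum(post_snr - 1.0, 0.0)   # xi = max(gamma - 1, 0)
    gain = prior_snr / (1.0 + prior_snr)          # G = xi / (1 + xi)
    return np.maximum(gain, gain_floor)


# Bins at 2x, 0.5x and 11x the noise power
print(wiener_gain(np.array([2.0, 0.5, 11.0]), np.ones(3)))
```

The gain multiplies the noisy STFT magnitude in each bin; bins dominated by noise are driven towards zero, which is also where low-frequency speech cues can be attenuated.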

    Compressive speech enhancement using semi-soft thresholding and improved threshold estimation

    Compressive speech enhancement is based on compressive sensing (CS) sampling theory and exploits the sparsity of the signal for its enhancement. To improve the performance of the compressive speech enhancement algorithm based on discrete wavelet transform (DWT) basis functions, this study presents a semi-soft thresholding approach with improved threshold estimation and threshold-rescaling parameters. The semi-soft thresholding approach uses two thresholds: an improved universal threshold and a threshold calculated from the initial silence region of the signal. The study suggests that thresholding should be applied to both the detail and the approximation coefficients to remove noise effectively. The hard, soft, garrote, and semi-soft thresholding approaches are compared using objective quality and speech intelligibility measures. The normalized covariance measure is used as an intelligibility measure because it correlates strongly with the intelligibility of the speech signal. Visual inspection of the output signal is used to verify the results. Experiments were conducted on the noisy speech corpus (NOIZEUS). The experimental results indicate that the proposed semi-soft thresholding with improved threshold estimation provides better enhancement than the other thresholding approaches.
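    The four thresholding rules compared above can be sketched as element-wise functions on wavelet coefficients. The semi-soft rule below is the standard two-threshold ("firm") form: coefficients below the lower threshold are zeroed, those above the upper threshold are kept, and those in between are rescaled linearly. The universal threshold shown is the classic σ·sqrt(2 ln N) with σ estimated from the median absolute deviation; the paper's "improved" threshold, its silence-region threshold, and its rescaling parameters may differ, so both thresholds are simply passed in here.

```python
import numpy as np


def hard_threshold(x, lam):
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > lam, x, 0.0)  # keep or kill


def soft_threshold(x, lam):
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)  # shrink towards zero


def garrote_threshold(x, lam):
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    keep = np.abs(x) > lam
    out[keep] = x[keep] - lam ** 2 / x[keep]  # non-negative garrote
    return out


def semi_soft_threshold(x, lam1, lam2):
    """Two-threshold rule: 0 below lo, identity above hi, linear ramp in between."""
    x = np.asarray(x, dtype=float)
    lo, hi = min(lam1, lam2), max(lam1, lam2)
    if hi <= lo:
        raise ValueError("thresholds must differ")
    ax = np.abs(x)
    middle = np.sign(x) * hi * (ax - lo) / (hi - lo)
    return np.where(ax <= lo, 0.0, np.where(ax > hi, x, middle))


def universal_threshold(detail_coeffs, n):
    """Classic universal threshold; noise sigma from the median absolute deviation."""
    sigma = np.median(np.abs(detail_coeffs)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(n))
```

The semi-soft rule is continuous at both thresholds, avoiding the jump of hard thresholding while leaving large coefficients unbiased, unlike soft thresholding.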

    Understanding low-pass-filtered Mandarin sentences: Effects of fundamental frequency contour and single-channel noise suppression

    The present work assessed the effects of flattening the fundamental frequency (F0) contour and of single-channel noise suppression on the intelligibility of low-pass-filtered (LPF) sentences. Mandarin sentences either had their original F0 contour replaced by a flat contour at the average F0 or were processed by single-channel noise suppression, and were then low-pass filtered. The processed stimuli were presented to normal-hearing listeners for recognition. Flattening the F0 contour significantly affected the understanding of LPF sentences. Noise suppression by existing single-channel algorithms did not improve the intelligibility of LPF sentences.
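    The flattening manipulation amounts to replacing every voiced F0 value with the mean voiced F0 while leaving unvoiced frames untouched. The sketch assumes a frame-wise F0 track in which 0 marks unvoiced frames; F0 extraction and resynthesis with a speech analysis-synthesis tool are outside its scope, and the function name is hypothetical.

```python
import numpy as np


def flatten_f0(f0):
    """Replace voiced F0 values (> 0) with their mean; unvoiced frames stay 0."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    flat = np.zeros_like(f0)
    if voiced.any():
        flat[voiced] = f0[voiced].mean()
    return flat


print(flatten_f0([0, 200, 220, 0, 240]))  # voiced frames all set to 220 Hz
```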

    Evaluation of the sparse coding shrinkage noise reduction algorithm for the hearing impaired

    Although there are numerous single-channel noise reduction strategies for improving speech perception in noisy environments, most of them improve speech quality but not speech intelligibility for normal-hearing (NH) or hearing-impaired (HI) listeners. The only current exceptions that do improve intelligibility are algorithms that require a priori statistics of the speech or noise. Most noise reduction algorithms in hearing aids are adopted directly from algorithms designed for NH listeners, without taking the hearing-loss factors of HI listeners into account. HI listeners suffer a greater loss of speech intelligibility than NH listeners in the same noisy environment, so further study of monaural noise reduction algorithms for HI listeners is required. The motivation of this work is to adopt a model-based approach, in contrast to the conventional Wiener filtering approach. A model-based algorithm called sparse coding shrinkage (SCS) was proposed to extract key speech information from noisy speech. SCS was evaluated against a state-of-the-art Wiener filtering approach in speech intelligibility and quality tests with 9 NH and 9 HI listeners. SCS matched the performance of the Wiener filtering algorithm in both speech intelligibility and speech quality. Both algorithms yielded some intelligibility improvement for HI listeners but none for NH listeners, and both improved speech quality for HI and NH listeners. Additionally, a physiologically inspired hearing loss simulation (HLS) model was developed to characterize hearing-loss factors and simulate their consequences. A methodology was proposed for evaluating signal processing strategies for HI listeners by combining the HLS model with NH subjects: NH subjects listened to unprocessed or enhanced speech passed through the HLS model. Some of the effects of the algorithms observed in HI listeners were reproduced, at least qualitatively, by NH listeners using the HLS model. Conclusions: The model-based SCS algorithm is promising for improving performance in stationary noise, although no clear difference was seen between SCS and a competitive Wiener filtering algorithm. Fluctuating noise is more difficult to reduce than stationary noise. Noise reduction algorithms may perform better at higher input signal-to-noise ratios (SNRs), where HI listeners can benefit but NH listeners already reach ceiling performance. The proposed HLS model can save time and cost when evaluating noise reduction algorithms for HI listeners.
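    Sparse coding shrinkage, in the form introduced by Hyvärinen, projects a noisy frame onto an orthogonal basis in which clean speech is sparse, shrinks each component, and projects back. For a Laplacian prior the shrinkage is a soft threshold at sqrt(2)·σ²/d, where σ² is the noise variance and d is the standard deviation of the clean sparse component. The sketch below assumes that form; the orthogonal matrix W stands in for a basis that would normally be estimated (e.g. by ICA) on noise-free speech, and the thesis's actual shrinkage functions and basis may differ.

```python
import numpy as np


def laplacian_shrink(u, noise_var, d):
    """Soft threshold at sqrt(2) * noise_var / d (Laplacian-prior shrinkage)."""
    thr = np.sqrt(2.0) * noise_var / np.asarray(d, dtype=float)
    return np.sign(u) * np.maximum(np.abs(u) - thr, 0.0)


def scs_denoise(x, W, noise_var, d):
    """Sparse coding shrinkage of one frame x.

    W: orthogonal sparsifying basis (rows are basis vectors); hypothetical here.
    d: per-component std of the clean sparse components (scalar or array).
    """
    u = W @ x                        # sparse-domain coefficients; white noise stays white
    s_hat = laplacian_shrink(u, noise_var, d)
    return W.T @ s_hat               # W is orthogonal, so W.T inverts it


rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # random orthogonal stand-in basis
frame = rng.standard_normal(8)
print(scs_denoise(frame, W, noise_var=0.1, d=1.0))
```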

    Development and use of a new Speech Quality Evaluation Parameter ESNR using ANN and Grey Wolf Optimizer

    Get PDF
    The performance of speech enhancement (SE) algorithms is evaluated using various objective and subjective evaluation parameters. Recently, a few objective parameters have been developed for measuring speech quality and intelligibility. However, there is still ample scope for determining statistical parameters that predict the SNR of a noisy speech signal without reference to the clean signal or the noise. This paper addresses that problem by developing three types of artificial neural networks (ANNs) to efficiently predict the estimated SNR (E-SNR) of a given noisy speech signal. To further improve prediction accuracy, the ANN coefficients are tuned with a bio-inspired optimization technique; the popular and efficient Grey Wolf Optimizer is chosen for this purpose. Several audio features are studied, and appropriate features are chosen as inputs to the ANN. Finally, a comparative performance analysis is carried out on two standard speech databases, and the best-performing ANN and audio features for the ESNR are identified.
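    The Grey Wolf Optimizer moves a pack of candidate solutions towards the three best found so far (alpha, beta, delta), with an exploration coefficient a that decreases linearly from 2 to 0. The sketch below follows the standard update rules and keeps the best three solutions across iterations; treating ANN coefficients as the vector being optimized and the loss as the objective is how it would be applied to the paper's problem, but the objective, bounds, and pack size here are illustrative assumptions.

```python
import numpy as np


def grey_wolf_optimizer(objective, dim, lower, upper, n_wolves=30, n_iter=200, seed=0):
    """Minimize `objective` over a box with the Grey Wolf Optimizer (n_wolves >= 3)."""
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lower, upper, size=(n_wolves, dim))
    fitness = np.array([objective(w) for w in wolves])
    order = np.argsort(fitness)[:3]
    leaders, leader_fit = wolves[order].copy(), fitness[order].copy()  # alpha, beta, delta

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter                    # decreases linearly from 2 to 0
        step_sum = np.zeros_like(wolves)
        for leader in leaders:
            A = 2.0 * a * rng.random(wolves.shape) - a
            C = 2.0 * rng.random(wolves.shape)
            D = np.abs(C * leader - wolves)           # distance to this leader
            step_sum += leader - A * D                # candidate move guided by this leader
        wolves = np.clip(step_sum / 3.0, lower, upper)  # average of the three guides
        fitness = np.array([objective(w) for w in wolves])
        # Keep the best three solutions seen so far as the new leaders.
        pool = np.vstack([leaders, wolves])
        pool_fit = np.concatenate([leader_fit, fitness])
        order = np.argsort(pool_fit)[:3]
        leaders, leader_fit = pool[order].copy(), pool_fit[order].copy()

    return leaders[0], float(leader_fit[0])


best_x, best_f = grey_wolf_optimizer(lambda w: float(np.sum(w ** 2)), dim=5, lower=-10.0, upper=10.0)
print(best_x, best_f)
```

For network tuning, `objective` would evaluate the prediction error of a network whose flattened weights are the input vector; this sketch does not attempt to reproduce the paper's network architecture or features.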