CORE search results: 266 research outputs found

An evaluation of intrusive instrumental intelligibility metrics

Authors: Hendriks Richard C.
Kleijn W. Bastiaan
Van Kuyk Steven
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2018

Instrumental intelligibility metrics are commonly used as an alternative to listening tests. This paper evaluates 12 monaural intrusive intelligibility metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and $\text{sEPSM}^\text{corr}$. In addition, this paper investigates the ability of intelligibility metrics to generalize to new types of distortions and analyzes why the top-performing metrics perform well. The intelligibility data were obtained from 11 listening tests described in the literature. The stimuli included Dutch, Danish, and English speech distorted by additive noise, reverberation, competing talkers, pre-processing enhancement, and post-processing enhancement. SIIB and HASPI had the highest performance, achieving average correlations with listening test scores of $\rho = 0.92$ and $\rho = 0.89$, respectively. The high performance of SIIB may, in part, be the result of SIIB's developers having had access to all the intelligibility data considered in the evaluation. The results show that intelligibility metrics tend to perform poorly on data sets that were not used during their development. By modifying the original implementations of SIIB and STOI, the advantage of reducing statistical dependencies between input features is demonstrated. Additionally, the paper presents a new version of SIIB called $\text{SIIB}^\text{Gauss}$, which performs similarly to SIIB and HASPI but is about two orders of magnitude faster to compute.
Comment: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 201
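The abstract summarizes each metric's performance as a correlation $\rho$ between metric outputs and listening-test scores, averaged over data sets. A minimal sketch of that computation follows, assuming a plain Pearson correlation on raw scores within each data set and an unweighted mean across data sets; whether the paper first applies a mapping function or weights the average is not stated here, and `pearson_rho` and `average_rho` are hypothetical helper names.

```python
import math

def pearson_rho(x, y):
    """Pearson correlation between metric scores x and listener scores y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Correlation is undefined when either input is constant.
        raise ValueError("inputs must not be constant")
    return sxy / math.sqrt(sxx * syy)

def average_rho(datasets):
    """Unweighted mean of per-data-set correlations (an assumption).

    datasets: list of (metric_scores, listener_scores) pairs, one per listening test.
    """
    rhos = [pearson_rho(m, s) for m, s in datasets]
    return sum(rhos) / len(rhos)
```

Because Pearson's $\rho$ measures linear agreement, a metric that ranks conditions correctly but relates to listener scores nonlinearly can still score below 1, which is why some evaluations fit a monotonic mapping before correlating.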

arXiv.org e-Print Archive

Speech Intelligibility Prediction Based on Mutual Information

Authors: Jensen Jesper
Taal Cees H.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/02/2014

Crossref

VBN

A Weighted STOI Intelligibility Metric Based On Mutual Information

Authors: Brookes D
Lightburn L
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 21/12/2015

It is known that the information required for the intelligibility of a speech signal is distributed non-uniformly in time. In this paper we propose WSTOI, a modified version of the STOI speech intelligibility metric. In WSTOI, the contribution of each time-frequency cell is weighted by an estimate of its intelligibility content. This estimate equals the mutual information between two hypothetical signals at either end of a simplified model of human communication. Listening tests show that the modification improves the prediction accuracy of STOI at all performance levels, for both long and short utterances. An improvement was observed across all tested noise types and suppression algorithms.
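In STOI, the final score is the plain average of intermediate intelligibility values over one-third-octave bands and short-time segments; WSTOI, as described above, weights each cell's contribution instead. The sketch below shows that weighting step, assuming a normalized weighted average. The abstract does not give the communication model behind the weights, so `gaussian_mi_bits` uses the mutual information of an additive white Gaussian noise channel, $\tfrac{1}{2}\log_2(1 + \mathrm{SNR})$, purely as a hypothetical stand-in; both function names are illustrative.

```python
import numpy as np

def gaussian_mi_bits(snr_linear):
    """Hypothetical per-cell weight: AWGN-channel mutual information in bits.

    This is a stand-in, not the WSTOI communication model.
    """
    return 0.5 * np.log2(1.0 + np.asarray(snr_linear, dtype=float))

def weighted_intelligibility(d, w):
    """Normalized weighted average of per-cell intelligibility values d.

    With all weights equal this reduces to the uniform average used by STOI.
    """
    d = np.asarray(d, dtype=float)
    w = np.asarray(w, dtype=float)
    if d.shape != w.shape:
        raise ValueError("d and w must have the same shape")
    total = w.sum()
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return float((w * d).sum() / total)
```

For example, cells with intermediate scores `[1.0, 0.0]` average to 0.5 with equal weights but to 0.75 if the first cell carries three times the weight of the second.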

Spiral - Imperial College Digital Repository

Mask-based enhancement of very noisy speech

Author: Lightburn Leo Charles
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/03/2020

When speech is contaminated by high levels of additive noise, both its perceptual quality and its intelligibility are reduced. Studies show that conventional approaches to speech enhancement are able to improve quality but not intelligibility. However, in recent years, algorithms that estimate a time-frequency mask from noisy speech using a supervised machine learning approach and then apply this mask to the noisy speech have been shown to be capable of improving intelligibility. The most direct way of measuring intelligibility is to carry out listening tests with human test subjects. However, in situations where listening tests are impractical and where some additional uncertainty in the results is permissible, for example during the development phase of a speech enhancer, intrusive intelligibility metrics can provide an alternative to listening tests. This thesis begins by outlining a new intrusive intelligibility metric, WSTOI, that is a development of the existing STOI metric. WSTOI improves STOI by weighting the intelligibility contributions of different time-frequency regions with an estimate of their intelligibility content. The prediction accuracies of WSTOI and STOI are compared for a range of noises and noise suppression algorithms and it is found that WSTOI outperforms STOI in all tested conditions. The thesis then investigates the best choice of mask-estimation algorithm, target mask, and method of applying the estimated mask. A new target mask, the HSWOBM, is proposed that optimises a modified version of WSTOI with a higher frequency resolution. The HSWOBM is optimised for a stochastic noise signal to encourage a mask estimator trained on the HSWOBM to generalise better to unseen noise conditions. A high frequency resolution version of WSTOI is optimised as this gives improvements in predicted quality compared with optimising WSTOI. 
Of the tested approaches to target mask estimation, the best-performing approach uses a feed-forward neural network with a loss function based on WSTOI. The best-performing feature set is based on the gains produced by a classical speech enhancer and an estimate of the local voiced-speech-plus-noise to noise ratio in different time-frequency regions, which is obtained with the aid of a pitch estimator. When the estimated target mask is applied in the conventional way, by multiplying the noisy speech by the mask in the time-frequency domain, it can result in speech with very poor perceptual quality. The final chapter of this thesis therefore investigates alternative approaches to applying the estimated mask to the noisy speech, in order to improve both intelligibility and quality. An approach is developed that uses the mask to supply prior information about the speech presence probability to a classical speech enhancer that minimises the expected squared error in the log spectral amplitudes. The proposed end-to-end enhancer outperforms existing algorithms in terms of predicted quality and intelligibility for most noise types.
Open Access
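The conventional mask application mentioned above is an element-wise multiplication of the noisy time-frequency coefficients. A minimal sketch follows; it does not reproduce the thesis's HSWOBM target or its WSTOI-based loss, and instead uses the ideal binary mask with a local criterion `lc_db` as a familiar stand-in target. The inputs are assumed to be complex STFT arrays of shape (frequencies, frames); STFT analysis and resynthesis (for example with `scipy.signal.stft` and `scipy.signal.istft`) are omitted, and both function names are hypothetical.

```python
import numpy as np

def ideal_binary_mask(speech_tf, noise_tf, lc_db=0.0):
    """Stand-in target mask: 1 where the local SNR exceeds lc_db (dB), else 0."""
    eps = 1e-12  # avoids log of zero in silent bins
    speech_pow = np.abs(np.asarray(speech_tf)) ** 2
    noise_pow = np.abs(np.asarray(noise_tf)) ** 2
    local_snr_db = 10.0 * np.log10((speech_pow + eps) / (noise_pow + eps))
    return (local_snr_db > lc_db).astype(float)

def apply_mask(noisy_tf, mask):
    """Conventional application: multiply noisy TF coefficients by the mask."""
    noisy_tf = np.asarray(noisy_tf)
    mask = np.asarray(mask, dtype=float)
    if noisy_tf.shape != mask.shape:
        raise ValueError("mask and noisy spectrogram must have the same shape")
    return noisy_tf * mask
```

Hard 0/1 gains like these zero out any speech energy in rejected bins, which is one plausible contributor to the quality loss the abstract describes for conventional mask application.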

Spiral - Imperial College Digital Repository

Speech Intelligibility Prediction for Hearing Aid Systems

Author: Heidemann Andersen Asger
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2017

VBN

Binaural Speech Enhancement Using STOI-Optimal Masks

Authors: Brookes Mike
Naylor Patrick A.
Tokala Vikas
Publication date: 30/09/2022

STOI-optimal masking has previously been proposed and developed for single-channel speech enhancement. In this paper, we consider its extension to binaural speech enhancement, in which spatial information is known to be important to speech understanding and should therefore be preserved by the enhancement processing. Masks are estimated for each of the binaural channels individually, and a 'better-ear listening' mask is computed by taking the maximum of the two masks. The estimated mask is used to supply speech presence probability information for each time-frequency bin to an Optimally-modified Log Spectral Amplitude (OM-LSA) enhancer. We show that, for binaural signals with a directional noise, the proposed method not only improves the SNR of the noisy signal but also preserves the binaural cues and intelligibility.
Comment: Accepted at IWAENC 202
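The better-ear step described above reduces to an element-wise maximum over time-frequency bins. The sketch assumes both masks are real arrays of shape (frequencies, frames) with values in [0, 1]; the OM-LSA stage that consumes the combined mask is not reproduced, and `better_ear_mask` is a hypothetical name.

```python
import numpy as np

def better_ear_mask(mask_left, mask_right):
    """Element-wise maximum of the left- and right-channel masks.

    Each bin keeps the mask value of whichever ear has the more favourable
    estimate, mimicking 'better-ear listening'.
    """
    ml = np.asarray(mask_left, dtype=float)
    mr = np.asarray(mask_right, dtype=float)
    if ml.shape != mr.shape:
        raise ValueError("left and right masks must have the same shape")
    return np.maximum(ml, mr)
```

If a single gain derived from the shared mask is applied to both ears, each bin's interaural level ratio is left unchanged; the abstract does not state whether the OM-LSA stage applies identical gains to both channels.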

arXiv.org e-Print Archive

Predicting the Intelligibility of Noisy and Nonlinearly Processed Binaural Speech

Authors: de Haan Jan Mark
Heidemann Andersen Asger
Jensen Jesper
Tan Zheng-Hua
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 16/07/2016

Crossref

VBN

Japanese speech intelligibility estimation and prediction using objective intelligibility indices under noisy and reverberant conditions

Authors: KOBAYASHI Yosuke
KONDO Kazuhiro
Publication venue: Elsevier BV
Publication date: 15/12/2019

Objective measures of intelligibility are preferable to subjective ones in the evaluation of speech systems used in real environments. In this study, subjective evaluations in eight types of indoor noise environments were used to compare four intelligibility indices for objectively evaluating Japanese speech intelligibility. The indices were: short-time objective intelligibility (STOI), which has been widely used in recent years; speech intelligibility prediction based on mutual information (SIMI), which is derived from STOI; extended STOI (ESTOI), an improved version of STOI; and frequency-weighted segmental signal-to-noise ratio (fwSNRseg), which incorporates both time and frequency components. These indices were evaluated in the eight noisy environments included in the corpus and environments for noisy speech recognition 4 (CENSREC-4) dataset, using the familiarity-controlled word lists 2007 (FW07) as the speech data for the intelligibility evaluations. The subjective evaluation results and the four indices were then used to train predictive intelligibility estimation models. We evaluated model performance using cross-validation, repeatedly training on seven of the eight environments and predicting speech intelligibility in the remaining one. In the simulation results, the prediction accuracy of the SIMI index was significantly higher than that of the other indices, with a root mean squared error of 0.160 and a correlation coefficient of 0.934.
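The evaluation protocol above is a leave-one-environment-out cross-validation. The sketch below assumes a linear mapping from index value to intelligibility score, fitted with `numpy.polyfit`, and assumes the RMSE and Pearson correlation are computed on the pooled held-out predictions. The abstract specifies neither the model form nor how fold results were aggregated, and `leave_one_group_out` is a hypothetical name.

```python
import numpy as np

def leave_one_group_out(index_values, intelligibility, groups):
    """Leave-one-environment-out CV with an assumed linear index-to-score model.

    index_values:    objective index (e.g. SIMI) per test condition
    intelligibility: subjective score per test condition
    groups:          environment label per test condition
    Returns pooled held-out predictions, their RMSE, and their Pearson correlation.
    """
    x = np.asarray(index_values, dtype=float)
    y = np.asarray(intelligibility, dtype=float)
    g = np.asarray(groups)
    preds = np.empty_like(y)
    for env in np.unique(g):
        test = g == env
        train = ~test
        # Fit on the remaining environments only (assumed linear model).
        slope, intercept = np.polyfit(x[train], y[train], 1)
        preds[test] = slope * x[test] + intercept
    rmse = float(np.sqrt(np.mean((preds - y) ** 2)))
    r = float(np.corrcoef(preds, y)[0, 1])
    return preds, rmse, r
```

Holding out whole environments, rather than random conditions, tests whether the index-to-score mapping transfers to a noise environment the model has never seen.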

Muroran-IT Academic Resource Archive

Data-Driven Speech Intelligibility Prediction

Author: Pedersen Mathias
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2023

VBN