An evaluation of intrusive instrumental intelligibility metrics
Instrumental intelligibility metrics are commonly used as an alternative to listening tests. This paper evaluates 12 monaural intrusive intelligibility metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and …. In addition, this paper investigates the ability of intelligibility metrics to generalize to new types of distortions and analyzes why the top-performing metrics perform well. The intelligibility data were obtained from 11 listening tests described in the literature. The stimuli included Dutch, Danish, and English speech distorted by additive noise, reverberation, competing talkers, pre-processing enhancement, and post-processing enhancement. SIIB and HASPI had the highest performance, achieving average correlations with listening-test scores of … and …, respectively. The high performance of SIIB may, in part, be the result of SIIB's developers having had access to all the intelligibility data considered in the evaluation. The results show that intelligibility metrics tend to perform poorly on data sets that were not used during their development. By modifying the original implementations of SIIB and STOI, the advantage of reducing statistical dependencies between input features is demonstrated. Additionally, the paper presents a new version of SIIB called …, which has performance similar to SIIB and HASPI but takes two orders of magnitude less time to compute.
Comment: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 201
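Evaluations of this kind typically report the correlation between a metric's outputs and listening-test scores across a set of conditions. As a minimal sketch of that step (not the paper's full procedure, and with made-up example numbers), Pearson's correlation coefficient can be computed as:

```python
import numpy as np

def pearson_correlation(metric_scores, listening_scores):
    """Pearson correlation between metric outputs and listening-test scores."""
    x = np.asarray(metric_scores, dtype=float)
    y = np.asarray(listening_scores, dtype=float)
    x = x - x.mean()
    y = y - y.mean()
    return float(np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2)))

# Hypothetical example: metric predictions vs. word-recognition rates
metric = [0.2, 0.35, 0.5, 0.7, 0.9]
listeners = [0.15, 0.40, 0.55, 0.65, 0.95]
print(pearson_correlation(metric, listeners))
```

In practice such evaluations often also fit a monotonic mapping (e.g. a logistic function) from metric values to intelligibility scores before correlating, which this sketch omits.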
Sensorineural hearing loss enhances auditory sensitivity and temporal integration for amplitude modulation.
Amplitude-modulation detection thresholds (AMDTs) were measured at 40 dB sensation level for listeners with mild-to-moderate sensorineural hearing loss (age: 50-64 yr) for a carrier frequency of 500 Hz and rates of 2 and 20 Hz. The number of modulation cycles, N, varied between two and nine. The data were compared with AMDTs measured for young and older normal-hearing listeners [Wallaert, Moore, and Lorenzi (2016). J. Acoust. Soc. Am. 139, 3088-3096]. As for normal-hearing listeners, AMDTs were lower for the 2-Hz than for the 20-Hz rate, and AMDTs decreased with increasing N. AMDTs were lower for hearing-impaired listeners than for normal-hearing listeners, and the effect of increasing N was greater for hearing-impaired listeners. A computational model based on the modulation-filterbank concept and a template-matching decision strategy was developed to account for the data. The psychophysical and simulation data suggest that the loss of amplitude compression in the impaired cochlea is mainly responsible for the enhanced sensitivity and temporal integration of temporal-envelope cues found for hearing-impaired listeners. The data also suggest that, for AM detection, cochlear damage is associated with increased internal noise, but preserved short-term memory and decision mechanisms.

N.W. was supported by a grant from Neurelec Oticon Medical. C.L. was supported by two grants from ANR (HEARFIN and HEART projects). S.D.E. was supported by Deutsche Forschungsgemeinschaft (DFG) FOR 1732 (TPE). B.C.J.M. was supported by the EPSRC (UK, grant RG78536). This work was also supported by ANR-11-0001-02 PSL* and ANR-10-LABX-0087. We thank Nihaad Paraouty and two anonymous reviewers for helpful comments and suggestions relating to this study.
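The stimuli in such AM-detection experiments are sinusoidally amplitude-modulated tones whose duration is tied to the number of modulation cycles N, so a 2-Hz two-cycle stimulus is ten times longer than a 20-Hz one. A minimal sketch of stimulus generation, with assumed parameter values (modulation depth, sample rate) that are not taken from the study:

```python
import numpy as np

def am_tone(fc=500.0, fm=2.0, m=0.5, n_cycles=2, fs=16000):
    """Sinusoidally amplitude-modulated tone with n_cycles modulation cycles.

    fc: carrier frequency (Hz), fm: modulation rate (Hz),
    m: modulation depth (0..1, assumed value), fs: sample rate (Hz, assumed).
    """
    duration = n_cycles / fm                       # stimulus length follows the AM rate
    t = np.arange(int(round(duration * fs))) / fs
    envelope = 1.0 + m * np.sin(2 * np.pi * fm * t)
    return envelope * np.sin(2 * np.pi * fc * t)

# A 2-Hz, two-cycle stimulus lasts 1 s; at 20 Hz the same N gives only 0.1 s.
x = am_tone(fm=2.0, n_cycles=2)
print(len(x))   # 16000 samples at fs = 16 kHz
```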
Learning static spectral weightings for speech intelligibility enhancement in noise
Near-end speech enhancement works by modifying speech prior to presentation in a noisy environment, typically operating under a constraint of limited or no increase in speech level. One issue is the extent to which near-end enhancement techniques require detailed estimates of the masking environment to function effectively. The current study investigated speech modification strategies based on reallocating energy statically across the spectrum using masker-specific spectral weightings. Weighting patterns were learned offline by maximising a glimpse-based objective intelligibility metric. Keyword scores in sentences in the presence of stationary and fluctuating maskers increased, in some cases by very substantial amounts, following the application of masker- and SNR-specific spectral weighting. A second experiment using generic masker-independent spectral weightings that boosted all frequencies above 1 kHz also led to significant gains in most conditions. These findings indicate that energy-neutral spectral weighting is a highly effective near-end speech enhancement approach that places minimal demands on detailed masker estimation.
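The generic weighting in the second experiment boosts all frequencies above 1 kHz while leaving the overall speech level unchanged. A rough sketch of such an energy-neutral static weighting, assuming an FFT-based implementation and a boost amount that are illustrative rather than the authors' method:

```python
import numpy as np

def apply_spectral_weighting(x, fs=16000, boost_db=6.0, cutoff_hz=1000.0):
    """Energy-neutral static spectral weighting (sketch).

    Boosts all frequencies at or above cutoff_hz by boost_db, then rescales
    the result so total signal energy is unchanged. The 6 dB boost and the
    simple FFT filtering are assumptions for illustration only.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    gains = np.where(freqs >= cutoff_hz, 10 ** (boost_db / 20.0), 1.0)
    y = np.fft.irfft(spectrum * gains, n=len(x))
    # Renormalise: no net increase in speech level (energy-neutral constraint)
    y *= np.sqrt(np.sum(x ** 2) / np.sum(y ** 2))
    return y
```

A learned masker-specific weighting would replace the binary high-frequency gain with a per-band pattern optimised against a glimpse-based metric, but the energy-renormalisation step is the same.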
Predicting binaural speech intelligibility from signals estimated by a blind source separation algorithm
State-of-the-art binaural objective intelligibility measures (OIMs) require the individual source signals to make intelligibility predictions, limiting their usability in real-time online operation. This limitation may be addressed by a blind source separation (BSS) process, which is able to extract the underlying sources from a mixture. In this study, a speech source is presented with either a stationary noise masker or a fluctuating noise masker whose azimuth varies in a horizontal plane, at two speech-to-noise ratios (SNRs). Three binaural OIMs are used to predict speech intelligibility from the signals separated by a BSS algorithm. The model predictions are compared with listeners' word identification rates in a perceptual listening experiment. The results suggest that, with SNR compensation applied to the BSS-separated speech signal, the OIMs can maintain their predictive power for individual maskers compared to their performance measured on the direct signals. The results also reveal that errors in the SNR of the estimated signals are not the only factor that decreases the predictive accuracy of the OIMs with the separated signals; artefacts or distortions introduced into the estimated signals by the BSS algorithm may also be a concern.
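The SNR compensation step can be pictured as rescaling the separated speech so that its level relative to the separated masker matches a target SNR. A minimal sketch, where the function name and interface are illustrative and not the study's implementation:

```python
import numpy as np

def snr_compensate(speech_est, noise_est, target_snr_db):
    """Scale BSS-separated speech so its estimated speech-to-noise ratio
    matches target_snr_db (illustrative sketch, not the study's code)."""
    ps = np.mean(speech_est ** 2)          # power of separated speech
    pn = np.mean(noise_est ** 2)           # power of separated masker
    current_snr_db = 10.0 * np.log10(ps / pn)
    gain_db = target_snr_db - current_snr_db
    return speech_est * 10 ** (gain_db / 20.0)
```

Note that this only corrects the level error; any artefacts or distortions the BSS algorithm introduces into the separated waveforms remain, which is consistent with the residual prediction errors reported above.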
Auditory sensory saliency as a better predictor of change than sound amplitude in pleasantness assessment of reproduced urban soundscapes
The sonic environment of the urban public space is often experienced while walking through it. Nevertheless, city dwellers are usually not actively listening to the environment when traversing the city. Therefore, sound events that are salient, i.e., stand out from the sonic environment, are the ones that trigger attention and contribute strongly to the perception of the soundscape. In a previously reported audiovisual perception experiment, the pleasantness of a recorded urban sound walk was continuously evaluated by a group of participants. To detect salient events in the soundscape, a biologically-inspired computational model for auditory sensory saliency based on spectrotemporal modulations is proposed. Using the data from a sound walk, the present study validates the hypothesis that salient events detected by the model contribute to changes in soundscape rating and are therefore important when evaluating the urban soundscape. Finally, when using the data from an additional experiment without a strong visual component, the importance of auditory sensory saliency as a predictor for change in pleasantness assessment is found to be even more pronounced.
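As a loose illustration of detecting events that "stand out" of a sonic background, positive spectral flux gives a crude frame-by-frame novelty curve. This is only a simple stand-in: the study's model is biologically inspired and based on spectrotemporal modulations, not spectral flux.

```python
import numpy as np

def spectral_flux_saliency(x, frame=512, hop=256):
    """Crude novelty curve: frame-wise positive spectral flux.

    A stand-in for a saliency detector, NOT the spectrotemporal-modulation
    model used in the study. Peaks mark sudden spectral change (onsets).
    """
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    mags = np.stack([
        np.abs(np.fft.rfft(window * x[i * hop:i * hop + frame]))
        for i in range(n_frames)
    ])
    diff = np.diff(mags, axis=0)                     # frame-to-frame change
    return np.sum(np.maximum(diff, 0.0), axis=1)     # keep increases only
```

A genuine sensory-saliency model would instead filter the spectrogram with banks of spectrotemporal modulation filters and pool their energies, so that slowly varying backgrounds are suppressed and abrupt, structured events dominate.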