It is essential to perform speech intelligibility (SI) experiments with human
listeners to evaluate the effectiveness of objective intelligibility measures.
Recently, crowdsourced remote testing has become popular for collecting a large
amount and variety of data at relatively low cost and in a short time.
However, careful data screening is essential for attaining reliable SI data. We
compared the results of laboratory and crowdsourced remote experiments to
establish an effective data screening technique. We evaluated the SI of noisy
speech sounds enhanced by a single-channel ideal ratio mask (IRM) and
multi-channel mask-based beamformers. The results demonstrated that the SI
scores were improved by these enhancement methods. In particular, the
IRM-enhanced sounds were much more intelligible than the unprocessed and other
enhanced sounds, indicating that IRM enhancement may give the upper limit of speech
enhancement performance. Moreover, tone pip tests, for which participants were
asked to report the number of audible tone pips, reduced the variability of
crowdsourced remote results so that they became similar to the laboratory results. Tone
pip tests could be useful for future crowdsourced experiments because of their
simplicity and effectiveness for data screening.

Comment: This paper was submitted to Interspeech 2022
(http://www.interspeech2022.org)