ASR-based, single-ended modeling of listening effort - a tool for TV sound engineers

Abstract

This paper reviews our research towards a listening effort model and its application as a tool that automatically measures and displays the perceived listening effort required to understand speech in a variety of background sounds. The model is single-ended, i.e., it does not require a clean speech reference, and is based on an automatic speech recognition (ASR) system. Speech distortions and interfering background sounds increase the uncertainty of the ASR system; this uncertainty can be quantified and mapped to a perceptually interpretable scale using a psychoacoustic modeling approach. The resulting measure correlates well with mean subjective listening effort ratings for a variety of distortions and acoustic backgrounds typical of TV broadcast material (r > 0.9). In principle, the tool can be integrated as a software plugin for digital audio workstations (DAWs) to support the work of sound engineers, or used in other applications such as speech quality monitoring of communication channels or real-time control of signal-enhancement algorithms.
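
To make the idea of "ASR uncertainty mapped to a perceptual scale" concrete, the following is a minimal sketch, not the model described in the paper. It assumes that the ASR system exposes per-frame class posteriors, uses normalized entropy as a stand-in uncertainty measure, and maps it to an effort value with a logistic function. The function names, the entropy measure, the logistic parameters, and the scale endpoints are all hypothetical placeholders chosen for illustration.

```python
import math


def mean_posterior_entropy(posteriors):
    """Mean per-frame entropy of ASR class posteriors, normalized to [0, 1].

    `posteriors` is a list of frames; each frame is a list of class
    probabilities that sum to 1 and has at least two classes.
    Entropy is an assumed stand-in for the paper's uncertainty measure.
    """
    total = 0.0
    for frame in posteriors:
        entropy = -sum(p * math.log2(p) for p in frame if p > 0.0)
        total += entropy / math.log2(len(frame))  # 1.0 means a uniform frame
    return total / len(posteriors)


def effort_from_uncertainty(u, midpoint=0.5, slope=8.0, lo=1.0, hi=13.0):
    """Hypothetical logistic mapping from uncertainty to an effort scale.

    midpoint, slope, lo and hi are placeholder values, not taken from
    the paper; in practice they would be fitted to subjective ratings.
    """
    return lo + (hi - lo) / (1.0 + math.exp(-slope * (u - midpoint)))


# Toy posteriorgrams: a confident recognizer versus a maximally unsure one.
confident = [[0.97, 0.01, 0.01, 0.01]] * 5
uncertain = [[0.25, 0.25, 0.25, 0.25]] * 5

for name, post in (("confident", confident), ("uncertain", uncertain)):
    u = mean_posterior_entropy(post)
    print(f"{name}: uncertainty={u:.3f}, effort={effort_from_uncertainty(u):.2f}")
```

Under these assumptions, the flatter posteriors that would be expected for distorted or masked speech yield a higher uncertainty value and therefore a higher predicted effort. No clean reference signal enters the computation, which is what makes the approach single-ended.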
