Towards Improved Room Impulse Response Estimation for Speech Recognition

Ananthabhotla, Ishwarya; Calamia, Paul; Hoffmann, Pablo; Ithapu, Vamsi Krishna; Manocha, Dinesh; Ratnarajah, Anton

Publication date: 7 November 2022

Abstract

We propose to characterize and improve the performance of blind room impulse response (RIR) estimation systems in the context of a downstream application scenario: far-field automatic speech recognition (ASR). We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators. We then propose a GAN-based architecture that encodes RIR features from reverberant speech, constructs an RIR from the encoded features, and uses a novel energy decay relief loss to capture energy-based properties of the input reverberant speech. We show that our model outperforms the state-of-the-art baselines on acoustic benchmarks (by 72% on the energy decay relief and 22% on an early-reflection energy metric), as well as in an ASR evaluation task (by 6.9% in word error rate).
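
For readers unfamiliar with the energy decay relief (EDR) mentioned in the abstract, the sketch below computes it in its standard textbook form: the energy remaining in each frequency bin after each time frame, obtained by backward integration of the RIR's short-time power spectrum. The abstract does not specify how the paper's loss is defined, so the function `edr_l1_distance`, the frame sizes, and the toy exponentially decaying RIRs are assumptions made here for illustration only and should not be read as the authors' implementation.

```python
import numpy as np

def stft_magnitude(x, frame_len=256, hop=128):
    """Magnitude STFT with a Hann window; returns shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def energy_decay_relief(rir, frame_len=256, hop=128, eps=1e-12):
    """EDR in dB: energy remaining from each frame onward, per frequency bin."""
    power = stft_magnitude(rir, frame_len, hop) ** 2
    # Reverse cumulative sum over time is the backward (Schroeder-style) integration.
    remaining = np.cumsum(power[::-1], axis=0)[::-1]
    return 10.0 * np.log10(remaining + eps)

def edr_l1_distance(rir_a, rir_b, **kwargs):
    """Mean absolute EDR difference in dB (hypothetical form, not the paper's loss)."""
    edr_a = energy_decay_relief(rir_a, **kwargs)
    edr_b = energy_decay_relief(rir_b, **kwargs)
    return float(np.mean(np.abs(edr_a - edr_b)))

# Toy RIRs (assumed for illustration): shared noise with different decay rates.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs // 2) / fs
noise = rng.standard_normal(t.size)
rir_slow = noise * np.exp(-t / 0.10)  # longer reverberation tail
rir_fast = noise * np.exp(-t / 0.03)  # shorter reverberation tail

edr_slow = energy_decay_relief(rir_slow)
distance = edr_l1_distance(rir_slow, rir_fast)
```

Because the EDR is a running sum of non-negative energies taken from the end of the response, it never increases over time within a bin, so comparing two EDRs penalizes mismatched decay rates rather than sample-level differences.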

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2211.04473

Last time updated on 12/12/2022