We propose to characterize and improve the performance of blind room impulse
response (RIR) estimation systems in the context of a downstream application
scenario, far-field automatic speech recognition (ASR). We first draw the
connection between improved RIR estimation and improved ASR performance, as a
means of evaluating neural RIR estimators. We then propose a GAN-based
architecture that encodes RIR features from reverberant speech and constructs
an RIR from the encoded features, using a novel energy decay relief loss to
capture the energy-based properties of the input reverberant speech.
We show that our model outperforms state-of-the-art baselines on acoustic
benchmarks (by 72% on the energy decay relief and 22% on an early-reflection
energy metric), as well as in an ASR evaluation task (by 6.9% in word error
rate).
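
For readers unfamiliar with the energy decay relief (EDR), the following minimal sketch shows one common way to compute it: the energy remaining in each frequency bin of a short-time spectrum from a given frame onward, expressed in dB. The function names, the frame parameters (`n_fft`, `hop`), and the mean-absolute-difference comparison in `edr_l1_distance` are illustrative assumptions and are not necessarily the loss formulation used in this work.

```python
import numpy as np


def energy_decay_relief(rir, n_fft=256, hop=128, eps=1e-12):
    """Return the EDR in dB with shape (frames, bins).

    Each bin is normalized so that the first frame sits at 0 dB.
    Frame size and hop are illustrative defaults (assumptions).
    """
    rir = np.asarray(rir, dtype=np.float64)
    if len(rir) < n_fft:
        rir = np.pad(rir, (0, n_fft - len(rir)))
    n_frames = 1 + (len(rir) - n_fft) // hop
    window = np.hanning(n_fft)
    # Windowed frames -> per-frame, per-bin spectral energy.
    frames = np.stack(
        [rir[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    energy = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Backward integration along time: energy remaining from frame t onward.
    remaining = np.cumsum(energy[::-1], axis=0)[::-1]
    return 10.0 * np.log10((remaining + eps) / (remaining[0] + eps))


def edr_l1_distance(rir_est, rir_ref, **kwargs):
    """Mean absolute EDR difference in dB (a hypothetical comparison metric)."""
    a = energy_decay_relief(rir_est, **kwargs)
    b = energy_decay_relief(rir_ref, **kwargs)
    n = min(len(a), len(b))  # compare only the overlapping frames
    return float(np.mean(np.abs(a[:n] - b[:n])))


# Synthetic stand-in for an RIR: exponentially decaying noise.
rng = np.random.default_rng(0)
t = np.arange(4000)
rir = rng.standard_normal(4000) * np.exp(-t / 500.0)
edr = energy_decay_relief(rir)
```

By construction, the EDR of any signal is non-increasing along the time axis, so a distance between the EDRs of an estimated and a reference RIR compares how energy decays per frequency band rather than comparing waveforms sample by sample.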