Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments

Authors: Christian Huemmer, Walter Kellermann, Roland Maas, Andreas Schwarz
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/02/2015

We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments. The feature is computed in real time from multiple microphone signals without requiring knowledge or estimation of the direction of arrival, and represents the relative amount of diffuse noise in each time and frequency bin. It is shown that using the diffuseness feature as an additional input to a DNN-based acoustic model leads to a reduced word error rate for the REVERB challenge corpus, compared both to logmelspec features extracted from noisy signals and to features enhanced by spectral subtraction. Comment: accepted for ICASSP201
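To make the idea of a per-bin diffuseness value concrete, the following is a minimal sketch and not the authors' estimator. It assumes two microphones and uses a simple stand-in heuristic: one minus the magnitude-squared coherence of the two channels, computed from recursively smoothed spectra. The function names `coherence_diffuseness` and `append_diffuseness` and the smoothing factor `lam=0.7` are hypothetical choices made for illustration.

```python
import numpy as np


def coherence_diffuseness(X1, X2, lam=0.7, eps=1e-12):
    """Heuristic diffuseness per time-frequency bin (assumption, not the paper's method).

    X1, X2: complex STFTs of two microphone signals, shape (frames, bins).
    Returns values in [0, 1]; values near 1 indicate low inter-channel coherence.
    """
    n_frames, n_bins = X1.shape
    phi11 = np.zeros(n_bins)                 # smoothed auto-PSD, channel 1
    phi22 = np.zeros(n_bins)                 # smoothed auto-PSD, channel 2
    phi12 = np.zeros(n_bins, dtype=complex)  # smoothed cross-PSD
    out = np.empty((n_frames, n_bins), dtype=float)
    for t in range(n_frames):
        a = lam if t > 0 else 0.0            # first frame initializes the averages
        phi11 = a * phi11 + (1 - a) * np.abs(X1[t]) ** 2
        phi22 = a * phi22 + (1 - a) * np.abs(X2[t]) ** 2
        phi12 = a * phi12 + (1 - a) * X1[t] * np.conj(X2[t])
        msc = np.abs(phi12) ** 2 / (phi11 * phi22 + eps)  # magnitude-squared coherence
        out[t] = np.clip(1.0 - msc, 0.0, 1.0)
    return out


def append_diffuseness(spectral_feats, diffuseness):
    """Concatenate the diffuseness values to per-frame spectral features."""
    return np.concatenate([spectral_feats, diffuseness], axis=1)
```

Because it runs frame by frame with recursive averaging, a scheme like this can operate online, and it needs no direction-of-arrival estimate. In practice the diffuseness values would likely be reduced to the same band resolution as the spectral features before concatenation; that step is omitted here.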

arXiv.org e-Print Archive

Crossref

A Bayesian Network View on Acoustic Model-Based Techniques for Robust Speech Recognition

Authors: Christian Huemmer, Walter Kellermann, Roland Maas, Armin Sehr
Publication date: 22/09/2014

This article provides a unifying Bayesian network view on various approaches for acoustic model adaptation, missing feature, and uncertainty decoding that are well known in the literature on robust automatic speech recognition. The representatives of these classes can often be deduced from a Bayesian network that extends the conventional hidden Markov models used in speech recognition. These extensions, in turn, can in many cases be motivated by an underlying observation model that relates clean and distorted feature vectors. By converting the observation models into a Bayesian network representation, we formulate the corresponding compensation rules, leading to a unified view on known derivations as well as to new formulations for certain approaches. The generic Bayesian perspective provided in this contribution thus highlights structural differences and similarities between the analyzed approaches.
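As an illustration of how an observation model can yield a compensation rule, consider a generic textbook case that is not taken from the article itself. The symbols are assumptions introduced here: $q$ is the HMM state, $x$ the clean feature vector, $y$ the distorted one, and $b$ an additive Gaussian distortion independent of $x$.

```latex
% Network structure assumed for illustration: q -> x -> y
p(y \mid q) = \int p(y \mid x)\, p(x \mid q)\, \mathrm{d}x

% With p(x \mid q) = \mathcal{N}(x;\, \mu_q, \Sigma_q) and the observation model
% y = x + b, \quad b \sim \mathcal{N}(0, \Sigma_b), the integral has a closed form:
p(y \mid q) = \mathcal{N}(y;\, \mu_q,\, \Sigma_q + \Sigma_b)
```

In this simple case the compensation amounts to adding the distortion covariance to the acoustic model covariance, which is the general shape of an uncertainty decoding rule.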

arXiv.org e-Print Archive