Mask estimation based on sound localisation for missing data speech recognition

Guy J Brown; Jon Barker; Sue Harding

Publication date: 1 January 2005

Abstract

This paper describes a perceptually motivated computational auditory scene analysis (CASA) system that combines sound separation according to spatial location with 'missing data' techniques for robust speech recognition in noise. Missing data time-frequency masks are produced by using cross-correlation to estimate the interaural time difference (ITD), and hence the spatial azimuth; the azimuth estimates are used to determine which regions of the signal constitute reliable evidence of the target speech signal. Three experiments are performed that compare the effects of different reverberation surfaces, localisation methods and azimuth separations on recognition accuracy, together with the effects of two post-processing techniques (morphological operations and supervised learning) for improving mask estimation. Both post-processing techniques greatly improve performance; the best performance occurs using a learnt mapping
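The localisation step summarised in the abstract (cross-correlation gives an ITD, the ITD gives an azimuth, and the azimuth decides which time-frequency units are reliable) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`estimate_itd`, `itd_to_azimuth`, `azimuth_mask`), the 1 ms lag search range, the free-field two-point model with 0.18 m ear spacing and 343 m/s speed of sound, and the 5 degree tolerance are all illustrative assumptions. In a full CASA system the ITD would typically be estimated separately for each time-frequency unit (for example each frame of each filterbank channel); here one broadband noise frame is used for brevity.

```python
import math
import random


def cross_correlation(left, right, lag):
    """Unnormalised cross-correlation r(lag) = sum_n right[n] * left[n + lag]."""
    total = 0.0
    for n in range(len(right)):
        m = n + lag
        if 0 <= m < len(left):
            total += right[n] * left[m]
    return total


def estimate_itd(left, right, fs, max_itd=0.001):
    """Return the ITD in seconds as the lag that maximises the cross-correlation.

    Positive values mean the left-ear signal lags the right (source to the right).
    Only lags within +/- max_itd are searched (assumed physiological range).
    """
    max_lag = int(round(max_itd * fs))
    best_lag = max(range(-max_lag, max_lag + 1),
                   key=lambda lag: cross_correlation(left, right, lag))
    return best_lag / fs


def itd_to_azimuth(itd, ear_spacing=0.18, speed_of_sound=343.0):
    """Map an ITD to azimuth in degrees (positive = right).

    Assumption: free-field two-point model, sin(azimuth) = c * itd / d.
    Real systems usually use a measured, head-related mapping instead.
    """
    x = speed_of_sound * itd / ear_spacing
    x = max(-1.0, min(1.0, x))  # clamp into the domain of asin
    return math.degrees(math.asin(x))


def azimuth_mask(azimuth_grid, target_azimuth, tolerance=5.0):
    """Binary missing-data mask: 1 (reliable) where a unit's azimuth is near the target."""
    return [[1 if abs(az - target_azimuth) <= tolerance else 0 for az in row]
            for row in azimuth_grid]


# Usage: white noise reaching the right ear 5 samples before the left ear.
rng = random.Random(0)
fs, delay, length = 16000, 5, 800
base = [rng.gauss(0.0, 1.0) for _ in range(length + delay)]
right = base[delay:]      # right ear leads
left = base[:length]      # left ear lags by `delay` samples
itd = estimate_itd(left, right, fs)   # expected to recover delay / fs
azimuth = itd_to_azimuth(itd)         # roughly 36.5 degrees to the right
```

Thresholding azimuth with a fixed tolerance, as `azimuth_mask` does, yields a hard binary mask; the abstract reports that post-processing such masks, by morphological operations or a learnt mapping, greatly improves recognition performance.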
