Improving Source Separation via Multi-Speaker Representations
Lately there have been novel developments in deep learning towards solving
the cocktail party problem. Initial results are very promising and allow for
more research in the domain. One technique that has not yet been explored in
the neural network approach to this task is speaker adaptation. Intuitively,
information on the speakers that we are trying to separate seems fundamentally
important for the speaker separation task. However, retrieving this speaker
information is challenging since the speaker identities are not known a priori
and multiple speakers are simultaneously active. This creates a
chicken-and-egg problem. To tackle it, source signals and i-vectors are
estimated alternately. We show that blind multi-speaker adaptation improves the
results of the network and that (in our case) the network is not capable of
adequately retrieving this useful speaker information itself.
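The alternating estimation described in this abstract can be pictured as a simple loop. The sketch below only illustrates the control flow and is not the authors' implementation: the names `alternate_estimation`, `separate`, `extract_speaker_vector`, and `num_iterations` are hypothetical, and the two callables stand in for a separation network and an i-vector extractor that the abstract does not specify.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Signal = Sequence[float]
SpeakerVector = Sequence[float]


def alternate_estimation(
    mixture: Signal,
    separate: Callable[[Signal, Optional[List[SpeakerVector]]], List[Signal]],
    extract_speaker_vector: Callable[[Signal], SpeakerVector],
    num_iterations: int = 3,
) -> Tuple[List[Signal], Optional[List[SpeakerVector]]]:
    """Alternate between source estimates and per-speaker vectors.

    `separate` and `extract_speaker_vector` are hypothetical placeholders
    for a separation model and a speaker-representation extractor.
    """
    # First pass: no speaker information is available yet.
    speaker_vectors: Optional[List[SpeakerVector]] = None
    sources = separate(mixture, speaker_vectors)

    for _ in range(num_iterations):
        # Estimate one speaker vector per currently separated source.
        speaker_vectors = [extract_speaker_vector(s) for s in sources]
        # Re-run separation, now conditioned on the speaker vectors.
        sources = separate(mixture, speaker_vectors)

    return sources, speaker_vectors
```

Each pass uses the current source estimates to refresh the speaker vectors, then uses those vectors to refine the sources, which is one way to break the circular dependency the abstract mentions. How many passes are useful is an open choice here, not something the abstract states.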
Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments
We propose a spatial diffuseness feature for deep neural network (DNN)-based
automatic speech recognition to improve recognition accuracy in reverberant and
noisy environments. The feature is computed in real-time from multiple
microphone signals without requiring knowledge or estimation of the direction
of arrival, and represents the relative amount of diffuse noise in each time
and frequency bin. It is shown that using the diffuseness feature as an
additional input to a DNN-based acoustic model leads to a reduced word error
rate for the REVERB challenge corpus, both compared to logmelspec features
extracted from noisy signals, and features enhanced by spectral subtraction.
Comment: accepted for ICASSP201
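To make the idea of a per-bin diffuseness value concrete, here is a minimal sketch of a coherence-based proxy for two microphones. It is not the estimator used in the paper, which the abstract does not specify: the function name `diffuseness_proxy`, the smoothing factor `alpha`, and the mapping "one minus coherence magnitude" are all assumptions chosen for illustration. Like the feature described above, it needs no direction-of-arrival information.

```python
import numpy as np


def diffuseness_proxy(X1, X2, alpha=0.9, eps=1e-12):
    """Crude per-bin diffuseness proxy from two complex STFTs.

    X1, X2: arrays of shape (frames, bins), one per microphone.
    Returns values in [0, 1]; higher means less coherent between the
    microphones. This is an illustrative stand-in, not the paper's method.
    """
    n_frames, n_bins = X1.shape
    p11 = np.zeros(n_bins)                 # smoothed auto-power, mic 1
    p22 = np.zeros(n_bins)                 # smoothed auto-power, mic 2
    p12 = np.zeros(n_bins, dtype=complex)  # smoothed cross-power
    out = np.empty((n_frames, n_bins))

    for t in range(n_frames):
        # Recursive (first-order) averaging of the spectral densities,
        # so the estimate can be updated frame by frame.
        p11 = alpha * p11 + (1 - alpha) * np.abs(X1[t]) ** 2
        p22 = alpha * p22 + (1 - alpha) * np.abs(X2[t]) ** 2
        p12 = alpha * p12 + (1 - alpha) * X1[t] * np.conj(X2[t])

        # Coherence magnitude: near 1 for a coherent (directional) signal,
        # lower when the two channels are only weakly correlated.
        coherence = np.abs(p12) / np.sqrt(p11 * p22 + eps)
        out[t] = 1.0 - np.clip(coherence, 0.0, 1.0)

    return out
```

The resulting array could be appended to the acoustic features of each frame as an extra input, in the spirit of the abstract. Any mapping from frequency bins to the resolution of the other features is left open here.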