15 research outputs found
On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments
This paper introduces a new method for multi-channel time domain speech
separation in reverberant environments. A fully-convolutional neural network
structure has been used to directly separate speech from multiple microphone
recordings, with no need for conventional spatial feature extraction. To reduce
the influence of reverberation on spatial feature extraction, a dereverberation
pre-processing method has been applied to further improve the separation
performance. A spatialized version of wsj0-2mix dataset has been simulated to
evaluate the proposed system. Both source separation and speech recognition
performance of the separated signals have been evaluated objectively.
Experiments show that the proposed fully-convolutional network improves the
source separation metric and the word error rate (WER) by more than 13% and 50%
relative, respectively, over a reference system with conventional features.
Applying dereverberation as pre-processing to the proposed system can further
reduce the WER by 29% relative using an acoustic model trained on clean and
reverberated data. Comment: Presented at IEEE ICASSP 202
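The "relative" improvements quoted above (13% SDR, 50% and 29% WER) are fractions of the baseline error removed, not absolute differences. A minimal sketch of that arithmetic, with hypothetical WER values chosen only for illustration:

```python
def relative_reduction(baseline, improved):
    # Fraction of the baseline metric removed by the improved system.
    # E.g. a 50% relative WER reduction halves the baseline WER.
    return (baseline - improved) / baseline

# Hypothetical numbers: a baseline WER of 30.0% reduced to 15.0%
# is a 50% relative reduction, matching the kind of figure quoted.
print(relative_reduction(30.0, 15.0))  # 0.5
```

So a system quoted as "29% relative" better would, for a hypothetical 20.0% baseline WER, reach about 14.2% WER.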
Task-Aware Separation for the DCASE 2020 Task 4 Sound Event Detection and Separation Challenge
Source Separation is often used as a pre-processing step in many signal-processing tasks. In this work we propose a novel approach to combined Source Separation and Sound Event Detection, in which a Source Separation algorithm is used to enhance the performance of the Sound Event Detection back-end. In particular, we present a permutation-invariant training scheme for optimizing the Source Separation system directly with the back-end Sound Event Detection objective, without requiring joint training or fine-tuning of the two systems. We show that this approach has significant advantages over the more standard approach of training the Source Separation system separately using only a Source-Separation-based objective such as Scale-Invariant Signal-to-Distortion Ratio (SI-SDR). On the 2020 Detection and Classification of Acoustic Scenes and Events Task 4 Challenge, our proposed approach outperforms the baseline source separation system by more than one percent in event-based macro F1 score on the development set, with significantly lower computational requirements.
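The abstract contrasts its task-aware objective with the standard separation objective: SI-SDR combined with permutation-invariant training (PIT), which tries every assignment of estimated sources to references and keeps the best one. A minimal NumPy sketch of those two standard ideas (function names are illustrative, not taken from the paper):

```python
import itertools
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    # Scale-Invariant Signal-to-Distortion Ratio in dB (higher is better).
    # Zero-mean both signals, project the estimate onto the target,
    # and compare the projected energy to the residual energy.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    s_target = np.dot(estimate, target) / (np.dot(target, target) + eps) * target
    e_noise = estimate - s_target
    return 10.0 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))

def pit_loss(estimates, targets):
    # Permutation-invariant loss: the separator's output order is arbitrary,
    # so evaluate every permutation of estimates against the references
    # and keep the one with the lowest loss (highest mean SI-SDR).
    n = len(estimates)
    best_loss, best_perm = None, None
    for perm in itertools.permutations(range(n)):
        loss = -np.mean([si_sdr(estimates[p], targets[i]) for i, p in enumerate(perm)])
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

For example, if a two-source separator emits the sources in swapped order, `pit_loss` selects the permutation `(1, 0)` rather than penalizing the model for the ordering. The paper's contribution is to replace this separation-only objective with the Sound Event Detection back-end's objective inside the same permutation-invariant scheme.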