Online Self-Attentive Gated RNNs for Real-Time Speaker Separation
Deep neural networks have recently shown great success in the task of blind
source separation, both under monaural and binaural settings. Although these
methods were shown to produce high-quality separations, they were mainly
applied under offline settings, in which the model has access to the full input
signal while separating the signal. In this study, we convert a non-causal
state-of-the-art separation model into a causal and real-time model and
evaluate its performance under both online and offline settings. We compare the
performance of the proposed model to several baseline methods under anechoic,
noisy, and noisy-reverberant recording conditions while exploring both monaural
and binaural inputs and outputs. Our findings shed light on the relative
difference between causal and non-causal models when performing separation. Our
stateful implementation for online separation leads to only a minor drop in
performance compared to the offline model (0.8 dB for monaural inputs and 0.3 dB
for binaural inputs) while reaching a real-time factor of 0.65. Samples can be
found under the following link:
https://kwanum.github.io/sagrnnc-stream-results/.

Comment: Appears at the Workshop on Machine Learning in Speech and Language
Processing 202
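
For readers unfamiliar with the real-time factor quoted above, the following is a minimal sketch of the standard definition: processing time divided by the duration of the audio processed, with values below 1 meaning the system keeps up with the incoming signal. The function names and the timing numbers are hypothetical and chosen only to reproduce a value of 0.65; the abstract does not state how the authors measured it.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return compute time divided by audio duration (standard RTF definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def keeps_up_with_input(rtf: float) -> bool:
    """An RTF below 1.0 means audio is processed faster than it arrives."""
    return rtf < 1.0


# Hypothetical timings, not taken from the paper:
# 6.5 s of compute for 10 s of audio.
rtf = real_time_factor(processing_seconds=6.5, audio_seconds=10.0)
print(f"RTF = {rtf:.2f}, real-time capable: {keeps_up_with_input(rtf)}")
```

Under this definition, a real-time factor of 0.65 leaves roughly 35% of each audio interval as headroom before the model would fall behind the input stream.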