Time-frequency masking and spectrum prediction computed via short symmetric
windows are commonly used in low-latency deep neural network (DNN) based source
separation. In this paper, we propose using an asymmetric
analysis-synthesis window pair, which allows training with targets that have
better frequency resolution while retaining the low latency during inference
that is suitable for real-time speech enhancement or assisted hearing applications. In
order to assess our approach across various model types and datasets, we
evaluate it with both a speaker-independent deep clustering (DC) model and a
speaker-dependent mask inference (MI) model. We report an improvement in
separation performance of up to 1.5 dB in terms of source-to-distortion ratio
(SDR) while maintaining an algorithmic latency of 8 ms.

Comment: Accepted to EUSIPCO-202
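
As a rough illustration of the asymmetric analysis-synthesis idea, the following minimal sketch builds one possible window pair and checks that overlap-add still reconstructs the signal. The abstract does not specify the window shapes, so everything here is an assumption made for illustration: the square-root Hann construction, the 16 kHz sampling rate, the 512-sample (32 ms) analysis window, the 128-sample (8 ms) synthesis window, the 50% synthesis overlap, and the function names `periodic_hann`, `asymmetric_window_pair`, and `overlap_add_gain`. It also assumes the common convention that algorithmic latency equals the synthesis window length. It is not the authors' implementation.

```python
import math


def periodic_hann(length):
    """Periodic Hann window; copies shifted by length/2 sum to exactly 1."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / length))
            for n in range(length)]


def asymmetric_window_pair(frame_len, synth_len):
    """Return (analysis, synthesis, hop) for a hypothetical asymmetric pair.

    The analysis window spans the whole frame: a long square-root Hann rise
    followed by a short square-root Hann fall. The synthesis window is zero
    except on the last `synth_len` samples, where it is chosen so that
    analysis * synthesis equals a periodic Hann window of length `synth_len`.
    """
    if synth_len % 2 or frame_len <= synth_len:
        raise ValueError("need an even synth_len smaller than frame_len")
    hop = synth_len // 2
    rise = [math.sqrt(v)
            for v in periodic_hann(2 * (frame_len - hop))[:frame_len - hop]]
    fall = [math.sqrt(v) for v in periodic_hann(synth_len)[hop:]]
    analysis = rise + fall
    target = periodic_hann(synth_len)
    offset = frame_len - synth_len
    synthesis = [0.0] * offset + [target[n] / analysis[offset + n]
                                  for n in range(synth_len)]
    return analysis, synthesis, hop


def overlap_add_gain(analysis, synthesis, hop, num_frames):
    """Overlap-add the product analysis * synthesis over `num_frames` frames.

    With an unmodified spectrum the forward and inverse transforms cancel,
    so a gain of 1 means the signal is reconstructed exactly at that sample.
    """
    frame_len = len(analysis)
    gain = [0.0] * ((num_frames - 1) * hop + frame_len)
    for m in range(num_frames):
        for n in range(frame_len):
            gain[m * hop + n] += analysis[n] * synthesis[n]
    return gain


# Assumed example values, not taken from the abstract (except the 8 ms target).
SAMPLE_RATE = 16000
FRAME_LEN = 512    # 32 ms analysis window sets the frequency resolution
SYNTH_LEN = 128    # 8 ms synthesis window sets the algorithmic latency

analysis, synthesis, hop = asymmetric_window_pair(FRAME_LEN, SYNTH_LEN)
latency_ms = 1000 * SYNTH_LEN / SAMPLE_RATE
resolution_hz = SAMPLE_RATE / FRAME_LEN
```

Under these assumptions the frequency resolution follows the long analysis window (`resolution_hz` is the bin spacing of a 512-point transform), while `latency_ms` depends only on the short synthesis window. The overlap-add gain is 1 only for samples covered by two consecutive synthesis windows, so the first and last partial frames are excluded from any reconstruction check.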