Deep Neural Network Based Low-Latency Speech Separation with Asymmetric Analysis-Synthesis Window Pair

Naithani, Gaurav; Politis, Archontis; Virtanen, Tuomas; Wang, Shanshan

Publication date: 1 January 2021

Abstract

Time-frequency masking and spectrum prediction computed with short symmetric windows are commonly used in low-latency deep neural network (DNN) based source separation. In this paper, we propose the use of an asymmetric analysis-synthesis window pair, which allows training with targets that have better frequency resolution while retaining the low latency during inference that suits real-time speech enhancement and assisted hearing applications. To assess our approach across model types and datasets, we evaluate it with both a speaker-independent deep clustering (DC) model and a speaker-dependent mask inference (MI) model. We report an improvement in separation performance of up to 1.5 dB in terms of source-to-distortion ratio (SDR) while maintaining an algorithmic latency of 8 ms.

Comment: Accepted to EUSIPCO-202
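To illustrate the idea of an asymmetric analysis-synthesis window pair, the following is a minimal sketch and not the authors' implementation. It assumes one common construction in which a long analysis window is paired with a short synthesis window so that their product is a periodic Hann window, which overlap-adds to one at 50% overlap. The function names, the square-root Hann shapes, and the parameters `K` (analysis length) and `M` (hop size) are assumptions made for this example; the abstract does not specify them.

```python
import numpy as np


def periodic_hann(length):
    """Periodic Hann window; copies shifted by length/2 sum to one."""
    n = np.arange(length)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * n / length))


def asymmetric_window_pair(K, M):
    """Return (analysis, synthesis) windows of length K for hop size M.

    Hypothetical construction: the analysis window rises slowly over
    K - M samples and falls over M samples. The synthesis window is
    non-zero only on the last 2*M samples and is chosen so that
    analysis * synthesis equals a periodic Hann window of length 2*M.
    """
    if K <= 2 * M:
        raise ValueError("K must exceed 2*M for an asymmetric pair")
    long_rise = np.sqrt(periodic_hann(2 * (K - M))[:K - M])
    short = periodic_hann(2 * M)
    analysis = np.concatenate([long_rise, np.sqrt(short[M:])])
    synthesis = np.zeros(K)
    # Rising part of the short window, compensated for the analysis window.
    synthesis[K - 2 * M:K - M] = short[:M] / analysis[K - 2 * M:K - M]
    # Falling part: both windows share the same square-root Hann tail.
    synthesis[K - M:] = np.sqrt(short[M:])
    return analysis, synthesis


def analyze_resynthesize(x, analysis, synthesis, M):
    """Window, transform, inverse-transform, and overlap-add a signal."""
    K = len(analysis)
    y = np.zeros(len(x))
    for start in range(0, len(x) - K + 1, M):
        spectrum = np.fft.rfft(analysis * x[start:start + K])
        # A separation mask predicted by a DNN would be applied to
        # `spectrum` here; this sketch leaves it unchanged.
        frame = np.fft.irfft(spectrum, n=K)
        y[start:start + K] += synthesis * frame
    return y
```

As a usage example under an assumed sampling rate of 8 kHz (not stated in the abstract), `asymmetric_window_pair(K=256, M=32)` gives a 32 ms analysis window and a synthesis window whose non-zero part spans 64 samples, or 8 ms. In this construction the overlap-add delay is governed by the short synthesis window, while the spectrum is computed over the longer analysis window and so has finer frequency resolution.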

Available Versions

Trepo - Institutional Repository of Tampere University

oai:trepo.tuni.fi:10024/137826

Last time updated on 22/08/2022