    Listening and grouping: an online autoregressive approach for monaural speech separation

    This paper proposes an autoregressive approach to harness the power of deep learning for multi-speaker monaural speech separation. It exploits causal temporal context in both the mixture and the previously estimated separated signals, and it performs online separation that is compatible with real-time applications. The approach adopts a learned listening and grouping architecture motivated by computational auditory scene analysis, with a grouping stage that effectively addresses the label permutation problem at both the frame and segment levels. Experimental results on the benchmark WSJ0-2mix dataset show that the new approach can outperform the majority of state-of-the-art methods in both closed-set and open-set conditions in terms of signal-to-distortion ratio (SDR) improvement and perceptual evaluation of speech quality (PESQ). This includes approaches that exploit whole-utterance statistics for separation, and the new approach does so with relatively fewer model parameters.
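    To make the label permutation problem and the SDR improvement metric concrete, here is a minimal Python sketch. It is not the paper's grouping stage: it uses a generic exhaustive search over speaker orderings at the utterance level, and a plain energy-ratio SDR rather than the BSS Eval variant that such benchmarks typically report. The function names `sdr_db`, `best_permutation`, and `sdr_improvement` are hypothetical and introduced only for illustration.

```python
import itertools

import numpy as np


def sdr_db(reference, estimate, eps=1e-12):
    """Simplified SDR in dB: reference energy over error energy.

    Assumption: this is a plain energy ratio, not the BSS Eval SDR.
    """
    error = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps))


def best_permutation(references, estimates):
    """Find the output ordering that maximizes mean SDR.

    Returns (perm, mean_sdr), where estimates[perm[i]] is paired with references[i].
    Exhaustive search is only practical for a small number of speakers.
    """
    n = len(references)
    best_perm, best_score = None, -np.inf
    for perm in itertools.permutations(range(n)):
        score = np.mean([sdr_db(references[i], estimates[perm[i]]) for i in range(n)])
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score


def sdr_improvement(references, estimates, mixture):
    """SDR improvement: mean SDR of the matched estimates minus mean SDR of the raw mixture."""
    perm, sdr_est = best_permutation(references, estimates)
    sdr_mix = np.mean([sdr_db(ref, mixture) for ref in references])
    return perm, sdr_est - sdr_mix
```

    The search is needed because a separation network has no inherent reason to emit speakers in the same order as the reference signals, so the estimates must be matched to the references before any score is computed.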