    Joint Speaker Segmentation, Localization and Identification for Streaming Audio

    In this paper we investigate the problem of identifying and localizing speakers with distant microphone arrays, thus extending the classical speaker diarization task to answer the question “who spoke when and where”. We consider a streaming audio scenario, where the diarization output is to be generated in real time with as low latency as possible. Rather than carrying out the individual segmentation and classification tasks (speech detection, change detection, gender/speaker classification) sequentially, we propose simultaneous segmentation and classification by applying a Viterbi decoder. The decoder uses a transition matrix estimated online from position information and speaker change hypotheses, instead of fixed transition probabilities. This avoids early hard decisions and is shown to outperform the sequential approach. Index Terms: speaker diarization, acoustic scene analysis, Viterbi decoder
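To make the idea of decoding with a non-fixed transition matrix concrete, the following is a minimal sketch, not the authors' implementation. It assumes that per-frame log-likelihoods for each class and one log transition matrix per frame step are already available; how these would be estimated from position information and speaker change hypotheses is not specified in the abstract. The function name `viterbi_time_varying` and its arguments are hypothetical, and the sketch backtracks offline over a whole sequence rather than emitting low-latency streaming output.

```python
import numpy as np


def viterbi_time_varying(log_emission, log_trans, log_prior):
    """Most likely state path with a different transition matrix at each step.

    log_emission: array (T, S), log-likelihood of frame t under state s.
    log_trans:    array (T-1, S, S); log_trans[t, i, j] is the log-probability
                  of moving from state i at frame t to state j at frame t+1.
                  (Assumed to be supplied by some online estimator.)
    log_prior:    array (S,), log-probability of each state at frame 0.
    """
    num_frames, num_states = log_emission.shape
    state_idx = np.arange(num_states)

    # delta[j]: best log-score of any path ending in state j at the current frame.
    delta = log_prior + log_emission[0]
    backptr = np.zeros((num_frames, num_states), dtype=int)

    for t in range(1, num_frames):
        # scores[i, j]: best path ending in i at t-1, then moving to j at t.
        scores = delta[:, None] + log_trans[t - 1]
        backptr[t] = np.argmax(scores, axis=0)
        delta = scores[backptr[t], state_idx] + log_emission[t]

    # Backtrack from the best final state.
    path = np.empty(num_frames, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(num_frames - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path
```

Because the segmentation (when a state change occurs) and the classification (which state is active) both fall out of the same best path, no hard decision has to be made at an earlier stage. A streaming variant would need partial traceback with a bounded look-ahead, which this sketch omits.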