Overlap detection for speaker diarization by fusing spectral and spatial features

Abstract

A substantial portion of the errors made by conventional speaker diarization systems on meeting data can be attributed to overlapped speech. This paper proposes the use of several spatial features to improve speech overlap detection on distant-channel microphones. These spatial features are integrated into a spectral-based system by using principal component analysis and neural networks. Different overlap detection hypotheses are used to improve diarization performance with both overlap exclusion and overlap labeling. In experiments conducted on the AMI Meeting Corpus, we demonstrate relative diarization error rate (DER) improvements of 11.6% and 14.6% for single- and multi-site data, respectively.
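As a rough illustration of the fusion step mentioned in the abstract, the sketch below concatenates per-frame spectral and spatial feature vectors, standardizes them, and projects them with principal component analysis. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `fuse_with_pca`, the feature dimensions (13 and 4), the number of retained components, and the random input data are all hypothetical, and the neural-network variant is not shown.

```python
import numpy as np


def fuse_with_pca(spectral, spatial, n_components):
    """Concatenate per-frame features, standardize them, and project with PCA.

    spectral: array of shape (frames, d_spec)
    spatial:  array of shape (frames, d_spat)
    Returns an array of shape (frames, n_components).
    """
    # Stack the two feature streams frame by frame.
    fused = np.hstack([spectral, spatial])

    # Standardize each dimension so that neither stream dominates by scale.
    mean = fused.mean(axis=0)
    std = fused.std(axis=0)
    std[std == 0] = 1.0  # guard against constant dimensions
    z = (fused - mean) / std

    # SVD of the standardized data: the rows of vt are the principal
    # directions, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T


# Hypothetical data: 200 frames, 13 spectral and 4 spatial dimensions.
rng = np.random.default_rng(0)
spectral = rng.normal(size=(200, 13))
spatial = rng.normal(size=(200, 4))
projected = fuse_with_pca(spectral, spatial, n_components=8)
```

The projected features are decorrelated and could then be passed to whatever overlap detector is in use; the choice of eight components here is arbitrary.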
