Factor analysis for speaker segmentation and improved speaker diarization

Demuynck, Kris; Desplanques, Brecht; Martens, Jean-Pierre

Publication date: 1 January 2015
Publisher: International Speech Communication Association (ISCA)

Abstract

Speaker diarization consists of two steps: speaker segmentation and speaker clustering. Speaker segmentation searches for speaker boundaries, whereas speaker clustering aims at grouping speech segments of the same speaker. In this work, the segmentation is improved by replacing the Bayesian Information Criterion (BIC) with a new iVector-based approach. Unlike BIC-based methods, which trigger on any acoustic dissimilarity, the proposed method suppresses phonetic variations and accentuates speaker differences. More specifically, our method generates boundaries based on the distance between two speaker factor vectors that are extracted on a frame-by-frame basis. The extraction relies on an eigenvoice matrix, so that large differences between speaker factor vectors indicate a speaker change. A Mahalanobis-based distance measure, in which the covariance matrix compensates for the remaining detrimental phonetic variability, is shown to generate accurate boundaries. The detected segments are clustered by a state-of-the-art iVector Probabilistic Linear Discriminant Analysis (PLDA) system. Experiments on the COST278 multilingual broadcast news database show a 50% relative reduction in boundary detection errors. The speaker error rate is reduced by 8% relative.
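
The abstract does not spell out the extraction or scoring equations, so the following is only a minimal NumPy sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a trained UBM (means, diagonal variances, and per-frame component posteriors `gamma`) and an eigenvoice matrix `V` are already available. For each candidate frame it estimates a speaker factor vector for the window to its left and the window to its right with the standard eigenvoice posterior-mean formula, scores the pair with a squared Mahalanobis distance whose covariance `W` is meant to absorb within-speaker (phonetic) variability, and keeps greedy local maxima above a threshold as boundaries. All function names, the window length `win`, the `threshold`, the supervector row layout of `V`, and the zero-mean estimator for `W` are hypothetical choices made for illustration.

```python
import numpy as np


def window_stats(X, gamma, means):
    """Zeroth- and centred first-order Baum-Welch statistics of a block of frames.

    X: (T, D) features; gamma: (T, C) UBM component posteriors; means: (C, D).
    """
    N = gamma.sum(axis=0)                          # (C,) soft frame counts
    F = gamma.T @ X - N[:, None] * means           # (C, D) stats centred on the UBM means
    return N, F


def speaker_factor(N, F, V, inv_var):
    """Posterior mean of the speaker factors for one block, given an eigenvoice matrix.

    V: (C*D, R) with rows ordered component by component (assumed layout);
    inv_var: (C, D) inverse diagonal UBM variances (Sigma^-1).
    y = (I + V^T Sigma^-1 N V)^-1 V^T Sigma^-1 F
    """
    C, D = F.shape
    R = V.shape[1]
    Vc = V.reshape(C, D, R)
    precision = np.eye(R) + np.einsum("c,cdr,cd,cds->rs", N, Vc, inv_var, Vc)
    rhs = np.einsum("cdr,cd,cd->r", Vc, inv_var, F)
    return np.linalg.solve(precision, rhs)


def within_speaker_cov(diffs, reg=1e-6):
    """Hypothetical estimate of W from differences y_left - y_right measured on
    window pairs known to belong to the same speaker. Their expected value is
    zero, so the second moment about zero is used; reg keeps W invertible."""
    diffs = np.asarray(diffs, dtype=float)
    return diffs.T @ diffs / len(diffs) + reg * np.eye(diffs.shape[1])


def mahalanobis_sq(a, b, W_inv):
    """Squared Mahalanobis distance (a - b)^T W^-1 (a - b)."""
    d = a - b
    return float(d @ W_inv @ d)


def detect_boundaries(X, gamma, means, inv_var, V, W, win=100, threshold=1.0):
    """Score every frame t by comparing the factors of [t - win, t) and [t, t + win),
    then keep greedy local maxima above `threshold` as speaker boundaries."""
    T = X.shape[0]
    W_inv = np.linalg.inv(W)
    scores = np.full(T, -np.inf)                   # frames without two full windows stay unscored
    for t in range(win, T - win + 1):
        N_l, F_l = window_stats(X[t - win:t], gamma[t - win:t], means)
        N_r, F_r = window_stats(X[t:t + win], gamma[t:t + win], means)
        y_left = speaker_factor(N_l, F_l, V, inv_var)
        y_right = speaker_factor(N_r, F_r, V, inv_var)
        scores[t] = mahalanobis_sq(y_left, y_right, W_inv)

    boundaries = []
    while True:
        t = int(np.argmax(scores))
        if scores[t] <= threshold:
            break
        boundaries.append(t)
        scores[max(0, t - win):t + win + 1] = -np.inf  # suppress peaks near an accepted boundary
    return sorted(boundaries)
```

As a toy usage example: with one UBM component, a one-dimensional eigenvoice along the first feature axis (`V = np.array([[1.0], [0.0]])`), identity `W`, and 50 frames at [-2, 0] followed by 50 frames at [2, 0], calling `detect_boundaries` with `win=20` returns `[50]`. Recomputing the statistics for every window costs O(T·win) frame accumulations; cumulative sums over frames would avoid that, at the price of slightly less readable code.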

Available Versions

Ghent University Academic Bibliography

oai:archive.ugent.be:7159960

Last time updated on 12/11/2016