Search CORE

62 research outputs found

Adaptive speaker diarization of broadcast news based on factor analysis

Author: Demuynck Kris
Desplanques Brecht
Martens Jean-Pierre
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

The introduction of factor analysis techniques in a speaker diarization system enhances its performance by facilitating the use of speaker-specific information, by improving the suppression of nuisance factors such as phonetic content, and by facilitating various forms of adaptation. This paper describes a state-of-the-art iVector-based diarization system which employs factor analysis and adaptation on all levels. The diarization modules relevant for this work are the speaker segmentation, which searches for speaker boundaries, and the speaker clustering, which aims at grouping speech segments of the same speaker. The speaker segmentation relies on speaker factors which are extracted on a frame-by-frame basis using eigenvoices. We incorporate soft voice activity detection into this extraction process: because speaker change detection should be based on speaker information only, speech posteriors are applied so that non-speech frames are disregarded. Potential speaker boundaries are inserted at positions where rapid changes in the speaker factors are observed. By employing Mahalanobis distances, the effect of the phonetic content can be further reduced, which results in more accurate speaker boundaries. This iVector-based segmentation significantly outperforms more common segmentation methods based on the Bayesian Information Criterion (BIC) or speech activity marks. The speaker clustering employs two-step Agglomerative Hierarchical Clustering (AHC): after initial BIC clustering, the second clustering stage is realized by either an iVector Probabilistic Linear Discriminant Analysis (PLDA) system or Cosine Distance Scoring (CDS) of extracted speaker factors. The segmentation system is made adaptive on a file-by-file basis by iterating the diarization process using eigenvoice matrices adapted, in an unsupervised manner, on the output of the previous iteration.
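As a rough illustration of a CDS-based clustering stage, the sketch below greedily merges segment clusters by the cosine similarity of their mean speaker factor vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: segment-level speaker factors are assumed to be already extracted, the initial BIC clustering stage is skipped, and the `threshold` stopping criterion and the mean-vector cluster representation are hypothetical choices.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two speaker factor vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cds_ahc(segment_factors: List[List[float]], threshold: float) -> List[int]:
    """Agglomerative clustering with cosine distance scoring (illustrative sketch).

    Every segment starts as its own cluster. The pair of clusters whose mean
    speaker factor vectors have the highest cosine similarity is merged until
    no pair reaches `threshold` (hypothetical stopping parameter).
    Returns one cluster label per segment.
    """
    clusters = [[i] for i in range(len(segment_factors))]
    while len(clusters) > 1:
        means = [mean_vector([segment_factors[i] for i in c]) for c in clusters]
        best_score, best_pair = -math.inf, (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = cosine_similarity(means[i], means[j])
                if score > best_score:
                    best_score, best_pair = score, (i, j)
        if best_score < threshold:
            break  # remaining clusters are considered different speakers
        i, j = best_pair
        clusters[i].extend(clusters[j])
        del clusters[j]
    labels = [0] * len(segment_factors)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Toy speaker factors: two segments per hypothetical speaker.
segments = [[1.0, 0.1], [0.9, 0.2], [-0.1, 1.0], [0.0, 0.8]]
print(cds_ahc(segments, threshold=0.8))  # [0, 0, 1, 1]
```

Cosine scoring ignores vector length, which is one reason it is a common choice for comparing iVector-style representations; a real system would tune the stopping threshold on development data.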
Assuming that, for most use cases, material similar to the recording in question is readily available, unsupervised domain adaptation of the speaker clustering is possible as well. We achieve this by expanding the eigenvoice matrix used during speaker factor extraction for the CDS clustering stage with a small set of new eigenvoices that, in combination with the initial generic eigenvoices, model the recurring speakers and acoustic conditions more accurately. Experiments on the COST278 multilingual broadcast news database show that adaptive speaker segmentation generates significantly more accurate speaker boundaries, which also results in more accurate clustering. Domain adaptation of the CDS clustering further reduces the obtained speaker error rate (SER) by another 13% relative, to 7.4%. © 2017 Elsevier Ltd. All rights reserved.
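Reading the final result as a 13% relative reduction that ends at an SER of 7.4% (an interpretation of the sentence above, not a figure stated separately in the abstract), the SER before domain adaptation of the CDS clustering can be back-computed as roughly 8.5%:

```latex
\mathrm{SER}_{\text{before}} = \frac{\mathrm{SER}_{\text{after}}}{1 - 0.13} = \frac{7.4\%}{0.87} \approx 8.5\%
```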

Ghent University Academic Bibliography

Factor analysis for speaker segmentation and improved speaker diarization

Author: Demuynck Kris
Desplanques Brecht
Martens Jean-Pierre
Publication venue: International Speech Communication Association (ISCA)
Publication date: 01/01/2015

Speaker diarization includes two steps: speaker segmentation and speaker clustering. Speaker segmentation searches for speaker boundaries, whereas speaker clustering aims at grouping speech segments of the same speaker. In this work, the segmentation is improved by replacing the Bayesian Information Criterion (BIC) with a new iVector-based approach. Unlike BIC-based methods, which trigger on any acoustic dissimilarity, the proposed method suppresses phonetic variations and accentuates speaker differences. More specifically, our method generates boundaries based on the distance between two speaker factor vectors that are extracted on a frame-by-frame basis. The extraction relies on an eigenvoice matrix, so that large differences between speaker factor vectors indicate a change of speaker. A Mahalanobis-based distance measure, in which the covariance matrix compensates for the remaining, detrimental phonetic variability, is shown to generate accurate boundaries. The detected segments are clustered by a state-of-the-art iVector Probabilistic Linear Discriminant Analysis (PLDA) system. Experiments on the COST278 multilingual broadcast news database show relative reductions of 50% in boundary detection errors. The speaker error rate is reduced by 8% relative.
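The boundary-detection idea can be sketched as a Mahalanobis distance curve between the average speaker factors to the left and right of each frame, followed by peak picking. This is a minimal sketch, assuming frame-level speaker factors are already available; the abstract does not specify window lengths, how the covariance is estimated, or the peak-picking rule, so `window`, `cov`, `threshold`, `min_gap`, and the window-mean comparison are all illustrative assumptions.

```python
import numpy as np


def mahalanobis_change_curve(factors: np.ndarray, window: int, cov: np.ndarray) -> np.ndarray:
    """Mahalanobis distance between mean speaker factors left and right of each frame.

    factors: (num_frames, dim) frame-level speaker factor vectors (assumed given).
    window:  number of frames averaged on each side (hypothetical parameter).
    cov:     (dim, dim) covariance modelling remaining phonetic variability,
             assumed to be estimated beforehand on training data.
    Frames without a full window on both sides get distance 0.
    """
    num_frames = factors.shape[0]
    cov_inv = np.linalg.inv(cov)
    dist = np.zeros(num_frames)
    for t in range(window, num_frames - window + 1):
        left = factors[t - window:t].mean(axis=0)
        right = factors[t:t + window].mean(axis=0)
        diff = left - right
        dist[t] = np.sqrt(diff @ cov_inv @ diff)
    return dist


def pick_boundaries(dist: np.ndarray, threshold: float, min_gap: int) -> list:
    """Return local maxima of `dist` above `threshold`, at least `min_gap` frames apart."""
    boundaries = []
    for t in range(1, len(dist) - 1):
        is_peak = dist[t] >= dist[t - 1] and dist[t] >= dist[t + 1]
        if dist[t] > threshold and is_peak:
            if not boundaries or t - boundaries[-1] >= min_gap:
                boundaries.append(t)
    return boundaries


# Synthetic example: 50 frames of "speaker A" followed by 50 frames of "speaker B".
rng = np.random.default_rng(0)
factors = np.vstack([
    rng.normal(0.0, 0.1, size=(50, 2)),
    rng.normal(3.0, 0.1, size=(50, 2)),
])
curve = mahalanobis_change_curve(factors, window=10, cov=np.eye(2))
boundaries = pick_boundaries(curve, threshold=2.0, min_gap=20)
print(boundaries)  # the true speaker change is at frame 50
```

With an identity covariance this reduces to a Euclidean distance; the point of the Mahalanobis form is that a covariance estimated from within-speaker (phonetic) variation down-weights directions in which speaker factors fluctuate even when the speaker does not change.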

Ghent University Academic Bibliography

Soft VAD in factor analysis based speaker segmentation of broadcast news

Author: Demuynck Kris
Desplanques Brecht
Martens Jean-Pierre
Publication date: 01/01/2016

Crossref

Ghent University Academic Bibliography

Adaptive speaker diarization of broadcast news based on factor analysis

Author: Brecht Desplanques
Jean-Pierre Martens
Kris Demuynck
Publication venue: 'Elsevier BV'

Crossref

Loquendo - Politecnico di Torino’s 2008 NIST Speaker Recognition Evaluation System

Author: CASTALDO F
COLIBRO D
DALMASSO E
LAFACE P
VAIR C
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

A Review of Recent Advances in Speaker Diarization with Bayesian Methods

Author: Themos Stafylakis
Vassilis Katsouros
Publication venue: 'IntechOpen'
Publication date: 21/06/2011

IntechOpen