Search CORE

46 research outputs found

Speaker Diarization Based on Intensity Channel Contribution

Author: Barra Chicote Roberto
Ferreiros López Javier
Montero Martínez Juan Manuel
Pardo Muñoz José Manuel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011
Field of study

The time delay of arrival (TDOA) between multiple microphones has been used since 2006 as a source of information (localization) to complement the spectral features for speaker diarization. In this paper, we propose a new localization feature, the intensity channel contribution (ICC) based on the relative energy of the signal arriving at each channel compared to the sum of the energy of all the channels. We have demonstrated that by joining the ICC features and the TDOA features, the robustness of the localization features is improved and that the diarization error rate (DER) of the complete system (using localization and spectral features) has been reduced. By using this new localization feature, we have been able to achieve a 5.2% DER relative improvement in our development data, a 3.6% DER relative improvement in the RT07 evaluation data and a 7.9% DER relative improvement in the last year's RT09 evaluation data

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Data Fusion based on Game Theory for Speaker Diarization

Author: Alvarez Garcia Federico
Barrilero Gil Marta
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/07/2012
Field of study

A novel algorithm based on bimatrix game theory has been developed to improve the accuracy and reliability of a speaker diarization system. This algorithm fuses the output data of two open-source speaker diarization programs, LIUM and SHoUT, taking advantage of the best properties of each one. The performance of this new system has been tested by means of audio streams from several movies. From preliminary results on fragments of five movies, improvements of 63% in false alarms and missed speech mistakes have been achieved with respect to LIUM and SHoUT systems working alone. Moreover, we also improve in a 20% the number of recognized speakers, getting close to the real number of speakers in the audio strea

Archivo Digital UPM

Linguistic influences on bottom-up and top-down clustering for speaker diarization

Author
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

KL-HMM BASED SPEAKER DIARIZATION SYSTEM FOR MEETINGS

Author: Bourlard Hervé
Madikeri Srikanth
Publication venue: Idiap
Publication date: 19/06/2015
Field of study

Infoscience - École polytechnique fédérale de Lausanne

New experiments on speaker diarization for unsupervised speaking style voice building for speech synthesis

Author: Echeverry Correa Julian David
Martínez González Beatriz
Montero Martínez Juan Manuel
Pardo Muñoz José Manuel
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2014
Field of study

El uso universal de síntesis de voz en diferentes aplicaciones requeriría un desarrollo sencillo de las nuevas voces con poca intervención manual. Teniendo en cuenta la cantidad de datos multimedia disponibles en Internet y los medios de comunicación, un objetivo interesante es el desarrollo de herramientas y métodos para construir automáticamente las voces de estilo de varios de ellos. En un trabajo anterior se esbozó una metodología para la construcción de este tipo de herramientas, y se presentaron experimentos preliminares con una base de datos multiestilo. En este artículo investigamos más a fondo esta tarea y proponemos varias mejoras basadas en la selección del número apropiado de hablantes iniciales, el uso o no de filtros de reducción de ruido, el uso de la F0 y el uso de un algoritmo de detección de música. Hemos demostrado que el mejor sistema usando un algoritmo de detección de música disminuye el error de precisión 22,36% relativo para el conjunto de desarrollo y 39,64% relativo para el montaje de ensayo en comparación con el sistema base, sin degradar el factor de mérito. La precisión media para el conjunto de prueba es 90.62% desde 76.18% para los reportajes de 99,93% para los informes meteorológicos

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Influence of transition cost in the segmentation stage of speaker diarization

Author: Martínez González Beatriz
Montero Martínez Juan Manuel
Pardo Muñoz José Manuel
San Segundo Hernández Rubén
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2016
Field of study

In any speaker diarization system there is a segmentation phase and a clustering phase. Our system uses them in a single step in which segmentation and clustering are used iteratively until certain condition is met. In this paper we propose an improvement of the segmentation method that cancels a penalization that had been applied in previous works to any transition between speakers. We also study the performance when transitions between speakers are favoured instead of penalized. This last option achieves better results both for the development set (21.65 % relative speaker error improvementSER) and for the test set (4.60% relative speaker error improvement

Archivo Digital UPM

COMBINING SGMM SPEAKER VECTORS AND KL-HMM APPROACH FOR SPEAKER DIARIZATION

Author: Bourlard Hervé
Madikeri Srikanth
Motlicek Petr
Publication venue: Idiap
Publication date: 19/06/2015
Field of study

Infoscience - École polytechnique fédérale de Lausanne

Crossref