Search CORE

3,558 research outputs found

Speaker Diarization Based on Intensity Channel Contribution

Author: Barra Chicote Roberto
Ferreiros López Javier
Montero Martínez Juan Manuel
Pardo Muñoz José Manuel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011
Field of study

The time delay of arrival (TDOA) between multiple microphones has been used since 2006 as a source of information (localization) to complement the spectral features for speaker diarization. In this paper, we propose a new localization feature, the intensity channel contribution (ICC) based on the relative energy of the signal arriving at each channel compared to the sum of the energy of all the channels. We have demonstrated that by joining the ICC features and the TDOA features, the robustness of the localization features is improved and that the diarization error rate (DER) of the complete system (using localization and spectral features) has been reduced. By using this new localization feature, we have been able to achieve a 5.2% DER relative improvement in our development data, a 3.6% DER relative improvement in the RT07 evaluation data and a 7.9% DER relative improvement in the last year's RT09 evaluation data

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Speaker segmentation and clustering

Author: Ajmera
Ajmera
Almpanidis
Barras
Bimbot
Campbell
Campbell
Cettolo
Constantine Kotropoulos
Delacourt
Deller
Fiscus
Gales
Garofolo
Godfrey
Graff
Graff
Graff
Hansen
Harb
Hess
Huang
Jain
Kim
Know
Lapidot
Lu
Manjunath
Margarita Kotti
Meignier
Oppenheim
Pellom
Reynolds
Sondhi
Tranter
Vassiliki Moschou
Ververidis
Wang
Wu
Wu
Zhou
Zhu
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008
Field of study

This survey focuses on two challenging speech processing topics, namely: speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, the algorithm advantages and disadvantages are indicated, insight to the algorithms is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering. © 2007 Elsevier B.V. All rights reserved

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository

Jitter and Shimmer measurements for speaker diarization

Author: Hernando Pericás Francisco Javier
Luque Jordi
Zewoudie Abraham Woubie
Publication venue
Publication date: 01/01/2014
Field of study

Jitter and shimmer voice quality features have been successfully used to characterize speaker voice traits and detect voice pathologies. Jitter and shimmer measure variations in the fundamental frequency and amplitude of speaker's voice, respectively. Due to their nature, they can be used to assess differences between speakers. In this paper, we investigate the usefulness of these voice quality features in the task of speaker diarization. The combination of voice quality features with the conventional spectral features, Mel-Frequency Cepstral Coefficients (MFCC), is addressed in the framework of Augmented Multiparty Interaction (AMI) corpus, a multi-party and spontaneous speech set of recordings. Both sets of features are independently modeled using mixture of Gaussians and fused together at the score likelihood level. The experiments carried out on the AMI corpus show that incorporating jitter and shimmer measurements to the baseline spectral features decreases the diarization error rate in most of the recordings.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

A Speaker Diarization System for Studying Peer-Led Team Learning Groups

Author: Dubey Harishchandra
Hansen John H. L.
Kaushik Lakshmish
Sangwan Abhijeet
Publication venue
Publication date: 22/06/2016
Field of study

Peer-led team learning (PLTL) is a model for teaching STEM courses where small student groups meet periodically to collaboratively discuss coursework. Automatic analysis of PLTL sessions would help education researchers to get insight into how learning outcomes are impacted by individual participation, group behavior, team dynamics, etc.. Towards this, speech and language technology can help, and speaker diarization technology will lay the foundation for analysis. In this study, a new corpus is established called CRSS-PLTL, that contains speech data from 5 PLTL teams over a semester (10 sessions per team with 5-to-8 participants in each team). In CRSS-PLTL, every participant wears a LENA device (portable audio recorder) that provides multiple audio recordings of the event. Our proposed solution is unsupervised and contains a new online speaker change detection algorithm, termed G 3 algorithm in conjunction with Hausdorff-distance based clustering to provide improved detection accuracy. Additionally, we also exploit cross channel information to refine our diarization hypothesis. The proposed system provides good improvements in diarization error rate (DER) over the baseline LIUM system. We also present higher level analysis such as the number of conversational turns taken in a session, and speaking-time duration (participation) for each speaker.Comment: 5 Pages, 2 Figures, 2 Tables, Proceedings of INTERSPEECH 2016, San Francisco, US

arXiv.org e-Print Archive

Crossref