88,603 research outputs found

Systematic comparison of BIC-based speaker segmentation systems

Authors: Benetos E.; Kotropoulos C.; Kotti M.; Moschou V.
Publication date: 01/01/2007

This paper addresses unsupervised speaker change detection and examines three speaker segmentation systems. The first system investigates the AudioSpectrumCentroid and AudioWaveformEnvelope features, implements a dynamic fusion scheme, and applies the Bayesian Information Criterion (BIC). The second system consists of three modules: the first extracts a second-order statistic measure; the second applies the Euclidean distance and Hotelling's T² statistic sequentially; and the third applies BIC. The third system first uses a metric-based approach to detect potential speaker change points and then applies BIC to validate them. Experiments are carried out on a dataset created by concatenating speakers from the TIMIT database. The three systems are compared systematically by means of one-way ANOVA and Tukey's post hoc method.
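Editor's sketch: the BIC test that these systems rely on can be illustrated as follows. This is a minimal sketch of the widely used ΔBIC formulation with full-covariance Gaussians, not the authors' implementation. The function names, the penalty weight `lam`, the small ridge term, and the synthetic data are all assumptions made for illustration. A positive ΔBIC at frame `i` favours the hypothesis that the window contains a speaker change at `i`.

```python
import numpy as np

def log_det_cov(x):
    """Log-determinant of the maximum-likelihood covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    cov = cov + 1e-6 * np.eye(cov.shape[0])  # small ridge (assumption) for stability
    _, logdet = np.linalg.slogdet(cov)
    return logdet

def delta_bic(x, i, lam=1.0):
    """Delta-BIC for a candidate change at frame i of the window x (frames x dims).

    Compares one Gaussian for the whole window against two Gaussians,
    one for x[:i] and one for x[i:]. Values above zero favour a change.
    """
    n, d = x.shape
    # Penalty for the extra mean vector and covariance matrix of the second model.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * log_det_cov(x)
            - 0.5 * i * log_det_cov(x[:i])
            - 0.5 * (n - i) * log_det_cov(x[i:])
            - lam * penalty)

# Synthetic stand-ins for feature vectors (hypothetical data, not TIMIT).
rng = np.random.default_rng(0)
same = rng.normal(0.0, 1.0, size=(400, 4))
changed = np.vstack([rng.normal(0.0, 1.0, size=(200, 4)),
                     rng.normal(3.0, 1.0, size=(200, 4))])

print("no change:", delta_bic(same, 200))
print("change:   ", delta_bic(changed, 200))
```

In practice the candidate index `i` is swept across the window, or supplied by a cheaper metric-based first pass as in the third system, and `lam` is tuned on development data.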

City Research Online

Crossref

Spiral - Imperial College Digital Repository

A Novel Method For Speech Segmentation Based On Speakers' Characteristics

Authors: Abdolali Behrouz; Sameti Hossein
Publication venue: Academy and Industry Research Collaboration Center (AIRCC)
Publication date: 01/04/2012

Speech segmentation is the process of change point detection that partitions an input audio stream into regions, each of which corresponds to only one audio source or one speaker. One application is in speaker diarization systems. There are several methods for speaker segmentation; however, most speaker diarization systems use BIC-based segmentation. The main goal of this paper is to propose a new speaker segmentation method that is faster than current methods such as BIC while retaining acceptable accuracy. The proposed method is based on the pitch frequency of the speech. Its accuracy is similar to that of common speaker segmentation methods, but its computational cost is much lower. We show that our method is about 2.4 times faster than the BIC-based method, while the average accuracy of the pitch-based method is slightly higher than that of the BIC-based method. Comment: 14 pages, 8 figures

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Speaker change detection using BIC: a comparison on two datasets

Authors: Constantine Kotropoulos; Emmanouil Benetos; Luís Gustavo P. M. Martins; Margarita Kotti
Publication date: 01/01/2006

This paper addresses the problem of unsupervised speaker change detection. We assume that there is no prior knowledge of the number of speakers or their identities. Two methods are tested. The first method uses the Bayesian Information Criterion (BIC), investigates the AudioSpectrumCentroid and AudioWaveformEnvelope features, and implements dynamic thresholding followed by a fusion scheme. The second method is a real-time one that uses a metric-based approach employing line spectral pairs (LSP) and BIC to validate a potential change point. The experiments are carried out on two different datasets. The first was created by concatenating speakers from the TIMIT database and is referred to as the TIMIT dataset. The second was created from recordings from the MPEG-7 test set CD1 and broadcast news and is referred to as the INESC dataset.

CiteSeerX

Spiral - Imperial College Digital Repository

Encoder-decoder multimodal speaker change detection

Authors: Heo Hee-Soo; Jung Jee-weon; Kim Geonmin; Kim You Jin; Kwon Young-ki; Lee Bong-Jin; Lee Minjae; Seo Soonshin
Publication date: 01/06/2023

The task of speaker change detection (SCD), which detects points where speakers change in an input, is essential for several applications. Several studies have addressed the SCD task using audio inputs only and have shown limited performance. Recently, multimodal SCD (MMSCD) models, which utilise the text modality in addition to audio, have shown improved performance. In this study, the proposed model is built upon two main proposals: a novel mechanism for modality fusion and the adoption of an encoder-decoder architecture. Unlike previous MMSCD works that extract speaker embeddings from extremely short audio segments aligned to a single word, we use a speaker embedding extracted from 1.5 s. A transformer decoder layer further improves the performance of an encoder-only MMSCD model. The proposed model achieves state-of-the-art results among studies that report SCD performance and is also on par with recent work that combines SCD with automatic speech recognition via human transcription. Comment: 5 pages, accepted for presentation at INTERSPEECH 202

arXiv.org e-Print Archive

Echo Cancellation - A Likelihood Ratio Test for Double-talk Versus Channel Change

Authors: Bershad Neil J.; Tourneret Jean-Yves
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2006

Echo cancellers are in wide use in both electrical (four-wire to two-wire mismatch) and acoustic (speaker-microphone coupling) applications. One of the main design problems is the control logic for adaptation. In essence, the algorithm weights should be frozen in the presence of double-talk and should adapt quickly in its absence. The control logic can be quite complicated, since it is often not easy to discriminate between the echo signal and the near-end speaker. This paper derives a log likelihood ratio test (LRT) for deciding between double-talk (freeze weights) and a channel change (adapt quickly) using a stationary Gaussian stochastic input signal model. The probability density function of a sufficient statistic under each hypothesis is obtained, and the performance of the test is evaluated as a function of the system parameters. The receiver operating characteristics (ROCs) indicate that it is difficult to correctly decide between double-talk and a channel change based upon a single look. However, post-detection integration of approximately one hundred sufficient statistic samples yields a detection probability close to unity (0.99) with a small false alarm probability (0.01).

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

HAL Descartes

Automatic speaker segmentation using multiple features and distance measures: a comparison of three approaches

Authors: Benetos E.; Cardoso J. S.; Kotropoulos C.; Kotti M.; Martins L. P. M.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2006

This paper addresses the problem of unsupervised speaker change detection. Three systems based on the Bayesian Information Criterion (BIC) are tested. The first system investigates the AudioSpectrumCentroid and AudioWaveformEnvelope features, implements dynamic thresholding followed by a fusion scheme, and finally applies BIC. The second system is a real-time one that uses a metric-based approach employing line spectral pairs and BIC to validate a potential speaker change point. The third system consists of three modules: the first uses a measure based on second-order statistics; the second applies the Euclidean distance and Hotelling's T² statistic; and the third applies BIC. The experiments are carried out on a dataset created by concatenating speakers from the TIMIT database, which is referred to as the TIMIT dataset. The performance of the three systems is compared using t-statistics.

CiteSeerX

City Research Online

Crossref

Spiral - Imperial College Digital Repository

Hierarchical RNN with Static Sentence-Level Attention for Text-Based Speaker Change Detection

Authors: Jin Zhi; Meng Zhao; Mou Lili
Publication date: 28/09/2018

Speaker change detection (SCD) is an important task in dialog modeling. Our paper addresses the problem of text-based SCD, which differs from existing audio-based studies and is useful in various scenarios, for example, processing dialog transcripts where speaker identities are missing (e.g., OpenSubtitle) and enhancing audio SCD with textual information. We formulate text-based SCD as a matching problem between the utterances before and after a given decision point, and we propose a hierarchical recurrent neural network (RNN) with static sentence-level attention. Experimental results show that neural networks consistently achieve better performance than feature-based approaches, and that our attention-based model significantly outperforms non-attention neural networks. Comment: In Proceedings of the ACM on Conference on Information and Knowledge Management (CIKM), 201

arXiv.org e-Print Archive

Crossref

Segmentation of speech signals by speaker change

Authors: Загваздін О.С.; Крак Ю.В.; Кривонос Ю.Г.; Єфімов Г.М.
Publication venue: Інститут проблем штучного інтелекту МОН України та НАН України
Publication date: 01/01/2011

An approach to segmenting speech signals by speaker change is proposed, together with methods for locating speaker change positions in a speech signal. Speaker change positions are determined by analyzing the sets of feature vectors in the neighborhood of a pause, based on the Bayesian information criterion. The quality of the feature vectors is improved by taking into account only the segments whose log energy is above a given threshold. An adaptive approach for automatically detecting pauses in a speech signal is also proposed.

Наукова електронна бібліотека періодичних видань НАН України (Vernadsky National Library of Ukraine)

Unsupervised Speaker Change Detection for Broadcast News Segmentation

Authors: Hansen Lars Kai; Jørgensen Kasper Winther; Mølgaard Lasse Lohilahti
Publication date: 01/01/2006

This paper presents a speaker change detection system for news broadcast segmentation based on a vector quantization (VQ) approach. The system makes no assumption about the number of speakers or their identities. It uses mel-frequency cepstral coefficients, and change detection is done using the VQ distortion measure, which is evaluated against two other statistics: the symmetric Kullback-Leibler (KL2) distance and the so-called "divergence shape distance". First-level alarms are further tested using the VQ distortion. We find that the false alarm rate can be reduced without significant losses in the detection of correct changes. We furthermore evaluate the generalizability of the approach by testing the complete system on an independent set of broadcasts, including a channel not present in the training set.
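Editor's sketch: the two baseline statistics named in this abstract are commonly computed from Gaussians fitted to the feature frames on either side of a candidate change point. The code below assumes those common definitions; the paper may define them differently. The function names, the ridge term, and the synthetic data are illustrative assumptions. The divergence shape distance is the covariance-only part of the symmetric KL distance, so it ignores the difference between the segment means.

```python
import numpy as np

def gaussian_stats(x):
    """Mean and maximum-likelihood covariance of the rows of x (frames x dims)."""
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False, bias=True) + 1e-6 * np.eye(x.shape[1])  # ridge: assumption
    return mean, cov

def kl2_and_dsd(a, b):
    """Symmetric KL distance and divergence shape distance between two segments."""
    mu_a, cov_a = gaussian_stats(a)
    mu_b, cov_b = gaussian_stats(b)
    inv_a, inv_b = np.linalg.inv(cov_a), np.linalg.inv(cov_b)
    # Covariance-only term: 0.5 * tr[(C_a - C_b)(C_b^-1 - C_a^-1)].
    dsd = 0.5 * np.trace((cov_a - cov_b) @ (inv_b - inv_a))
    # The mean-dependent term completes the symmetric KL distance.
    diff = mu_a - mu_b
    kl2 = dsd + 0.5 * diff @ (inv_a + inv_b) @ diff
    return kl2, dsd

# Synthetic stand-ins for cepstral feature frames (hypothetical data).
rng = np.random.default_rng(1)
seg_a = rng.normal(0.0, 1.0, size=(300, 4))
seg_b = rng.normal(3.0, 1.0, size=(300, 4))  # same spread, shifted mean

kl2_same, dsd_same = kl2_and_dsd(seg_a, seg_a)
kl2_diff, dsd_diff = kl2_and_dsd(seg_a, seg_b)
print("identical segments:", kl2_same, dsd_same)
print("shifted segments:  ", kl2_diff, dsd_diff)
```

Either statistic would typically be computed over a sliding pair of adjacent windows and compared with a threshold to raise a first-level alarm.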

CiteSeerX

Online Research Database In Technology