115 research outputs found

    Speaker Recognition: Advancements and Challenges


    DNN-based speaker clustering for speaker diarisation

    Speaker diarisation, the task of answering "who spoke when?", is often considered to consist of three independent stages: speech activity detection, speaker segmentation and speaker clustering. These represent the separation of speech from non-speech, the splitting of speech into speaker-homogeneous segments, and the grouping together of segments that belong to the same speaker. This paper is concerned with speaker clustering, which is typically performed bottom-up using the Bayesian information criterion (BIC). We present a novel semi-supervised method of speaker clustering based on a deep neural network (DNN) model. A speaker-separation DNN trained on independent data is used to iteratively relabel the test data set; this is achieved by reconfiguring the output layer, combined with fine-tuning in each iteration. A stopping criterion that uses posteriors as confidence scores is investigated. Results are reported on a meeting task (RT07) for single distant microphones and compared with standard diarisation approaches. The new method achieves a diarisation error rate (DER) of 14.8%, compared to a baseline of 19.9%.
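    The abstract's iterate-relabel-refit loop can be sketched as follows. This is only a toy illustration with assumed details: a nearest-centroid scorer stands in for the speaker-separation DNN, re-estimating one centroid per cluster stands in for output-layer reconfiguration plus fine-tuning, and softmax scores play the role of the posterior confidences used in the stopping criterion.

```python
import numpy as np

def iterative_relabel(segments, n_speakers, n_iters=20, seed=0):
    """Toy sketch of iterative relabelling for speaker clustering.

    `segments` is an (N, D) array of per-segment embeddings. A
    nearest-centroid scorer stands in for the speaker-separation DNN;
    re-estimating the centroids stands in for output-layer
    reconfiguration and fine-tuning. Returns final labels and a
    per-segment confidence (max softmax "posterior").
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_speakers, size=len(segments))
    for _ in range(n_iters):
        # "fine-tune": re-estimate one model (centroid) per current cluster,
        # reseeding any cluster that has gone empty
        centroids = np.stack([segments[labels == k].mean(axis=0)
                              if np.any(labels == k)
                              else segments[rng.integers(len(segments))]
                              for k in range(n_speakers)])
        # "relabel": score every segment against every speaker model
        dists = np.linalg.norm(segments[:, None, :] - centroids[None], axis=-1)
        posteriors = np.exp(-dists)
        posteriors /= posteriors.sum(axis=1, keepdims=True)
        new_labels = posteriors.argmax(axis=1)
        if np.array_equal(new_labels, labels):   # stopping criterion: labels stable
            break
        labels = new_labels
    return labels, posteriors.max(axis=1)
```

    On well-separated embeddings the loop behaves like k-means with a confidence readout; the paper's DNN version replaces the centroid scorer with a fine-tuned network.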

    Maximum likelihood Linear Programming Data Fusion for Speaker Recognition

    Biometric system performance can be improved by means of data fusion. Several kinds of information can be fused in order to obtain a more accurate classification (identification or verification) of an input sample. In this paper we present a method for computing the weights of a weighted-sum fusion of scores by means of a likelihood model, where the maximum likelihood estimation is posed as a linear programming problem. The scores are derived from GMM classifiers, each working on a different feature extractor. Our experimental results assessed the robustness of the system against changes over time (different sessions) and against a change of microphone. The improvements obtained were significantly better (error bars of two standard deviations) than a uniform weighted sum, a uniform weighted product, or the best single classifier. The proposed method scales computationally with the number of scores to be fused in the same way as the simplex method for linear programming.
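    The abstract does not spell out the likelihood model, so the following is only a generic sketch of estimating weighted-sum fusion weights with a linear program: a max-margin criterion solved with SciPy's `linprog` stands in for the paper's maximum-likelihood formulation, and all names and shapes here are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def fusion_weights(target_scores, impostor_scores):
    """Estimate weighted-sum fusion weights with a linear program.

    target_scores, impostor_scores: (n_trials, n_classifiers) arrays of
    per-classifier scores. We maximise the gap between the worst fused
    target score and the best fused impostor score, with the weights
    constrained to the probability simplex. This max-margin LP is a
    stand-in for the paper's maximum-likelihood formulation.
    """
    t = np.asarray(target_scores, float)
    i = np.asarray(impostor_scores, float)
    k = t.shape[1]
    # variables: x = [w_1..w_k, m_pos, m_neg]; maximise m_pos - m_neg
    c = np.r_[np.zeros(k), -1.0, 1.0]
    # w.t_j >= m_pos  ->  -w.t_j + m_pos <= 0   (one row per target trial)
    A_t = np.hstack([-t, np.ones((len(t), 1)), np.zeros((len(t), 1))])
    # w.i_j <= m_neg  ->   w.i_j - m_neg <= 0   (one row per impostor trial)
    A_i = np.hstack([i, np.zeros((len(i), 1)), -np.ones((len(i), 1))])
    A_ub = np.vstack([A_t, A_i])
    b_ub = np.zeros(len(t) + len(i))
    A_eq = np.r_[np.ones(k), 0.0, 0.0][None, :]   # weights sum to one
    bounds = [(0, None)] * k + [(None, None), (None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds)
    return res.x[:k]
```

    Because the problem is an LP, it is solved by simplex-type methods, which is consistent with the abstract's remark about computational scaling.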

    Combining evidence from subsegmental and segmental features for audio clip classification


    AlbayzĂ­n-2014 evaluation: audio segmentation and classification in broadcast news domains

    The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0076-3
    Audio segmentation is important as a pre-processing task to improve the performance of many speech technology tasks and therefore has an undoubted research interest. This paper describes the database, the metric, the systems and the results for the Albayzín-2014 audio segmentation evaluation campaign. In contrast to previous evaluations, where the task was the segmentation of non-overlapping classes, the Albayzín-2014 evaluation proposes the delimitation of the presence of speech, music and/or noise, which can occur simultaneously. The database used in the evaluation was created by fusing different media and noises in order to increase the difficulty of the task. Seven segmentation systems from four different research groups were evaluated and combined. Their experimental results were analyzed and compared with the aim of providing a benchmark and highlighting promising directions in this field.
    This work has been partially funded by the Spanish Government and the European Union (FEDER) under the project TIN2011-28169-C05-02 and supported by the European Regional Development Fund and the Spanish Government ('SpeechTech4All Project' TEC2012-38939-C03
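    Scoring segmentation when speech, music and noise may be active simultaneously amounts to a multi-label, per-frame comparison. The sketch below is an illustrative frame-level error rate, not the official Albayzín-2014 metric (which the abstract does not reproduce); the array layout is an assumption.

```python
import numpy as np

def frame_error_rate(ref, hyp):
    """Simplified frame-level scoring for overlapping-class segmentation.

    ref, hyp: boolean arrays of shape (n_frames, n_classes); True marks
    that a class (e.g. speech, music, noise) is active in that frame.
    Returns (misses + false alarms) / active reference frames, pooled
    over classes. Illustrative only, not the official campaign metric.
    """
    ref = np.asarray(ref, bool)
    hyp = np.asarray(hyp, bool)
    misses = np.sum(ref & ~hyp)          # active in reference, missed
    false_alarms = np.sum(~ref & hyp)    # inactive in reference, hypothesised
    return (misses + false_alarms) / max(np.sum(ref), 1)
```

    A perfect hypothesis scores 0; because classes are scored independently, a system is free to mark several classes active in the same frame, matching the overlapping-class task definition.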

    Robust speaker identification using artificial neural networks

    This research focuses on recognizing speakers from their speech samples. Numerous text-dependent and text-independent algorithms have been developed to recognize a speaker from his/her speech. In this thesis, we concentrate on recognition of the speaker from fixed text, i.e. text-dependent recognition; the possibility of extending the method to variable text, i.e. text-independent recognition, is also analyzed. Different feature extraction algorithms are employed, and their performance with artificial neural networks as a data classifier on a fixed training set is analyzed. We find a way to combine these individual feature extraction algorithms by incorporating their interdependence. The efficiency of the algorithms is determined after the input speech is classified using the back-propagation algorithm of artificial neural networks. A special case of the back-propagation algorithm which improves the efficiency of the classification is also discussed.
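    A minimal version of the classifier described above, a feed-forward network trained with back-propagation on (possibly concatenated) feature vectors, can be sketched as follows. The network size, learning rate and synthetic feature layout are assumptions, not details from the thesis.

```python
import numpy as np

def train_speaker_mlp(X, y, n_speakers, hidden=16, lr=0.5, epochs=300, seed=0):
    """Minimal back-propagation speaker classifier (illustrative sketch).

    X: (N, D) feature vectors, e.g. the concatenated outputs of several
    feature extractors; y: integer speaker labels. A single hidden layer
    trained with plain gradient descent on cross-entropy stands in for
    the thesis's ANN classifier. Returns a prediction function.
    """
    mu, sd = X.mean(0), X.std(0) + 1e-8
    Xn = (X - mu) / sd                             # standardise features
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.1, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, n_speakers))
    b2 = np.zeros(n_speakers)
    Y = np.eye(n_speakers)[y]                      # one-hot targets
    for _ in range(epochs):
        h = np.tanh(Xn @ W1 + b1)                  # forward pass
        logits = h @ W2 + b2
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)          # softmax posteriors
        g2 = (p - Y) / len(Xn)                     # back-prop: output layer
        g1 = (g2 @ W2.T) * (1 - h**2)              # back-prop: hidden layer
        W2 -= lr * (h.T @ g2); b2 -= lr * g2.sum(0)
        W1 -= lr * (Xn.T @ g1); b1 -= lr * g1.sum(0)
    return lambda Xq: np.argmax(
        np.tanh(((Xq - mu) / sd) @ W1 + b1) @ W2 + b2, axis=1)
```

    On well-separated synthetic speaker features this reaches high training accuracy within a few hundred epochs; the thesis's combination of extractors would simply widen the input dimension D.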