Phoneme and sub-phoneme T-normalization for text-dependent speaker recognition

Author: Esteve Elizalde Cristina
González-Rodríguez Joaquín
Toledano Doroteo T.
Publication venue: 'International Speech Communication Association'
Publication date: 21/01/2008

Proceedings of Odyssey 2008: The Speaker and Language Recognition Workshop, Stellenbosch, South Africa

Test normalization (T-Norm) is a score normalization technique that is regularly and successfully applied in the context of text-independent speaker recognition. It is less frequently applied, however, to text-dependent or text-prompted speaker recognition, mainly because its improvement in this context is more modest. In this paper we present a novel way to improve the performance of T-Norm for text-dependent systems. It consists in applying score T-Normalization at the phoneme or sub-phoneme level instead of at the sentence level. Experiments on the YOHO corpus show that, while standard sentence-level T-Norm does not improve the equal error rate (EER), phoneme-level and sub-phoneme-level T-Norm produce relative EER reductions of 18.9% and 20.1% respectively on a state-of-the-art HMM-based text-dependent speaker recognition system. Results are even better for working points with low false acceptance rates.

This work was funded by the Spanish Ministry of Science and Technology under project TEC2006-13170-C02-01.
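As a rough illustration of the idea in this abstract, the sketch below applies the usual T-Norm formula (subtract the mean of the cohort scores and divide by their standard deviation) first to a single sentence-level score and then per phoneme. The function names, the dictionary layout, the use of the population standard deviation, and the simple averaging of per-phoneme normalized scores are all assumptions made for illustration; the abstract does not say how the paper combines phoneme-level scores.

```python
from statistics import mean, pstdev


def t_norm(raw_score, cohort_scores):
    """Normalize one score against cohort (impostor-model) scores
    obtained on the same test material: (s - mean) / std."""
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)  # assumption: population std deviation
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma


def phoneme_level_t_norm(target_scores, cohort_scores_by_phoneme):
    """Apply T-Norm separately to each phoneme's score, then combine.

    target_scores: dict mapping phoneme -> score from the claimed speaker model
    cohort_scores_by_phoneme: dict mapping phoneme -> list of cohort scores
    Averaging the normalized scores is a hypothetical combination rule.
    """
    normalized = [
        t_norm(score, cohort_scores_by_phoneme[phoneme])
        for phoneme, score in target_scores.items()
    ]
    return mean(normalized)


# Sentence-level normalization of a single score
sentence_score = t_norm(4.0, [1.0, 3.0])

# Phoneme-level normalization with made-up scores
utterance_score = phoneme_level_t_norm(
    {"a": 4.0, "b": 0.0},
    {"a": [1.0, 3.0], "b": [-1.0, 1.0]},
)
```

The only difference between the two variants here is the unit over which cohort statistics are computed: one mean and deviation for the whole sentence, or one pair per phoneme.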

Biblos-e Archivo

Word And Speaker Recognition System

Author: TAN SHWU FEI
Publication venue: Universiti Teknologi Petronas
Publication date: 01/01/2010

In this report, a system that combines user-dependent word recognition and text-dependent speaker recognition is described. Word recognition is the process of converting an audio signal, captured by a microphone, to a word. Speaker identification is the ability to recognize a person's identity based on the specific word he or she uttered. A person's voice contains various parameters that convey information such as gender, emotion, health, attitude and identity. Speaker recognition identifies the speaker based on the unique voiceprint in the speech data. Voice Activity Detection (VAD), Spectral Subtraction (SS), Mel-Frequency Cepstral Coefficients (MFCC), Vector Quantization (VQ), Dynamic Time Warping (DTW) and k-Nearest Neighbour (k-NN) are the methods used in the word recognition part of the project, implemented in MATLAB. For the speaker recognition part, Vector Quantization (VQ) is used. The implemented system achieves a recognition rate of 84.44% for word recognition and 54.44% for speaker recognition.
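The VQ-based speaker recognition step mentioned in this abstract is commonly done by scoring a test utterance against one codebook per enrolled speaker and picking the codebook with the lowest average distortion. The sketch below shows that general scheme in Python rather than the report's MATLAB; the function names, the Euclidean distance, and the tiny two-dimensional "features" are illustrative assumptions, and codebook training (for example by k-means) is left out.

```python
import math


def average_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest codeword."""
    total = 0.0
    for vector in features:
        total += min(math.dist(vector, codeword) for codeword in codebook)
    return total / len(features)


def identify_speaker(features, codebooks):
    """Return the enrolled speaker whose codebook fits the features best.

    codebooks: dict mapping speaker name -> list of codeword vectors
    (assumed to have been trained beforehand on that speaker's features).
    """
    return min(
        codebooks,
        key=lambda name: average_distortion(features, codebooks[name]),
    )


# Made-up 2-D vectors standing in for MFCC frames
codebooks = {
    "alice": [(0.0, 0.0), (1.0, 1.0)],
    "bob": [(5.0, 5.0), (6.0, 6.0)],
}
test_features = [(0.1, 0.0), (0.9, 1.1)]
best_match = identify_speaker(test_features, codebooks)
```

In a real system the vectors would be MFCC frames with a dozen or more dimensions, and each codebook would hold many more codewords.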

UTPedia

A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning

Author: Pietquin Olivier
Preux Philippe
Seurin Mathieu
Strub Florian
Publication date: 07/08/2020

Speaker recognition is a well-known and well-studied task in the speech processing domain. It has many applications, either for security or for speaker adaptation of personal devices. In this paper, we present a new paradigm for automatic speaker recognition that we call Interactive Speaker Recognition (ISR). In this paradigm, the recognition system aims to incrementally build a representation of the speakers by requesting personalized utterances to be spoken, in contrast to the standard text-dependent or text-independent schemes. To do so, we cast the speaker recognition task as a sequential decision-making problem that we solve with Reinforcement Learning. Using a standard dataset, we show that our method achieves excellent performance while using only small amounts of speech signal. This method could also be applied as an utterance selection mechanism for building speech synthesis systems.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Phoneme and Sub-Phoneme T-Normalization for Text-Dependent Speaker Recognition

Author: Esteve-Elizalde Cristina
Fernández Pozo Rubén
Gonzalez-Rodriguez Joaquin
Hernández Gómez Luis Alfonso
Torre Toledano Doroteo
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/01/2008

Test normalization (T-Norm) is a score normalization technique that is regularly and successfully applied in the context of text-independent speaker recognition. It is less frequently applied, however, to text-dependent or text-prompted speaker recognition, mainly because its improvement in this context is more modest. In this paper we present a novel way to improve the performance of T-Norm for text-dependent systems. It consists in applying score T-Normalization at the phoneme or sub-phoneme level instead of at the sentence level. Experiments on the YOHO corpus show that, while standard sentence-level T-Norm does not improve the equal error rate (EER), phoneme-level and sub-phoneme-level T-Norm produce relative EER reductions of 18.9% and 20.1% respectively on a state-of-the-art HMM-based text-dependent speaker recognition system. Results are even better for working points with low false acceptance rates.

CiteSeerX

Archivo Digital UPM