Using data-driven and phonetic units for speaker verification

Author: El Hannani Asmaa
Hennebert Jean
Montero-Asenjo Alberto
Petrovska-Delacrétaz Dijana
Toledano Doroteo T.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2006

Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. A. E. Hannani, D. T. Toledano, D. Petrovska-Delacrétaz, A. Montero-Asenjo, J. Hennebert, "Using Data-driven and Phonetic Units for Speaker Verification" in Odyssey: The Speaker and Language Recognition Workshop, San Juan (Puerto Rico), 2006, pp. 1-6.

Recognition of speaker identity based on modeling the streams produced by phonetic decoders (phonetic speaker recognition) has gained popularity during the past few years. Two of the major problems that arise when developing phone-based systems are possible mismatches between the development and evaluation data and the lack of transcribed databases. Data-driven segmentation techniques offer a potential solution to these problems because they do not require transcribed data and can easily be applied to the development data, minimizing the mismatches. In this paper we compare speaker recognition results obtained with phonetic and data-driven decoders. To this end, we compare a speaker recognition system based on data-driven acoustic units with phonetic speaker recognition systems trained on Spanish and English data. Results obtained on the NIST 2005 Speaker Recognition Evaluation data show that the data-driven approach outperforms the phonetic one and that further improvements can be achieved by combining both approaches.
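The abstract reports that combining the data-driven and phonetic systems improves results but does not say how the two are combined. One common way to combine verification systems is score-level fusion. The sketch below is only an illustration under stated assumptions: each system's trial scores are z-normalized with development-set statistics and merged by a weighted sum; the names `znorm`, `fuse`, `decide` and the values of `weight` and `threshold` are hypothetical, not the authors' method.

```python
import statistics

def znorm(scores, dev_scores):
    """Normalize scores with the mean and standard deviation of development scores."""
    mu = statistics.mean(dev_scores)
    sigma = statistics.stdev(dev_scores)
    return [(s - mu) / sigma for s in scores]

def fuse(dd_scores, ph_scores, dev_dd, dev_ph, weight=0.5):
    """Hypothetical linear fusion of data-driven (dd) and phonetic (ph) trial scores."""
    a = znorm(dd_scores, dev_dd)
    b = znorm(ph_scores, dev_ph)
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]

def decide(fused_scores, threshold=0.0):
    """Accept a trial when its fused score exceeds the (assumed) threshold."""
    return [s > threshold for s in fused_scores]

if __name__ == "__main__":
    fused = fuse([2.0, 0.0], [30.0, 10.0], [0.0, 1.0, 2.0], [10.0, 20.0, 30.0])
    print(fused)          # [1.0, -1.0]
    print(decide(fused))  # [True, False]
```

Normalizing before summing keeps one system's score range from dominating the other's; in practice the weight would be tuned on development data.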

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo

Employing Emotion Cues to Verify Speakers in Emotional Talking Environments

Author: Shahin Ismail
Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2016

Usually, people talk neutrally in environments free of abnormal talking conditions such as stress and emotion. Other emotional conditions, such as happiness, anger, and sadness, can also affect a person's tone of voice. Such emotions are directly affected by the patient's health status. Speakers can be verified easily in neutral talking environments, but not as easily in emotional talking environments. Consequently, speaker verification systems do not perform as well in emotional talking environments as they do in neutral ones. In this work, a two-stage approach has been employed and evaluated to improve speaker verification performance in emotional talking environments. This approach employs speaker emotion cues (a text-independent and emotion-dependent speaker verification problem), using both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) as classifiers. The approach comprises two cascaded stages that combine and integrate an emotion recognizer and a speaker recognizer into one recognizer. The architecture has been tested on two separate emotional speech databases: our collected database and the Emotional Prosody Speech and Transcripts database. The results show that the proposed approach is promising, giving a significant improvement over previous studies and over other approaches such as emotion-independent speaker verification and emotion-dependent speaker verification based entirely on HMMs.

Comment: Journal of Intelligent Systems, Special Issue on Intelligent Healthcare Systems, De Gruyter, 201
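A minimal sketch of the two cascaded stages described above, assuming simplified models: each emotion, emotion-dependent speaker model, and emotion-dependent background model is a one-dimensional Gaussian `(mean, variance)` standing in for the HMM/SPHMM classifiers used in the paper. Stage 1 picks the most likely emotion; stage 2 runs a log-likelihood ratio test against the claimed speaker's model for that emotion. All function names, model tables, and the threshold are hypothetical.

```python
import math

def gauss_loglik(xs, mean, var):
    """Total log-likelihood of 1-D features under a Gaussian (stand-in for an HMM score)."""
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var) for x in xs)

def identify_emotion(xs, emotion_models):
    """Stage 1: choose the emotion whose model gives the highest log-likelihood."""
    return max(emotion_models, key=lambda e: gauss_loglik(xs, *emotion_models[e]))

def verify(xs, claimed, speaker_models, background_models, emotion_models, threshold=0.0):
    """Stage 2: emotion-dependent log-likelihood ratio test for the claimed speaker."""
    emotion = identify_emotion(xs, emotion_models)
    llr = (gauss_loglik(xs, *speaker_models[(claimed, emotion)])
           - gauss_loglik(xs, *background_models[emotion]))
    return emotion, llr > threshold

if __name__ == "__main__":
    emotions = {"neutral": (0.0, 1.0), "angry": (5.0, 1.0)}
    speakers = {("alice", "neutral"): (0.5, 1.0), ("alice", "angry"): (5.5, 1.0)}
    background = {"neutral": (0.0, 1.0), "angry": (5.0, 1.0)}
    print(verify([5.4, 5.6, 5.5], "alice", speakers, background, emotions))  # ('angry', True)
    print(verify([0.0, -0.2, 0.1], "alice", speakers, background, emotions))  # ('neutral', False)
```

The cascade lets the verifier compare the claim against a model trained under the same emotion as the test utterance, which is the point of the emotion-dependent formulation.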

arXiv.org e-Print Archive

Directory of Open Access Journals

Synthesis using speaker adaptation from speech recognition DB

Author: Bonafonte Cávez Antonio
Moreno Bilbao M. Asunción
Oller Moreno Sergio
Publication venue: Universidad de Vigo
Publication date: 01/01/2010

This paper deals with the creation of multiple voices from a Hidden Markov Model based speech synthesis system (HTS). More than 150 Catalan synthetic voices were built using Hidden Markov Models (HMMs) and speaker adaptation techniques. Training data for building a Speaker-Independent (SI) model were selected from both a general-purpose speech synthesis database (FestCat) and a database designed for training Automatic Speech Recognition (ASR) systems (the Catalan SpeeCon database). The SpeeCon database was also used to adapt the SI model to different speakers. Using a database designed for ASR for TTS purposes provided many different amateur voices, each with only a few minutes of recordings made outside studio conditions. This paper shows how speaker adaptation techniques provide the right tools to generate multiple voices from very little adaptation data. A subjective evaluation was carried out to assess the intelligibility and naturalness of the generated voices, as well as the similarity of the adapted voices both to the original speaker and to the average voice of the SI model.

Peer Reviewed. Postprint (published version)
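The abstract does not name the adaptation technique used; the authors may well have used a linear-transform (MLLR-style) method. Purely as an illustration of why a few minutes of data can suffice, the sketch below swaps in the simplest related technique, MAP adaptation of a single Gaussian mean: the speaker-independent mean acts as a prior and is pulled toward the speaker's data in proportion to how much data there is. The function name and the prior weight `tau` are hypothetical.

```python
def map_adapt_mean(prior_mean, adaptation_data, tau=10.0):
    """MAP estimate of a Gaussian mean.

    Interpolates the speaker-independent (prior) mean with the sample mean of
    the speaker's adaptation data; n observations are weighed against tau.
    """
    n = len(adaptation_data)
    if n == 0:
        return prior_mean  # no data: keep the speaker-independent mean
    sample_mean = sum(adaptation_data) / n
    return (n * sample_mean + tau * prior_mean) / (n + tau)

if __name__ == "__main__":
    print(map_adapt_mean(0.0, [2.0] * 10))  # 1.0: little data, halfway to the speaker
    print(map_adapt_mean(0.0, [2.0] * 90))  # 1.8: more data, closer to the speaker
    print(map_adapt_mean(0.0, []))          # 0.0: falls back to the SI mean
```

With scarce data the estimate stays close to the average voice, and it moves toward the target speaker as more recordings become available.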

UPCommons. Portal del coneixement obert de la UPC