Speaker recognition under stress conditions
Proceedings of IberSPEECH 2018, 21-23 November 2018, Barcelona, Spain.
Speaker recognition systems exhibit a decrease in performance when the input speech is not captured under optimal circumstances, for example when the user is under emotional or stress conditions. The objective of this paper is to measure the effects of stress on speech and ultimately to mitigate its consequences on a speaker recognition task. In this paper, we develop a stress-robust speaker identification system using data selection and augmentation by means of the manipulation of the original speech utterances. Extensive experimentation has been carried out to assess the effectiveness of the proposed techniques. First, we conclude that the best performance is always obtained when naturally stressed samples are included in the training set, and second, when these are not available, their substitution and augmentation with synthetically generated stress-like samples improves the performance of the system.
This work is partially supported by the Spanish Government-MinECo projects TEC2014-53390-P and TEC2017-84395-P.
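The abstract above describes augmenting training data by manipulating the original speech utterances to imitate stress, without specifying the exact transformations. A minimal NumPy sketch of one common approximation, assuming a joint tempo-and-pitch change via linear resampling (the function names and factors are illustrative, not the paper's actual pipeline):

```python
import numpy as np

def resample(signal: np.ndarray, factor: float) -> np.ndarray:
    """Linearly resample a 1-D signal by `factor` (>1 speeds up and raises pitch)."""
    n_out = int(len(signal) / factor)
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)

def stress_like_augment(utterance: np.ndarray, factors=(0.9, 1.1)) -> list:
    """Generate stress-like variants of an utterance.

    Stressed speech tends to shift speaking rate and fundamental frequency;
    resampling by a factor changes both jointly (a crude approximation).
    """
    return [resample(utterance, f) for f in factors]

# Example: augment a synthetic 1-second "utterance" sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
utt = np.sin(2 * np.pi * 220 * t)
variants = stress_like_augment(utt)
print([len(v) for v in variants])  # → [17777, 14545]
```

In practice a phase vocoder would be used to control tempo and pitch independently; plain resampling is shown here only because it is self-contained.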
End-to-end Recurrent Denoising Autoencoder Embeddings for Speaker Identification
Speech 'in-the-wild' is a handicap for speaker recognition systems due to the variability induced by real-life conditions, such as environmental noise and emotions in the speaker. Taking advantage of representation learning, in this paper we aim to design a recurrent denoising autoencoder that extracts robust speaker embeddings from noisy spectrograms to perform speaker identification. The proposed end-to-end architecture uses a feedback loop to encode information about the speaker into low-dimensional representations extracted by a spectrogram denoising autoencoder. We employ data augmentation techniques by additively corrupting clean speech with real-life environmental noise, and make use of a database with real stressed speech. We show that the joint optimization of the denoiser and the speaker identification module outperforms both the independent optimization of the two modules under stress and noise distortions and hand-crafted features.
Comment: 8 pages + 2 of references + 5 of images. Submitted on Monday 20th of July to Elsevier Signal Processing Short Communication.
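The augmentation step described above, additively corrupting clean speech with environmental noise, is typically done at a controlled signal-to-noise ratio. A minimal sketch, assuming a target SNR in dB (the function name and the use of Gaussian noise in the demo are illustrative; the paper uses recorded real-life noise):

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Additively corrupt `clean` with `noise` scaled to a target SNR in dB."""
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that p_clean / p_scaled_noise == 10^(snr_db/10).
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.normal(size=16000)  # stand-in for recorded environmental noise
noisy = mix_at_snr(clean, noise, snr_db=10.0)

# Verify the achieved SNR matches the target.
achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(achieved, 1))  # → 10.0
```

Training pairs for the denoiser are then (spectrogram of `noisy`, spectrogram of `clean`).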
Data Augmentation for Speaker Identification under Stress Conditions to Combat Gender-Based Violence
This article belongs to the Special Issue IberSPEECH 2018: Speech and Language Technologies for Iberian Languages.
A speaker identification system for a personalized wearable device to combat gender-based violence is presented in this paper. Speaker recognition systems exhibit a decrease in performance when the user is under emotional or stress conditions; the objective of this paper is therefore to measure the effects of stress on speech and ultimately to mitigate their consequences on a speaker identification task, using data augmentation techniques specifically tailored for this purpose given the lack of data resources for this condition. Extensive experimentation has been carried out to assess the effectiveness of the proposed techniques. First, we conclude that the best performance is always obtained when naturally stressed samples are included in the training set, and second, when these are not available, their substitution and augmentation with synthetically generated stress-like samples improves the performance of the system.
This work is partially supported by the Spanish Government-MinECo project TEC2017-84395-P and Madrid Regional Project Y2018/TCS-5046.
Speaker Recognition System using Wavelet Transform under Stress Condition
In this paper, we introduce a text-dependent speaker recognition system using the wavelet transform under stressed conditions. We compare different features, such as ARC, LAR, LPCC, MFCC, and CEP, and find that LPCC provides the best features. The Discrete Wavelet Transform (DWT) is used to decompose the signal at two levels, and DWT-based Linear Predictive Cepstral Coefficients (LPCC) are used as features for speaker recognition. Vector Quantization is used for classification. Four different stress conditions were selected from the SUSAS (Speech Under Simulated and Actual Stress) database. Recognition rates of 93% and 94% are achieved in the Lombard and Neutral cases, respectively.
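The feature pipeline in this abstract — wavelet decomposition followed by LPCC extraction — can be sketched end to end. A minimal version, assuming a Haar wavelet for the DWT, Levinson-Durbin for the LPC step, and the standard LPC-to-cepstrum recursion (the exact wavelet, LPC order, and cepstral order used in the paper are not stated, so those choices here are illustrative):

```python
import numpy as np

def haar_dwt(signal: np.ndarray):
    """One level of the Haar discrete wavelet transform (approx, detail)."""
    s = signal[:len(signal) // 2 * 2].reshape(-1, 2)
    approx = (s[:, 0] + s[:, 1]) / np.sqrt(2)
    detail = (s[:, 0] - s[:, 1]) / np.sqrt(2)
    return approx, detail

def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """LPC coefficients a[0..order] (a[0]=1) via the Levinson-Durbin recursion."""
    r = np.correlate(frame, frame, "full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:i] @ r[1:i][::-1]) / e
        a[1:i] += k * a[1:i][::-1]
        a[i] = k
        e *= (1.0 - k * k)
    return a

def lpcc(a: np.ndarray, n_ceps: int) -> np.ndarray:
    """Cepstral coefficients from LPC coefficients (recursive conversion)."""
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n < len(a) else 0.0
        for k in range(1, n):
            if n - k < len(a):
                acc += (k / n) * c[k] * a[n - k]
        c[n] = -acc
    return c[1:]

# Demo: a 256-sample voiced-like frame, decomposed at two levels as in the paper.
rng = np.random.default_rng(1)
frame = np.sin(2 * np.pi * 200 * np.arange(256) / 8000) + 0.01 * rng.normal(size=256)
approx, _ = haar_dwt(frame)
approx, _ = haar_dwt(approx)          # second decomposition level
feats = lpcc(lpc(approx, order=10), n_ceps=12)
print(feats.shape)  # → (12,)
```

These per-frame feature vectors would then be quantized against per-speaker VQ codebooks, with the codebook giving the lowest distortion identifying the speaker.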
Employing Emotion Cues to Verify Speakers in Emotional Talking Environments
Usually, people talk neutrally in environments where there are no abnormal talking conditions such as stress and emotion. Other emotional conditions that might affect a person's talking tone include happiness, anger, and sadness; such emotions are directly affected by the patient's health status. In neutral talking environments, speakers can be easily verified; however, in emotional talking environments, speakers cannot be verified as easily as in neutral ones. Consequently, speaker verification systems do not perform as well in emotional talking environments as they do in neutral talking environments. In this work, a two-stage approach has been employed and evaluated to improve speaker verification performance in emotional talking environments. This approach employs speaker emotion cues (a text-independent and emotion-dependent speaker verification problem) based on both Hidden Markov Models (HMMs) and Suprasegmental Hidden Markov Models (SPHMMs) as classifiers. The approach comprises two cascaded stages that combine and integrate an emotion recognizer and a speaker recognizer into one recognizer. The architecture has been tested on two different and separate emotional speech databases: our collected database and the Emotional Prosody Speech and Transcripts database. The results of this work show that the proposed approach gives promising results, with a significant improvement over previous studies and other approaches such as the emotion-independent speaker verification approach and the emotion-dependent speaker verification approach based completely on HMMs.
Comment: Journal of Intelligent Systems, Special Issue on Intelligent Healthcare Systems, De Gruyter, 201
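The two cascaded stages described above can be sketched as control flow: stage one identifies the most likely emotion, stage two scores the claimed speaker with the corresponding emotion-dependent model. A minimal sketch with hypothetical placeholder scorers standing in for the HMM/SPHMM likelihoods (all names and scores here are illustrative, not the paper's models):

```python
def verify(utterance, claimed_speaker, emotion_models, speaker_models, threshold):
    """Two-stage verification.

    Stage 1: pick the emotion whose model scores the utterance highest.
    Stage 2: score the claimed speaker with the emotion-dependent model
    and accept if the score meets the threshold.
    """
    emotion = max(emotion_models, key=lambda e: emotion_models[e](utterance))
    score = speaker_models[(claimed_speaker, emotion)](utterance)
    return score >= threshold, emotion

# Toy stand-ins: each "model" returns a fixed likelihood for the demo.
emotion_models = {"neutral": lambda u: 0.2, "angry": lambda u: 0.7}
speaker_models = {("alice", "angry"): lambda u: 0.9,
                  ("alice", "neutral"): lambda u: 0.4}

accepted, emotion = verify("utt", "alice", emotion_models, speaker_models, 0.5)
print(accepted, emotion)  # → True angry
```

The design point is that the stage-two threshold and models are conditioned on the stage-one emotion decision, which is what distinguishes the cascade from a single emotion-independent verifier.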