4 research outputs found

Channel variability synthesis in i-vector speaker recognition

Author: Ahmed Ahmed Isam
Becerra Victor
Chiverton John
Ndzi David
Publication venue: Institution of Engineering and Technology (IET)
Publication date: 01/01/2017

Crossref

Portsmouth University Research Portal (Pure)

Research Repository and Portal - University of the West of Scotland

A study of speech distortion conditions in real scenarios for speech processing applications

Author: Calvo José
Ribas Dayana
Vincent Emmanuel
Publication venue: HAL CCSD
Publication date: 13/12/2016

The growing demand for robust speech processing applications able to operate in adverse scenarios calls for new evaluation protocols and datasets beyond artificial laboratory conditions. The characteristics of real data for a given scenario are rarely discussed in the literature. As a result, methods are often tested based on the authors' expertise and not always in scenarios with actual practical value. This paper aims to open this discussion by identifying some of the main problems with the data simulation or collection procedures used so far and by summarizing the important characteristics of real scenarios to be taken into account, including the properties of reverberation, noise and the Lombard effect. Finally, we provide some preliminary guidelines for designing an experimental setup, together with speech recognition results for proposal validation.

INRIA a CCSD electronic archive server

HAL-Rennes 1

Full multicondition training for robust i-vector based speaker recognition

Author: Calvo José Ramon
Ribas Dayana
Vincent Emmanuel
Publication venue: HAL CCSD
Publication date: 06/09/2015

Multicondition training (MCT) is an established technique to handle noisy and reverberant conditions. Previous works in the field of i-vector based speaker recognition have applied MCT to linear discriminant analysis (LDA) and probabilistic LDA (PLDA), but not to the universal background model (UBM) and the total variability (T) matrix, arguing that this would be too time-consuming because the training set grows with the number of noise and reverberation conditions. In this paper, we propose a full MCT approach which consists of applying MCT in all stages of training, including the UBM and the T matrix, while keeping the size of the training set fixed. Experiments in highly nonstationary noise conditions show a decrease of the equal error rate (EER) to 14.16%, compared to 17.90% for clean training and 18.08% for MCT of LDA and PLDA only. We also evaluate the impact of state-of-the-art multichannel speech enhancement and show a further reduction of the EER down to 10.47%.

INRIA a CCSD electronic archive server

HAL-Rennes 1

Enhancing the front-end of speaker recognition systems

Author: Ahmed Ahmed Isam
Publication date: 01/07/2019

Portsmouth University Research Portal (Pure)