Search CORE

9 research outputs found

Speaker recognition using frequency filtered spectral energies

Author: Hernando Pericás Francisco Javier
Publication venue: FONDAZIONE UGO BORDONI
Publication date: 01/01/1999

The spectral parameters that result from filtering the frequency sequence of log mel-scaled filter-bank energies with a simple first- or second-order FIR filter have proved to be an efficient speech representation in terms of both speech recognition rate and computational load. Recently, the authors have shown that this frequency filtering can approximately equalize the cepstrum variance, enhancing the oscillations of the spectral envelope curve that are most effective for discrimination between speakers. Speaker identification results even better than those obtained with mel-cepstrum have been reported on the TIMIT database, especially when white noise was added. In addition, the hybridization of linear prediction and filter-bank spectral analysis, using either the cepstral transformation or the alternative frequency filtering, has been explored for speaker verification. The combination of hybrid spectral analysis and frequency filtering, which had previously been shown to outperform conventional techniques in clean and noisy word recognition, has yielded good text-dependent speaker verification results on the new speaker-oriented telephone-line POLYCOST database. Peer Reviewed. Postprint (published version)
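As a rough illustration of the frequency filtering described above, the sketch below applies a first- or second-order FIR filter along the band index of each frame's log filter-bank energies. It assumes the first-order filter H(z) = 1 - z^-1 and the second-order filter H(z) = z - z^-1, and it treats bands beyond the first and last as zero. The function name frequency_filter is hypothetical, and the authors' exact edge handling may differ.

```python
import numpy as np


def frequency_filter(log_fbe: np.ndarray, order: int = 2) -> np.ndarray:
    """Filter log filter-bank energies along the band (frequency) axis.

    log_fbe has shape (num_frames, num_bands). Neighbours outside the band
    range are taken as zero (an assumption; other edge rules are possible).
      order=1: H(z) = 1 - z^-1  ->  F(k) = S(k) - S(k-1)
      order=2: H(z) = z - z^-1  ->  F(k) = S(k+1) - S(k-1)
    """
    log_fbe = np.atleast_2d(np.asarray(log_fbe, dtype=float))
    # Pad one zero band on each side of every frame.
    padded = np.pad(log_fbe, ((0, 0), (1, 1)), mode="constant")
    if order == 1:
        return padded[:, 1:-1] - padded[:, :-2]
    if order == 2:
        return padded[:, 2:] - padded[:, :-2]
    raise ValueError("order must be 1 or 2")


# Example: one frame with four bands of (already log-compressed) energies.
frame = np.array([[1.0, 3.0, 6.0, 10.0]])
print(frequency_filter(frame, order=1))
print(frequency_filter(frame, order=2))
```

For this example frame, order=1 gives [1, 2, 3, 4] and order=2 gives [3, 5, 7, -6]. Because the filter only differences neighbouring bands, it costs a couple of subtractions per band, in contrast to the matrix multiply of a DCT.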

UPCommons. Portal del coneixement obert de la UPC

Speaker recognition through frequency filtering of spectral energies estimated by hybrid methods (Reconocimiento del locutor mediante filtrado frecuencial de energías espectrales estimadas por métodos híbridos)

Author: Hernando Pericás Francisco Javier
Nadeu Camprubí Climent
Publication venue: Universidad Politécnica de Madrid - University Library
Publication date: 01/01/2000

Two ways of obtaining more robust parameters for speaker recognition have been explored: the hybridization of spectral analysis techniques and the frequency filtering of the band energies. Frequency filtering has been shown to be an efficient representation for speech recognition and to approximately equalize the cepstral variance, enhancing the spectral oscillations that are most effective for discriminating between speakers. Good identification results have been obtained on the TIMIT database, especially when white noise was added. In addition, the hybridization of linear prediction and the filter bank in the spectral analysis stage has been explored. The combination of these techniques has yielded good verification results on the POLYCOST telephone database. Peer Reviewed. Postprint (published version)
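A short derivation suggests why frequency filtering can act on the cepstrum like a lifter. It assumes that the second-order filter is H(z) = z - z^-1, applied along the band index k of the log filter-bank energies S(k), and it takes the discrete-time Fourier transform over k:

```latex
\begin{aligned}
F(k) &= S(k+1) - S(k-1), \\
F(\omega) &= \left(e^{j\omega} - e^{-j\omega}\right) S(\omega) = 2j\,\sin(\omega)\, S(\omega), \\
|F(\omega)| &= 2\,|\sin\omega|\,|S(\omega)|, \qquad \omega \approx \frac{\pi n}{Q}.
\end{aligned}
```

Here n is the cepstral index and Q the number of bands. The correspondence follows because the DCT-based cepstrum c(n) weights S(k) with cos(pi n (k + 1/2) / Q). For 0 < omega < pi/2, the gain 2 sin(omega) grows with n, so lower-variance higher-order cepstral coefficients are boosted relative to the first ones. This is a sketch of the reasoning under the stated assumptions, not the authors' own derivation.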

UPCommons. Portal del coneixement obert de la UPC

Optimization of data-driven filterbank for automatic speaker verification

Author: Saha Goutam
Sahidullah Md
Sarangi Susanta
Publication venue: Elsevier BV
Publication date: 21/07/2020

Most speech processing applications use triangular filters spaced on the mel scale for feature extraction. In this paper, we propose a new data-driven filter design method that optimizes the filter parameters from given speech data. First, we introduce a frame-selection-based approach for developing a speech-signal-based frequency warping scale. Then, we propose a new method for computing the filter frequency responses using principal component analysis (PCA). The main advantage of the proposed method over recently introduced deep-learning-based methods is that it requires only a very limited amount of unlabeled speech data. We demonstrate that the proposed filterbank has more speaker-discriminative power than the commonly used mel filterbank as well as an existing data-driven filterbank. We conduct automatic speaker verification (ASV) experiments on different corpora using various classifier back-ends. We show that the acoustic features created with the proposed filterbank outperform existing mel-frequency cepstral coefficients (MFCCs) and speech-signal-based frequency cepstral coefficients (SFCCs) in most cases. In experiments with VoxCeleb1 and the popular i-vector back-end, we observe a 9.75% relative improvement in equal error rate (EER) over MFCCs. Similarly, the relative improvement is 4.43% with the recently introduced x-vector system. We obtain further improvement by fusing the proposed method with the standard MFCC-based approach. Comment: Published in Digital Signal Processing journal (Elsevier)
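The abstract does not spell out how PCA yields the filter responses. The sketch below is one plausible reading and is offered only as an illustration. It assumes band edges are already given by a warping scale, and for each band it takes the first principal component of the power-spectrum bins inside the band as that filter's gain shape. The function name, the non-overlapping bands, and the max-normalisation are all hypothetical choices, and the paper's actual design may differ in each of them.

```python
import numpy as np


def pca_filter_responses(power_spec: np.ndarray, band_edges: list[int]) -> np.ndarray:
    """Hypothetical PCA-based filterbank design (illustrative only).

    power_spec: (num_frames, num_bins) power spectra of selected speech frames.
    band_edges: increasing bin indices [e0, ..., eM]; filter m covers bins
    e_m .. e_{m+1} - 1. Filters here do not overlap (a simplifying assumption).
    Returns an (M, num_bins) matrix whose rows are filter gains in [0, 1].
    """
    num_bins = power_spec.shape[1]
    filters = np.zeros((len(band_edges) - 1, num_bins))
    for m in range(len(band_edges) - 1):
        lo, hi = band_edges[m], band_edges[m + 1]
        band = power_spec[:, lo:hi]
        centred = band - band.mean(axis=0)
        # The first right-singular vector of the centred data is the first principal component.
        _, _, vt = np.linalg.svd(centred, full_matrices=False)
        pc1 = np.abs(vt[0])  # a PC's sign is arbitrary, so use magnitudes as gains
        filters[m, lo:hi] = pc1 / pc1.max()
    return filters


rng = np.random.default_rng(0)
spec = rng.random((200, 129)) ** 2          # stand-in for real power spectra
edges = [0, 10, 25, 45, 70, 100, 129]       # hypothetical warped band edges
fb = pca_filter_responses(spec, edges)
log_energies = np.log(spec @ fb.T + 1e-10)  # (200, 6) log filterbank energies
print(fb.shape, log_energies.shape)
```

The resulting log energies would then go through a DCT to produce cepstral coefficients, exactly as with a mel filterbank. Only the filter shapes and positions change.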

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

A non-linear polynomial approximation filter for robust speaker verification

Author: Mothae Limpho
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2003

Bibliography: leaves 101-109

Cape Town University OpenUCT

Hierarchical methods for large population speaker identification using telephone speech

Author: Lerato Lerato
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2003

This study focuses on speaker identification (SiD). Several problems, such as acoustic noise, channel noise, speaker variability, and a large population of known speakers within the system, among many others, limit SiD performance. The SiD system extracts speaker-specific features from the digitised speech signal for accurate identification. These feature sets are clustered to form a speaker template known as a speaker model. As the number of speakers enrolling in the system grows, more models accumulate and inter-speaker confusion results. This study proposes hierarchical methods that split the large population of enrolled speakers into smaller groups of model databases in order to minimise inter-speaker confusion.
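To make the idea of splitting a large enrolled population into smaller groups concrete, here is a minimal two-stage sketch. It assumes each speaker model can be reduced to a single vector, a simplification of the clustered feature sets described above. It groups the vectors with plain k-means, then identifies a test vector by first choosing the nearest group and then the nearest speaker within that group. All names (nearest, kmeans, HierarchicalIdentifier) and the Euclidean distance are hypothetical choices, not the thesis's method.

```python
import numpy as np


def nearest(points: np.ndarray, centres: np.ndarray) -> np.ndarray:
    """Index of the closest centre (Euclidean) for each row of points."""
    dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points: np.ndarray, k: int, iters: int = 50, seed: int = 0):
    """Plain k-means; returns centroids and the final assignment of each point."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, nearest(points, centroids)


class HierarchicalIdentifier:
    """Two-stage closed-set identification: choose a group, then a speaker in it.

    Each enrolled speaker is reduced to one vector (e.g. a mean feature vector),
    a hypothetical simplification of a full speaker model.
    """

    def __init__(self, speaker_vectors, num_groups: int, seed: int = 0):
        self.models = np.asarray(speaker_vectors, dtype=float)
        centroids, labels = kmeans(self.models, num_groups, seed=seed)
        groups = [np.flatnonzero(labels == g) for g in range(num_groups)]
        keep = [g for g in range(num_groups) if len(groups[g])]  # drop empty groups
        self.centroids = centroids[keep]
        self.groups = [groups[g] for g in keep]

    def identify(self, test_vector) -> int:
        x = np.asarray(test_vector, dtype=float)
        g = nearest(x[None, :], self.centroids)[0]       # stage 1: pick a group
        members = self.groups[g]
        d = np.linalg.norm(self.models[members] - x, axis=1)
        return int(members[d.argmin()])                  # stage 2: pick a speaker


rng = np.random.default_rng(1)
enrolled = rng.normal(size=(30, 8))   # 30 hypothetical speakers, 8-dim vectors
sid = HierarchicalIdentifier(enrolled, num_groups=4)
print(sid.identify(enrolled[7] + 0.01 * rng.normal(size=8)))
```

The second stage only compares the test vector against the speakers in one group, which reduces both computation and the number of competing models. The trade-off is that a wrong group decision in stage 1 cannot be recovered in stage 2.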

Cape Town University OpenUCT

Automatic speaker recognition: modelling, feature extraction and effects of clinical environment

Author: Memon S
Publication venue: RMIT University
Publication date: 01/01/2010

Speaker recognition is the task of establishing the identity of an individual based on his or her voice. It has significant potential as a convenient biometric method for telephony applications and does not require sophisticated or dedicated hardware. The speaker recognition task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker-specific feature parameters from the speech. The features are used to generate statistical models of different speakers. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Current state-of-the-art speaker recognition systems use the Gaussian mixture model (GMM) technique in combination with the expectation-maximization (EM) algorithm to build the speaker models. The most frequently used features are the mel-frequency cepstral coefficients (MFCCs). This thesis investigated areas of possible improvement in the field of speaker recognition. The identified drawbacks of current speaker recognition systems included the slow convergence rates of the modelling techniques and the features' sensitivity to changes due to the aging of speakers, use of alcohol and drugs, changing health conditions, and mental state. The thesis proposed a new method of deriving the GMM parameters, called the EM-ITVQ algorithm. EM-ITVQ showed a significant improvement in equal error rates and higher convergence rates compared with the classical GMM based on the EM method. It was demonstrated that features based on the nonlinear model of speech production (Teager energy operator, or TEO, based features) provided better performance than conventional MFCC features. For the first time, the effect of clinical depression on speaker verification rates was tested. It was demonstrated that speaker verification results deteriorate if the speakers are clinically depressed.
The deterioration was demonstrated using conventional (MFCC) features. The thesis also showed that when the MFCC features are replaced with features based on the nonlinear model of speech production (TEO-based features), the detrimental effect of clinical depression on speaker verification rates can be reduced.
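The TEO-based features mentioned above build on the discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1] x[n+1]. The sketch shows only this operator, not the thesis's full feature pipeline, which adds further processing that is not reproduced here. The function name teager_energy is hypothetical.

```python
import numpy as np


def teager_energy(x) -> np.ndarray:
    """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, since the first and last samples lack a neighbour.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# For x[n] = A cos(W n + phi), psi[n] equals A**2 * sin(W)**2 at every n,
# so the operator reflects both amplitude and frequency.
n = np.arange(1000)
A, W = 0.5, 0.3
psi = teager_energy(A * np.cos(W * n + 0.1))
print(np.allclose(psi, A**2 * np.sin(W) ** 2))  # True
```

The constant output for a pure tone follows from the identity cos(t - W) cos(t + W) = cos^2(t) - sin^2(W). Unlike squared amplitude alone, the operator's output also depends on frequency through sin^2(W), which is one reason it is described as reflecting a nonlinear model of speech production.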

RMIT Research Repository

Optimizing spectral feature based text-independent speaker recognition

Author: Kinnunen Tomi H.
Publication venue: University of Joensuu

UEF Electronic Publications

Speaker verification on the POLYCOST database using frequency filtered spectral energies

Author: Hernando Pericás Francisco Javier
Nadeu Camprubí Climent
Publication date: 01/01/1998

The spectral parameters that result from filtering the frequency sequence of log mel-scaled filter-bank energies with a first- or second-order FIR filter have proved to be competitive for speech recognition. Recently, the authors have shown that this frequency filtering can approximately equalize the cepstrum variance, enhancing the oscillations of the spectral envelope curve that are most effective for discrimination between speakers. Speaker identification results even better than those obtained with mel-cepstrum were observed on the TIMIT database, especially when white noise was added. In this paper, the hybridization of linear prediction and filter-bank spectral analysis, using either the cepstral transformation or the alternative frequency filtering, is explored for speaker verification. This combination, which had previously been shown to outperform conventional techniques in clean and noisy word recognition, has yielded good text-dependent speaker verification results on the new speaker-oriented telephone-line POLYCOST database. Peer Reviewed. Postprint (published version)
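One way to picture the hybrid of linear prediction and filter-bank analysis is a filter bank applied to an LP spectral envelope instead of to the FFT power spectrum. The result is log band energies that can then be passed either to a DCT (cepstrum) or to frequency filtering. The sketch below assumes 8 kHz telephone speech, LP order 12, a 512-point FFT, and 20 mel filters. These parameters and all function names are hypothetical, and the paper's exact hybrid scheme may differ.

```python
import numpy as np


def levinson_durbin(r, order: int):
    """LP coefficients a (a[0] = 1, A(z) = 1 + sum a_j z^-j) and residual energy."""
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters: int, nfft: int, sample_rate: float) -> np.ndarray:
    """Triangular filters equally spaced on the mel scale, shape (num_filters, nfft//2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), num_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    fb = np.zeros((num_filters, nfft // 2 + 1))
    for m in range(1, num_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fb[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[m - 1, k] = (right - k) / max(right - centre, 1)
    return fb


def hybrid_log_fbe(frame, sample_rate=8000, lp_order=12, nfft=512, num_filters=20):
    """Log filter-bank energies computed from an LP spectral envelope (hybrid analysis)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    r = np.array([np.dot(x[: len(x) - m], x[m:]) for m in range(lp_order + 1)])
    a, err = levinson_durbin(r, lp_order)
    lp_power = err / np.abs(np.fft.rfft(a, nfft)) ** 2   # LP envelope replaces |FFT|^2
    energies = mel_filterbank(num_filters, nfft, sample_rate) @ lp_power
    return np.log(energies + 1e-12)


rng = np.random.default_rng(0)
frame = rng.normal(size=256)                 # stand-in for a 32 ms telephone frame
log_fbe = hybrid_log_fbe(frame)
ff = log_fbe[2:] - log_fbe[:-2]              # second-order frequency filtering, interior bands only
print(log_fbe.shape, ff.shape)
```

Smoothing the spectrum with the LP model before band integration removes much of the pitch harmonic structure, while the filter bank keeps a perceptually motivated frequency resolution. That combination is the intuition usually given for hybrid analysis.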

UPCommons. Portal del coneixement obert de la UPC

Speaker verification on the POLYCOST database using frequency filtered spectral energies

Author: Hernando Pericás Francisco Javier
Nadeu Camprubí Climent

RECERCAT