Speaker recognition using frequency filtered spectral energies
The spectral parameters that result from filtering the frequency sequence of log mel-scaled filter-bank energies with a simple first- or second-order FIR filter have proved to be an efficient speech representation in terms of both speech recognition rate and computational load. Recently, the authors have shown that this frequency filtering can approximately equalize the cepstrum variance, enhancing the oscillations of the spectral envelope curve that are most effective for discriminating between speakers. On the TIMIT database, speaker identification results were even better than with mel-cepstrum, especially when white noise was added. In addition, the hybridization of linear prediction and filter-bank spectral analysis, using either the cepstral transformation or the alternative frequency filtering, has been explored for speaker verification. The combination of hybrid spectral analysis and frequency filtering, which had previously been shown to outperform conventional techniques in clean and noisy word recognition, has yielded good text-dependent speaker verification results on the new speaker-oriented telephone-line POLYCOST database. Peer Reviewed. Postprint (published version).
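A minimal sketch of the frequency filtering described above. It assumes the common choices H(z) = 1 - z^-1 for the first-order filter and H(z) = z - z^-1 for the second-order one (the abstract does not give the coefficients), and it assumes zero values beyond the first and last bands. The input is one frame of log filter-bank energies; computing those energies is out of scope, and `frequency_filter` is a hypothetical helper name.

```python
from typing import List, Sequence


def frequency_filter(log_energies: Sequence[float], order: int = 2) -> List[float]:
    """Filter one frame of log filter-bank energies along the band index k.

    order=1 applies H(z) = 1 - z^-1 : F(k) = S(k) - S(k-1)
    order=2 applies H(z) = z - z^-1 : F(k) = S(k+1) - S(k-1)
    Bands outside 0..Q-1 are treated as zero (an assumed edge rule).
    """
    if order not in (1, 2):
        raise ValueError("order must be 1 or 2")
    q = len(log_energies)

    def s(k: int) -> float:
        # Zero-pad outside the valid band range.
        return log_energies[k] if 0 <= k < q else 0.0

    if order == 1:
        return [s(k) - s(k - 1) for k in range(q)]
    return [s(k + 1) - s(k - 1) for k in range(q)]


if __name__ == "__main__":
    frame = [1.0, 2.0, 4.0, 7.0]  # toy log energies for 4 bands
    print(frequency_filter(frame, order=1))  # [1.0, 1.0, 2.0, 3.0]
    print(frequency_filter(frame, order=2))  # [2.0, 3.0, 5.0, -4.0]
```

For interior bands the second-order output is unchanged when a constant is added to every log energy, since the offset cancels in S(k+1) - S(k-1).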
Speaker recognition by frequency filtering of spectral energies estimated with hybrid methods
Two ways of obtaining more robust parameters for speaker recognition have been explored: the hybridization of spectral analysis techniques and the frequency filtering of the band energies. Frequency filtering has been shown to be an efficient representation for speech recognition and can approximately equalize the cepstral variance, enhancing the spectral oscillations that are most effective for discriminating between speakers. Good identification results have been obtained on the TIMIT database, especially when white noise was added. In addition, the hybridization of linear prediction and the filter bank in the spectral analysis stage has been explored. The combination of these techniques has yielded good verification results on the POLYCOST telephone database. Peer Reviewed. Postprint (published version).
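The cepstral-variance claim can be made concrete with a short frequency-response argument. It assumes the second-order filter H(z) = z - z^{-1} applied along the band index k of the Q log energies S(k), and treats each cepstral coefficient c(n) as sitting near the band-domain frequency of its DCT basis; both are assumptions for illustration, not statements from the paper.

```latex
% S(k): log energy of band k, k = 0, ..., Q-1;  F(k): frequency-filtered parameter
F(k) = S(k+1) - S(k-1) \quad\Longleftrightarrow\quad H(z) = z - z^{-1}

% Frequency response along the band index
H(e^{j\omega}) = e^{j\omega} - e^{-j\omega} = 2j\sin\omega,
\qquad \lvert H(e^{j\omega})\rvert = 2\,\lvert\sin\omega\rvert

% c(n) uses the basis \cos(\pi n (k + 1/2)/Q), i.e. \omega_n = \pi n / Q,
% so the filter acts approximately as a lifter with weights
w(n) \approx 2\,\lvert\sin(\pi n / Q)\rvert, \qquad n = 0, 1, \dots, Q-1

% First-order case, H(z) = 1 - z^{-1}, for comparison
\lvert 1 - e^{-j\omega}\rvert = 2\,\lvert\sin(\omega/2)\rvert
```

Because the variance of c(n) typically decreases with n, a weight that is zero at n = 0 and grows over the low and middle quefrencies tends to flatten that variance profile, which matches the qualitative claim in the abstract.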
Optimization of data-driven filterbank for automatic speaker verification
Most speech processing applications use triangular filters spaced on the mel scale for feature extraction. In this paper, we propose a new data-driven filter design method that optimizes filter parameters from given speech data. First, we introduce a frame-selection-based approach for developing a speech-signal-based frequency warping scale. Then, we propose a new method for computing the filter frequency responses using principal component analysis (PCA). The main advantage of the proposed method over recently introduced deep-learning-based methods is that it requires only a very limited amount of unlabeled speech data. We demonstrate that the proposed filterbank has more speaker-discriminative power than the commonly used mel filterbank as well as an existing data-driven filterbank. We conduct automatic speaker verification (ASV) experiments with different corpora using various classifier back-ends. We show that the acoustic features created with the proposed filterbank outperform existing mel-frequency cepstral coefficients (MFCCs) and speech-signal-based frequency cepstral coefficients (SFCCs) in most cases. In experiments with VoxCeleb1 and the popular i-vector back-end, we observe a 9.75% relative improvement in equal error rate (EER) over MFCCs. Similarly, the relative improvement is 4.43% with the recently introduced x-vector system. We obtain further improvement by fusing the proposed method with the standard MFCC-based approach. Comment: Published in the Digital Signal Processing journal (Elsevier).
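The abstract does not spell out how PCA yields the filter responses. One reading, used in this sketch purely as an assumption, is that each filter's response is the leading principal component of the spectral bins inside that filter's band, collected over many frames; the band edges (which the paper derives from its frame-selection warping scale) are simply given here. `leading_component` and `pca_filterbank` are hypothetical names, and power iteration stands in for a full eigendecomposition.

```python
import math
from typing import List, Sequence, Tuple


def leading_component(rows: Sequence[Sequence[float]], iters: int = 200) -> List[float]:
    """Unit-norm first principal component of the rows (observations), via power iteration."""
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two frames")
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the bins (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # start vector, assumed not orthogonal to the PC
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance in this band
            break
        v = [x / norm for x in w]
    # An eigenvector's sign is arbitrary; choose the mostly non-negative one.
    return [-x for x in v] if sum(v) < 0 else v


def pca_filterbank(spectra: Sequence[Sequence[float]],
                   bands: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """One filter per band [lo, hi): the leading PCA vector of those bins across frames."""
    return [leading_component([frame[lo:hi] for frame in spectra]) for lo, hi in bands]
```

In a real front end the bands would overlap and the rows would be spectra of selected speech frames; how each response is normalized (for example to unit area) is another design choice this sketch leaves open.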
A non-linear polynomial approximation filter for robust speaker verification
Bibliography: leaves 101-109
Hierarchical methods for large-population speaker identification using telephone speech
This study focuses on speaker identification (SiD). Acoustic noise, channel noise, speaker variability, a large population of known speakers enrolled in the system, and many other problems limit SiD performance. The SiD system extracts speaker-specific features from the digitised speech signal for accurate identification. These feature sets are clustered to form a speaker template known as a speaker model. As the number of speakers enrolled in the system grows, more models accumulate and interspeaker confusion increases. This study proposes hierarchical methods that split the large population of enrolled speakers into smaller groups of model databases in order to minimise interspeaker confusion.
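The two-level search can be sketched as follows. For illustration only, each enrolled speaker model is reduced to a single mean feature vector (real SiD systems typically use richer models such as GMMs), groups are formed with plain k-means, and a test vector is matched first against group centroids and then only against the speakers in the winning group. All names here are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def group_speakers(models: Dict[str, Vector], k: int, iters: int = 20) -> List[List[str]]:
    """Split the enrolled speaker models into k groups with plain k-means."""
    names = sorted(models)
    centers = [list(models[s]) for s in names[:k]]  # deterministic initialisation
    groups: List[List[str]] = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for s in names:
            best = min(range(k), key=lambda c: dist(models[s], centers[c]))
            groups[best].append(s)
        # Keep the old center if a group ends up empty.
        centers = [centroid([models[s] for s in g]) if g else centers[c]
                   for c, g in enumerate(groups)]
    return [g for g in groups if g]


def identify(test: Vector, models: Dict[str, Vector], groups: List[List[str]]) -> str:
    """Stage 1: pick the nearest group; stage 2: search only that group's speakers."""
    centers = [centroid([models[s] for s in g]) for g in groups]
    g = min(range(len(groups)), key=lambda i: dist(test, centers[i]))
    return min(groups[g], key=lambda s: dist(test, models[s]))
```

The saving comes from stage 2 comparing against one group instead of the whole population; the cost is that an error in stage 1 cannot be recovered in stage 2.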
Automatic speaker recognition: modelling, feature extraction and effects of clinical environment
Speaker recognition is the task of establishing the identity of an individual based on his or her voice. It has significant potential as a convenient biometric method for telephony applications and does not require sophisticated or dedicated hardware. The speaker recognition task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker-specific feature parameters from the speech. The features are used to generate statistical models of different speakers. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Current state-of-the-art speaker recognition systems use the Gaussian mixture model (GMM) technique in combination with the expectation-maximization (EM) algorithm to build the speaker models. The most frequently used features are the mel-frequency cepstral coefficients (MFCCs). This thesis investigated areas of possible improvement in the field of speaker recognition. The identified drawbacks of current speaker recognition systems included the slow convergence of the modelling techniques and the sensitivity of the features to changes due to speaker aging, use of alcohol and drugs, changing health conditions, and mental state. The thesis proposed a new method of deriving the GMM parameters, called the EM-ITVQ algorithm. Compared with the classical GMM based on the EM method, EM-ITVQ showed a significant improvement in equal error rates and faster convergence. It was demonstrated that features based on the nonlinear model of speech production (TEO-based features) provided better performance than conventional MFCC features. For the first time, the effect of clinical depression on speaker verification rates was tested. It was demonstrated that speaker verification results deteriorate if the speakers are clinically depressed.
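The GMM-EM modelling step mentioned in the abstract can be illustrated in one dimension. This is a sketch of standard EM for a Gaussian mixture, not the thesis's EM-ITVQ variant, with assumed choices: initial means are supplied by the caller, all variances start at the data variance, and a variance floor guards against collapse. Real systems use multivariate, usually diagonal-covariance, mixtures over MFCC vectors. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gauss(x: float, mu: float, var: float) -> float:
    """Univariate normal density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data: Sequence[float], init_means: Sequence[float], iters: int = 100,
               var_floor: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    """Maximum-likelihood GMM fit with EM; returns (weights, means, variances)."""
    m, n = len(init_means), len(data)
    mus = list(init_means)
    grand_mean = sum(data) / n
    variances = [sum((x - grand_mean) ** 2 for x in data) / n] * m
    weights = [1.0 / m] * m
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [weights[c] * gauss(x, mus[c], variances[c]) for c in range(m)]
            total = sum(p)
            resp.append([pc / total for pc in p])
        # M-step: re-estimate weights, means and (floored) variances.
        for c in range(m):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / n
            mus[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            variances[c] = max(
                sum(r[c] * (x - mus[c]) ** 2 for r, x in zip(resp, data)) / nc, var_floor)
    return weights, mus, variances


def avg_log_likelihood(data: Sequence[float], weights: Sequence[float],
                       mus: Sequence[float], variances: Sequence[float]) -> float:
    """Average per-sample log-likelihood of test data under a fitted mixture."""
    return sum(math.log(sum(w * gauss(x, mu, v) for w, mu, v in zip(weights, mus, variances)))
               for x in data) / len(data)
```

In verification, a score such as `avg_log_likelihood` under the claimed speaker's model is typically compared against the same score under a background model before thresholding.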
The deterioration was demonstrated using conventional MFCC features. The thesis also showed that replacing the MFCC features with features based on the nonlinear model of speech production (TEO-based features) can reduce the detrimental effect of clinical depression on speaker verification rates.
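TEO commonly denotes the Teager energy operator, psi[n] = x[n]^2 - x[n-1] * x[n+1]. The abstract does not describe how the thesis turns it into features, so this sketch only implements the operator and a hypothetical per-frame average; `teager` and `frame_teo_energy` are illustrative names.

```python
import math
from typing import List, Sequence


def teager(x: Sequence[float]) -> List[float]:
    """Discrete Teager energy operator psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Defined for interior samples only, so the output has len(x) - 2 values.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_teo_energy(x: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Mean Teager energy per frame (a hypothetical summary, not the thesis's feature)."""
    out = []
    for start in range(0, len(x) - frame_len + 1, hop):
        psi = teager(x[start:start + frame_len])
        out.append(sum(psi) / len(psi))
    return out


if __name__ == "__main__":
    tone = [2.0 * math.cos(0.3 * n + 0.1) for n in range(50)]
    print(frame_teo_energy(tone, frame_len=10, hop=5))
```

For a pure tone A cos(Wn + phi) the operator returns exactly A^2 sin^2(W) at every sample, so it depends on both amplitude and frequency, a property often cited as motivation for TEO-based features.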
Speaker verification on the POLYCOST database using frequency filtered spectral energies
The spectral parameters that result from filtering the frequency sequence of log mel-scaled filter-bank energies with a first- or second-order FIR filter have proved to be competitive for speech recognition. Recently, the authors have shown that this frequency filtering can approximately equalize the cepstrum variance, enhancing the oscillations of the spectral envelope curve that are most effective for discriminating between speakers. On the TIMIT database, speaker identification results were even better than with mel-cepstrum, especially when white noise was added. In this paper, the hybridization of linear prediction and filter-bank spectral analysis, using either the cepstral transformation or the alternative frequency filtering, is explored for speaker verification. This combination, which had previously been shown to outperform conventional techniques in clean and noisy word recognition, has yielded good text-dependent speaker verification results on the new speaker-oriented telephone-line POLYCOST database. Peer Reviewed. Postprint (published version).
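The abstract combines linear prediction with filter-bank analysis but does not say exactly how. This sketch shows only the standard LP step (autocorrelation method with the Levinson-Durbin recursion) and evaluates the LP log-power envelope at chosen frequencies; sampling that envelope at filter-bank centres, so that it could feed the frequency filtering, is an assumption about how a hybrid scheme might be wired, not the paper's stated method. Function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def autocorr(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[k] = sum_n x[n] x[n-k], for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Predictor coefficients a[1..p] (x[n] ~ sum_i a[i] x[n-i]) and final error power."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lp_log_envelope(a: Sequence[float], err: float, omegas: Sequence[float]) -> List[float]:
    """log(err / |A(e^{jw})|^2) with A(z) = 1 - sum_i a_i z^-i, at each w in omegas."""
    out = []
    for w in omegas:
        big_a = 1 - sum(ai * cmath.exp(-1j * w * (i + 1)) for i, ai in enumerate(a))
        out.append(math.log(err / abs(big_a) ** 2))
    return out
```

For example, an autocorrelation of [1.0, 0.5, 0.25] (a first-order autoregressive shape) gives order-2 coefficients [0.5, 0.0] with error 0.75, and the resulting envelope is log 3 at w = 0 and -log 3 at w = pi.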