Enhancement of a Text-Independent Speaker Verification System by using Feature Combination and Parallel-Structure Classifiers
Speaker Verification (SV) systems involve two main stages: feature extraction and classification. In this paper, we explore both modules with the aim of improving the performance of a speaker verification system under noisy conditions. On the one hand, the choice of the most appropriate acoustic features is a crucial factor in performing robust speaker verification. The acoustic parameters used in the proposed system are Mel Frequency Cepstral Coefficients (MFCC), their first and second derivatives (Deltas and Delta-Deltas), Bark Frequency Cepstral Coefficients (BFCC), Perceptual Linear Predictive (PLP) coefficients, and Relative Spectral Transform - Perceptual Linear Predictive (RASTA-PLP) coefficients. A complete comparison of different combinations of these features is presented. On the other hand, the major weakness of a conventional Support Vector Machine (SVM) classifier is its reliance on generic, traditional kernel functions to compute the distances among data points, even though the kernel function has a great influence on SVM performance. In this work, we propose combining two SVM-based classifiers with different kernel functions (a linear kernel and a Gaussian Radial Basis Function (RBF) kernel) with a Logistic Regression (LR) classifier. The combination is carried out by means of a parallel-structure approach, in which different voting rules for taking the final decision are considered. Results show that using the combined features with the combined classifiers significantly improves the performance of the SV system, both with clean speech and in the presence of noise. Finally, to further enhance the system in noisy environments, the inclusion of a multiband noise removal technique as a preprocessing stage is proposed.
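A minimal sketch of the parallel structure follows. It assumes each of the three classifiers (linear-kernel SVM, RBF-kernel SVM and LR) has already produced a binary accept/reject decision for a trial. The abstract does not list which voting rules were compared. Majority, AND and OR voting are assumptions chosen for illustration, and the classifier names are hypothetical dictionary keys.

```python
from typing import Callable, Dict, List


def majority_vote(decisions: List[bool]) -> bool:
    """Accept when more than half of the classifiers accept."""
    return sum(decisions) > len(decisions) / 2


def and_vote(decisions: List[bool]) -> bool:
    """Accept only if every classifier accepts (fewer false acceptances)."""
    return all(decisions)


def or_vote(decisions: List[bool]) -> bool:
    """Accept if any classifier accepts (fewer false rejections)."""
    return any(decisions)


VOTING_RULES: Dict[str, Callable[[List[bool]], bool]] = {
    "majority": majority_vote,
    "and": and_vote,
    "or": or_vote,
}


def parallel_decision(decisions: Dict[str, bool], rule: str = "majority") -> bool:
    """Combine per-classifier accept/reject decisions with the chosen voting rule."""
    return VOTING_RULES[rule](list(decisions.values()))


# Hypothetical decisions for one verification trial (True = accept the claim).
trial = {"svm_linear": True, "svm_rbf": False, "logistic_regression": True}
for rule in VOTING_RULES:
    print(rule, parallel_decision(trial, rule))
```

In this trial, two of the three classifiers accept. Majority and OR voting therefore accept the claim, while AND voting rejects it. AND voting trades false acceptances for false rejections, and OR voting does the opposite.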
Multi-Modal Biometrics: Applications, Strategies and Operations
The need for adequate attention to the security of lives and property cannot be over-emphasised. Existing approaches to security management by various agencies and sectors have focused on possession-based (card, token) and knowledge-based (password, username) strategies, which are susceptible to forgetfulness, damage, loss, theft, forgery and other fraudulent activities. The surest and most appropriate strategy for handling these challenges is the use of biometrics, the naturally endowed physiological and behavioural characteristics of humans. This paper presents an overview of the use of biometrics for human verification and identification. The applications, methodologies, operations, integration, fusion and strategies of multi-modal biometric systems, which provide more secure and reliable human identity management, are also presented.
Sequential decision fusion for controlled detection errors
Information fusion in biometrics has received considerable attention. The architecture proposed here is based on the sequential integration of multi-instance and multi-sample fusion schemes. This method is shown analytically to improve performance and to allow a controlled trade-off between false alarms and false rejects when the classifier decisions are statistically independent. The equations developed for the detection error rates are evaluated experimentally by applying the proposed architecture to text-dependent speaker verification using HMM-based, digit-dependent speaker models. The tuning of the parameters (n classifiers and m attempts/samples) is investigated, and the resulting detection error trade-off performance is evaluated on individual digits. Results show that performance improvement can be achieved even for weaker classifiers (FRR = 19.6%, FAR = 16.7%). The architectures investigated apply to speaker verification from spoken digit strings, such as credit card numbers, in telephone, VoIP or internet-based applications.
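The error-rate equations are not reproduced in the abstract. The sketch below assumes one possible sequential arrangement: all n classifiers must accept within an attempt, and a claimant may make up to m attempts. It also assumes the classifiers are independent and perform identically. Under those assumptions, FAR = 1 - (1 - FAR_1^n)^m and FRR = (1 - (1 - FRR_1)^n)^m. The function name and the arrangement are hypothetical, not the paper's exact formulation.

```python
from typing import Tuple


def sequential_error_rates(far: float, frr: float, n: int, m: int) -> Tuple[float, float]:
    """FAR/FRR of an assumed sequential scheme with independent decisions.

    Within one attempt, all n classifiers must accept (AND over instances);
    the claimant may make up to m attempts and is accepted if any attempt
    passes (OR over samples). All classifiers share the same far/frr.
    """
    # Impostor passes one attempt only if all n classifiers falsely accept.
    impostor_pass = far ** n
    # Impostor is accepted if at least one of the m attempts passes.
    far_total = 1.0 - (1.0 - impostor_pass) ** m
    # Genuine speaker fails one attempt if at least one classifier rejects.
    genuine_fail = 1.0 - (1.0 - frr) ** n
    # Genuine speaker is rejected only if all m attempts fail.
    frr_total = genuine_fail ** m
    return far_total, frr_total


# Single-classifier rates taken from the weak-classifier figures quoted above.
for n in (1, 2, 3):
    for m in (1, 2, 3):
        far, frr = sequential_error_rates(0.167, 0.196, n, m)
        print(f"n={n} m={m}  FAR={far:.4f}  FRR={frr:.4f}")
```

Raising n pushes FAR down and FRR up, and raising m does the reverse. With n = m = 2, this assumed formula gives roughly 5.5% FAR and 12.5% FRR, both below the single-classifier rates. These numbers follow from the sketch alone, not from the paper's experiments.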
Evaluation and analysis of hybrid intelligent pattern recognition techniques for speaker identification
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The rapid progress of technology in recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem of identifying a speaker from their voice regardless of the content (i.e. text-independent), and to design efficient methods of combining face and voice to produce a robust authentication system.
A novel approach to speaker identification is developed using wavelet analysis and multiple neural networks, including a Probabilistic Neural Network (PNN), a General Regression Neural Network (GRNN) and a Radial Basis Function Neural Network (RBF NN), combined with an AND voting scheme. This approach is tested on the GRID and VidTIMIT corpora, and comprehensive test results have been validated against state-of-the-art approaches. The system was found to be competitive: it improved the recognition rate by 15% compared to classical Mel-frequency Cepstral Coefficients (MFCC), and reduced the recognition time by 40% compared to a Back Propagation Neural Network (BPNN), Gaussian Mixture Models (GMM) and Principal Component Analysis (PCA).
Another novel approach, based on vowel formant analysis, is implemented using Linear Discriminant Analysis (LDA). Vowel-formant-based speaker identification is best suited to real-time implementation and requires only a few bytes of information to be stored for each speaker, making it both storage- and time-efficient. Tested on GRID and VidTIMIT, the proposed scheme was found to be 85.05% accurate when Linear Predictive Coding (LPC) is used to extract the vowel formants, which is much higher than the accuracy of BPNN and GMM. Since the proposed scheme requires no training time beyond creating a small database of vowel formants, it is also faster. Furthermore, an increasing number of speakers makes it difficult for BPNN and GMM to sustain their accuracy, whereas the proposed score-based methodology stays almost linear.
Finally, a novel audio-visual fusion-based identification system is implemented using GMM and MFCC for speaker identification and PCA for face recognition. The results of speaker identification and face recognition are fused at different levels, namely the feature, score and decision levels. Both score-level and decision-level (with OR voting) fusion were shown to outperform feature-level fusion in terms of accuracy and error resilience. This result is consistent with the distinct nature of the two modalities, which lose their individual characteristics when combined at the feature level. The GRID and VidTIMIT test results validate that the proposed scheme is one of the best candidates for the fusion of face and voice, owing to its low computational time and high recognition accuracy.
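Two of the fusion styles described above can be sketched under stated assumptions. For AND voting, each network is assumed to return a predicted speaker label. For score-level fusion, each modality is assumed to return a per-speaker score in which higher means a better match. Min-max normalization, equal default weights and all names in the code are hypothetical, since the abstract does not say how scores were normalized or weighted.

```python
from typing import Dict, List, Optional, Sequence


def and_vote(labels: Sequence[str]) -> Optional[str]:
    """AND voting: return an identity only if every classifier predicts it."""
    if labels and all(label == labels[0] for label in labels):
        return labels[0]
    return None  # disagreement: no identity is accepted


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Map one modality's per-speaker scores onto [0, 1] so modalities are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {spk: (s - lo) / span for spk, s in scores.items()}


def score_level_fusion(modalities: List[Dict[str, float]],
                       weights: Optional[List[float]] = None) -> str:
    """Weighted sum of normalized scores (higher = better); return the top identity."""
    weights = weights or [1.0] * len(modalities)
    fused: Dict[str, float] = {}
    for w, scores in zip(weights, modalities):
        for spk, s in min_max(scores).items():
            fused[spk] = fused.get(spk, 0.0) + w * s
    return max(fused, key=lambda spk: fused[spk])


# Hypothetical labels from three networks (e.g. PNN, GRNN, RBF NN).
print(and_vote(["spk3", "spk3", "spk3"]))  # spk3
print(and_vote(["spk3", "spk1", "spk3"]))  # None
# Hypothetical per-speaker scores; distances must be turned into similarities first.
voice = {"spk1": -42.0, "spk2": -35.5, "spk3": -38.1}  # e.g. GMM log-likelihoods
face = {"spk1": 0.20, "spk2": 0.35, "spk3": 0.90}      # e.g. PCA-based similarities
print(score_level_fusion([voice, face]))  # spk3
```

In this example, voice alone would pick spk2, but the strong face score for spk3 outweighs it once both modalities share a common scale.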
Improving the robustness of i-vectors with model-compensated first-order statistics
Speaker recognition systems have achieved significant improvements over the last decade, largely due to the performance of i-vectors. Despite these achievements, mismatch between training and test data considerably affects recognition performance. In this paper, a solution is offered to increase robustness against additive noise by inserting model compensation techniques into the i-vector extraction scheme. For stationary noise, model compensation techniques produce highly robust systems. Parallel Model Compensation (PMC) and Vector Taylor Series (VTS) are considered state-of-the-art model compensation techniques. By applying these methods to the first-order statistics, the aim is to train a noisy total variability space that reduces the mismatch caused by additive noise. All other parts of the conventional i-vector scheme, such as total variability matrix training, i-vector dimensionality reduction and i-vector scoring, remain unchanged. The proposed method was tested with four different noise types at signal-to-noise ratios (SNRs) from -6 dB to 18 dB in 6 dB steps. Large reductions in equal error rate were achieved with both methods, even at the lowest SNR levels. On average, the proposed approach produced a relative reduction of more than 50% in equal error rate.
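The first-order statistics mentioned above are accumulated from GMM-UBM posteriors. The sketch below assumes features are already in the log-spectral domain. Before the statistics are computed, it moves the UBM means toward the noisy condition using the log-add approximation, a simplified form of PMC. This only illustrates where compensation can enter; it is not the paper's PMC or VTS implementation. Centering the statistics on the compensated means is also an assumption, and the toy UBM and noise mean are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def log_add_compensate(mu_clean: Vector, mu_noise: Vector) -> Vector:
    """Log-add approximation of a noisy mean in the log-spectral domain.

    Computes log(exp(s) + exp(n)) per dimension, a simplified form of PMC
    (full PMC also maps cepstra to log-spectra and adapts variances).
    """
    out = []
    for s, n in zip(mu_clean, mu_noise):
        hi = max(s, n)
        out.append(hi + math.log1p(math.exp(-abs(s - n))))  # numerically stable
    return out


def log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def baum_welch_stats(frames: List[Vector], weights: Vector, means: List[Vector],
                     variances: List[Vector]) -> Tuple[Vector, List[Vector]]:
    """Zeroth-order N_c and centered first-order F_c statistics of a diagonal GMM.

    N_c = sum_t gamma_c(t),  F_c = sum_t gamma_c(t) * (x_t - m_c)
    """
    n_comp, dim = len(weights), len(means[0])
    N = [0.0] * n_comp
    F = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        logp = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)
        log_norm = top + math.log(sum(math.exp(lp - top) for lp in logp))
        for c, lp in enumerate(logp):
            gamma = math.exp(lp - log_norm)  # posterior occupancy of component c
            N[c] += gamma
            for d in range(dim):
                F[c][d] += gamma * (x[d] - means[c][d])  # centered on the given means
    return N, F


# Hypothetical 2-component UBM over 2-D log-spectral features.
ubm_w = [0.5, 0.5]
ubm_mu = [[0.0, 0.0], [3.0, 3.0]]
ubm_var = [[1.0, 1.0], [1.0, 1.0]]
noise_mu = [1.0, 1.0]  # hypothetical noise mean, e.g. from non-speech frames
noisy_mu = [log_add_compensate(m, noise_mu) for m in ubm_mu]
frames = [[1.2, 1.4], [3.1, 2.9], [3.3, 3.2]]
N, F = baum_welch_stats(frames, ubm_w, noisy_mu, ubm_var)
print(N)
print(F)
```

Because the posteriors of each frame sum to one, the zeroth-order statistics always add up to the number of frames. That identity is a quick sanity check when porting this to real data.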
Using Gaussian Mixture Model and Partial Least Squares regression classifiers for robust speaker verification with various enhancement methods
In the presence of environmental noise, speaker verification systems inevitably suffer a decrease in performance. This thesis proposes the use of two parallel classifiers with several enhancement methods in order to improve the performance of the speaker verification system when noisy speech signals are used for authentication. Both classifiers are shown to achieve statistically significant performance gains when signal-to-noise ratio estimation, affine transforms and score-level fusion of features are all applied. These enhancement methods are validated over a wide range of test conditions, from perfectly clean speech down to speech in which the noise is as loud as the speaker (an SNR of 0 dB). After each classifier has been tuned to its best configuration, the two are also fused together in different ways. Finally, the performances of the two classifiers are compared with each other and with the performances of their fusions. Score-sum fusion, in which the scores of the two classifiers are added together, is found to be the best method.
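Score-sum fusion of the two classifiers can be sketched as follows. The z-normalization step, the threshold-sweep EER estimate and all scores are assumptions for illustration. The abstract only states that adding the classifiers' scores worked best.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_normalize(scores: Sequence[float], ref: Sequence[float]) -> List[float]:
    """Shift/scale scores by the mean and std of a reference (development) score set."""
    mu, sd = mean(ref), pstdev(ref) or 1.0
    return [(s - mu) / sd for s in scores]


def sum_fusion(gmm: Sequence[float], pls: Sequence[float],
               gmm_ref: Sequence[float], pls_ref: Sequence[float]) -> List[float]:
    """Score-sum fusion: add the two classifiers' normalized scores trial by trial."""
    return [a + b for a, b in zip(z_normalize(gmm, gmm_ref), z_normalize(pls, pls_ref))]


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER by sweeping the threshold over every observed score."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(g < t for g in genuine) / len(genuine)      # genuine rejected
        far = sum(i >= t for i in impostor) / len(impostor)   # impostors accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Made-up scores for the same 4 genuine and 4 impostor trials.
gmm_gen, gmm_imp = [2.1, 1.4, 0.3, 1.9], [0.2, -0.5, 0.9, -1.0]
pls_gen, pls_imp = [0.8, 0.1, 0.9, 0.7], [0.3, 0.2, -0.1, 0.0]
gmm_ref, pls_ref = gmm_gen + gmm_imp, pls_gen + pls_imp  # ideally a separate dev set
fused_gen = sum_fusion(gmm_gen, pls_gen, gmm_ref, pls_ref)
fused_imp = sum_fusion(gmm_imp, pls_imp, gmm_ref, pls_ref)
for name, g, i in [("GMM", gmm_gen, gmm_imp), ("PLS", pls_gen, pls_imp),
                   ("sum", fused_gen, fused_imp)]:
    print(name, equal_error_rate(g, i))
```

On these made-up scores, each classifier alone has an EER of 0.25. The summed scores separate genuine and impostor trials completely, because the trial each classifier gets wrong is scored confidently by the other.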
- …