994 research outputs found
Text-dependent Forensic Voice Comparison: Likelihood Ratio Estimation with the Hidden Markov Model (HMM) and Gaussian Mixture Model – Universal Background Model (GMMUBM) Approaches
Among the more typical forensic voice comparison (FVC) approaches, the acoustic-phonetic statistical approach is suitable for text-dependent FVC, but it does not fully exploit available time-varying information of speech in its modelling. The automatic approach, on the other hand, essentially deals with text-independent cases, which means temporal information is not explicitly incorporated in the modelling. Text-dependent likelihood ratio (LR)-based FVC studies, in particular those that adopt the automatic approach, are few. This preliminary LR-based FVC study compares two statistical models, the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM), for the calculation of forensic LRs using the same speech data. FVC experiments were carried out using different lengths of Japanese short words under a forensically realistic, but challenging condition: only two speech tokens for model training and LR estimation. Log-likelihood-ratio cost (Cllr) was used as the assessment metric. The study demonstrates that the HMM system constantly outperforms the GMM system in terms of average Cllr values. However, words longer than three mora are needed if the advantage of the HMM is to become evident. With a seven-mora word, for example, the HMM outperformed the GMM by a Cllr value of 0.073
High level speaker specific features modeling in automatic speaker recognition system
Spoken words convey several levels of information. At the primary level, the speech conveys words or spoken messages, but at the secondary level, the speech also reveals information about the speakers. This work is based on the high-level speaker-specific features on statistical speaker modeling techniques that express the characteristic sound of the human voice. Using Hidden Markov model (HMM), Gaussian mixture model (GMM), and Linear Discriminant Analysis (LDA) models build Automatic Speaker Recognition (ASR) system that are computational inexpensive can recognize speakers regardless of what is said. The performance of the ASR system is evaluated for clear speech to a wide range of speech quality using a standard TIMIT speech corpus. The ASR efficiency of HMM, GMM, and LDA based modeling technique are 98.8%, 99.1%, and 98.6% and Equal Error Rate (EER) is 4.5%, 4.4% and 4.55% respectively. The EER improvement of GMM modeling technique based ASR systemcompared with HMM and LDA is 4.25% and 8.51% respectively
Recommended from our members
Evaluation and analysis of hybrid intelligent pattern recognition techniques for speaker identification
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.The rapid momentum of the technology progress in the recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem
of identifying a speaker from its voice regardless of the content (i.e.
text-independent), and to design efficient methods of combining face and voice in producing a robust authentication system.
A novel approach towards speaker identification is developed using
wavelet analysis, and multiple neural networks including Probabilistic
Neural Network (PNN), General Regressive Neural Network (GRNN)and Radial Basis Function-Neural Network (RBF NN) with the AND
voting scheme. This approach is tested on GRID and VidTIMIT cor-pora and comprehensive test results have been validated with state-
of-the-art approaches. The system was found to be competitive and it improved the recognition rate by 15% as compared to the classical Mel-frequency Cepstral Coe±cients (MFCC), and reduced the recognition time by 40% compared to Back Propagation Neural Network (BPNN), Gaussian Mixture Models (GMM) and Principal Component Analysis (PCA).
Another novel approach using vowel formant analysis is implemented using Linear Discriminant Analysis (LDA). Vowel formant based speaker identification is best suitable for real-time implementation and requires only a few bytes of information to be stored for each speaker, making it both storage and time efficient. Tested on GRID and Vid-TIMIT, the proposed scheme was found to be 85.05% accurate when Linear Predictive Coding (LPC) is used to extract the vowel formants, which is much higher than the accuracy of BPNN and GMM. Since the proposed scheme does not require any training time other than creating a small database of vowel formants, it is faster as well. Furthermore, an increasing number of speakers makes it di±cult for BPNN and GMM to sustain their accuracy, but the proposed score-based methodology stays almost linear.
Finally, a novel audio-visual fusion based identification system is implemented using GMM and MFCC for speaker identi¯cation and PCA for face recognition. The results of speaker identification and face recognition are fused at different levels, namely the feature, score and decision levels. Both the score-level and decision-level (with OR voting) fusions were shown to outperform the feature-level fusion in terms of accuracy and error resilience. The result is in line with the distinct nature of the two modalities which lose themselves when combined at the feature-level. The GRID and VidTIMIT test results validate that
the proposed scheme is one of the best candidates for the fusion of
face and voice due to its low computational time and high recognition accuracy
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes
Text-independent bilingual speaker verification system.
Ma Bin.Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.Includes bibliographical references (leaves 96-102).Abstracts in English and Chinese.Abstract --- p.iAcknowledgement --- p.ivChapter 1 --- Introduction --- p.1Chapter 1.1 --- Biometrics --- p.2Chapter 1.2 --- Speaker Verification --- p.3Chapter 1.3 --- Overview of Speaker Verification Systems --- p.4Chapter 1.4 --- Text Dependency --- p.4Chapter 1.4.1 --- Text-Dependent Speaker Verification --- p.5Chapter 1.4.2 --- GMM-based Speaker Verification --- p.6Chapter 1.5 --- Language Dependency --- p.6Chapter 1.6 --- Normalization Techniques --- p.7Chapter 1.7 --- Objectives of the Thesis --- p.8Chapter 1.8 --- Thesis Organization --- p.8Chapter 2 --- Background --- p.10Chapter 2.1 --- Background Information --- p.11Chapter 2.1.1 --- Speech Signal Acquisition --- p.11Chapter 2.1.2 --- Speech Processing --- p.11Chapter 2.1.3 --- Engineering Model of Speech Signal --- p.13Chapter 2.1.4 --- Speaker Information in the Speech Signal --- p.14Chapter 2.1.5 --- Feature Parameters --- p.15Chapter 2.1.5.1 --- Mel-Frequency Cepstral Coefficients --- p.16Chapter 2.1.5.2 --- Linear Predictive Coding Derived Cep- stral Coefficients --- p.18Chapter 2.1.5.3 --- Energy Measures --- p.20Chapter 2.1.5.4 --- Derivatives of Cepstral Coefficients --- p.21Chapter 2.1.6 --- Evaluating Speaker Verification Systems --- p.22Chapter 2.2 --- Common Techniques --- p.24Chapter 2.2.1 --- Template Model Matching Methods --- p.25Chapter 2.2.2 --- Statistical Model Methods --- p.26Chapter 2.2.2.1 --- HMM Modeling Technique --- p.27Chapter 2.2.2.2 --- GMM Modeling Techniques --- p.30Chapter 2.2.2.3 --- Gaussian Mixture Model --- p.31Chapter 2.2.2.4 --- The Advantages of GMM --- p.32Chapter 2.2.3 --- Likelihood Scoring --- p.32Chapter 2.2.4 --- General Approach to Decision Making --- p.35Chapter 2.2.5 --- Cohort Normalization --- p.35Chapter 2.2.5.1 --- Probability Score Normalization --- p.36Chapter 2.2.5.2 --- Cohort Selection --- p.37Chapter 2.3 --- Chapter Summary --- p.38Chapter 3 --- Experimental Corpora --- p.39Chapter 3.1 --- The YOHO Corpus --- p.39Chapter 3.1.1 --- Design of the YOHO Corpus --- p.39Chapter 3.1.2 --- Data Collection Process of the YOHO Corpus --- p.40Chapter 3.1.3 --- Experimentation with the YOHO Corpus --- p.41Chapter 3.2 --- CUHK Bilingual Speaker Verification Corpus --- p.42Chapter 3.2.1 --- Design of the CUBS Corpus --- p.42Chapter 3.2.2 --- Data Collection Process for the CUBS Corpus --- p.44Chapter 3.3 --- Chapter Summary --- p.46Chapter 4 --- Text-Dependent Speaker Verification --- p.47Chapter 4.1 --- Front-End Processing on the YOHO Corpus --- p.48Chapter 4.2 --- Cohort Normalization Setup --- p.50Chapter 4.3 --- HMM-based Speaker Verification Experiments --- p.53Chapter 4.3.1 --- Subword HMM Models --- p.53Chapter 4.3.2 --- Experimental Results --- p.55Chapter 4.3.2.1 --- Comparison of Feature Representations --- p.55Chapter 4.3.2.2 --- Effect of Cohort Normalization --- p.58Chapter 4.4 --- Experiments on GMM-based Speaker Verification --- p.61Chapter 4.4.1 --- Experimental Setup --- p.61Chapter 4.4.2 --- The number of Gaussian Mixture Components --- p.62Chapter 4.4.3 --- The Effect of Cohort Normalization --- p.64Chapter 4.4.4 --- Comparison of HMM and GMM --- p.65Chapter 4.5 --- Comparison with Previous Systems --- p.67Chapter 4.6 --- Chapter Summary --- p.70Chapter 5 --- Language- and Text-Independent Speaker Verification --- p.71Chapter 5.1 --- Front-End Processing of the CUBS --- p.72Chapter 5.2 --- Language- and Text-Independent Speaker Modeling --- p.73Chapter 5.3 --- Cohort Normalization --- p.74Chapter 5.4 --- Experimental Results and Analysis --- p.75Chapter 5.4.1 --- Number of Gaussian Mixture Components --- p.78Chapter 5.4.2 --- The Cohort Normalization Effect --- p.79Chapter 5.4.3 --- Language Dependency --- p.80Chapter 5.4.4 --- Language-Independency --- p.83Chapter 5.5 --- Chapter Summary --- p.88Chapter 6 --- Conclusions and Future Work --- p.90Chapter 6.1 --- Summary --- p.90Chapter 6.1.1 --- Feature Comparison --- p.91Chapter 6.1.2 --- HMM Modeling --- p.91Chapter 6.1.3 --- GMM Modeling --- p.91Chapter 6.1.4 --- Cohort Normalization --- p.92Chapter 6.1.5 --- Language Dependency --- p.92Chapter 6.2 --- Future Work --- p.93Chapter 6.2.1 --- Feature Parameters --- p.93Chapter 6.2.2 --- Model Quality --- p.93Chapter 6.2.2.1 --- Variance Flooring --- p.93Chapter 6.2.2.2 --- Silence Detection --- p.94Chapter 6.2.3 --- Conversational Speaker Verification --- p.95Bibliography --- p.10
Biometrics
Biometrics uses methods for unique recognition of humans based upon one or more intrinsic physical or behavioral traits. In computer science, particularly, biometrics is used as a form of identity access management and access control. It is also used to identify individuals in groups that are under surveillance. The book consists of 13 chapters, each focusing on a certain aspect of the problem. The book chapters are divided into three sections: physical biometrics, behavioral biometrics and medical biometrics. The key objective of the book is to provide comprehensive reference and text on human authentication and people identity verification from both physiological, behavioural and other points of view. It aims to publish new insights into current innovations in computer systems and technology for biometrics development and its applications. The book was reviewed by the editor Dr. Jucheng Yang, and many of the guest editors, such as Dr. Girija Chetty, Dr. Norman Poh, Dr. Loris Nanni, Dr. Jianjiang Feng, Dr. Dongsun Park, Dr. Sook Yoon and so on, who also made a significant contribution to the book
- …