288 research outputs found

    Evaluation of preprocessors for neural network speaker verification


    Text-independent bilingual speaker verification system.

    Ma Bin. Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 96-102). Abstracts in English and Chinese. Contents:
    Abstract; Acknowledgement.
    Chapter 1, Introduction: 1.1 Biometrics; 1.2 Speaker Verification; 1.3 Overview of Speaker Verification Systems; 1.4 Text Dependency; 1.4.1 Text-Dependent Speaker Verification; 1.4.2 GMM-based Speaker Verification; 1.5 Language Dependency; 1.6 Normalization Techniques; 1.7 Objectives of the Thesis; 1.8 Thesis Organization.
    Chapter 2, Background: 2.1 Background Information; 2.1.1 Speech Signal Acquisition; 2.1.2 Speech Processing; 2.1.3 Engineering Model of Speech Signal; 2.1.4 Speaker Information in the Speech Signal; 2.1.5 Feature Parameters; 2.1.5.1 Mel-Frequency Cepstral Coefficients; 2.1.5.2 Linear Predictive Coding Derived Cepstral Coefficients; 2.1.5.3 Energy Measures; 2.1.5.4 Derivatives of Cepstral Coefficients; 2.1.6 Evaluating Speaker Verification Systems; 2.2 Common Techniques; 2.2.1 Template Model Matching Methods; 2.2.2 Statistical Model Methods; 2.2.2.1 HMM Modeling Technique; 2.2.2.2 GMM Modeling Techniques; 2.2.2.3 Gaussian Mixture Model; 2.2.2.4 The Advantages of GMM; 2.2.3 Likelihood Scoring; 2.2.4 General Approach to Decision Making; 2.2.5 Cohort Normalization; 2.2.5.1 Probability Score Normalization; 2.2.5.2 Cohort Selection; 2.3 Chapter Summary.
    Chapter 3, Experimental Corpora: 3.1 The YOHO Corpus; 3.1.1 Design of the YOHO Corpus; 3.1.2 Data Collection Process of the YOHO Corpus; 3.1.3 Experimentation with the YOHO Corpus; 3.2 CUHK Bilingual Speaker Verification Corpus; 3.2.1 Design of the CUBS Corpus; 3.2.2 Data Collection Process for the CUBS Corpus; 3.3 Chapter Summary.
    Chapter 4, Text-Dependent Speaker Verification: 4.1 Front-End Processing on the YOHO Corpus; 4.2 Cohort Normalization Setup; 4.3 HMM-based Speaker Verification Experiments; 4.3.1 Subword HMM Models; 4.3.2 Experimental Results; 4.3.2.1 Comparison of Feature Representations; 4.3.2.2 Effect of Cohort Normalization; 4.4 Experiments on GMM-based Speaker Verification; 4.4.1 Experimental Setup; 4.4.2 The Number of Gaussian Mixture Components; 4.4.3 The Effect of Cohort Normalization; 4.4.4 Comparison of HMM and GMM; 4.5 Comparison with Previous Systems; 4.6 Chapter Summary.
    Chapter 5, Language- and Text-Independent Speaker Verification: 5.1 Front-End Processing of the CUBS; 5.2 Language- and Text-Independent Speaker Modeling; 5.3 Cohort Normalization; 5.4 Experimental Results and Analysis; 5.4.1 Number of Gaussian Mixture Components; 5.4.2 The Cohort Normalization Effect; 5.4.3 Language Dependency; 5.4.4 Language-Independency; 5.5 Chapter Summary.
    Chapter 6, Conclusions and Future Work: 6.1 Summary; 6.1.1 Feature Comparison; 6.1.2 HMM Modeling; 6.1.3 GMM Modeling; 6.1.4 Cohort Normalization; 6.1.5 Language Dependency; 6.2 Future Work; 6.2.1 Feature Parameters; 6.2.2 Model Quality; 6.2.2.1 Variance Flooring; 6.2.2.2 Silence Detection; 6.2.3 Conversational Speaker Verification.
    Bibliography.

    Physiologically-Motivated Feature Extraction Methods for Speaker Recognition

    Speaker recognition has received a great deal of attention from the speech community, and significant gains in robustness and accuracy have been obtained over the past decade. However, the features used for identification are still primarily representations of overall spectral characteristics, and thus the models are primarily phonetic in nature, differentiating speakers based on overall pronunciation patterns. This creates difficulties in terms of the amount of enrollment data and the complexity of the models required to cover the phonetic space, especially in tasks such as identification where enrollment and testing data may not have similar phonetic coverage. This dissertation introduces new features based on vocal source characteristics intended to capture physiological information related to the laryngeal excitation energy of a speaker. These features, including RPCC, GLFCC and TPCC, capture unique characteristics of speech production that are not represented in current state-of-the-art speaker identification systems. The proposed features are evaluated through three experimental paradigms: cross-lingual speaker identification, cross-song-type avian speaker identification and mono-lingual speaker identification. The experimental results show that the proposed features provide information about speaker characteristics that is significantly different in nature from the phonetically focused information present in traditional spectral features. The incorporation of the proposed glottal source features offers a significant overall improvement to the robustness and accuracy of speaker identification tasks.
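    To make the idea of source-based features more concrete, the sketch below derives cepstral coefficients from the LPC residual, i.e. the excitation signal left after inverse-filtering out the vocal-tract envelope. It is a generic illustration of glottal/source feature extraction assuming a single frame of speech samples; it is not the dissertation's exact RPCC, GLFCC or TPCC definitions.

```python
# Sketch: cepstra of the LPC residual, a generic stand-in for the vocal-source
# (glottal excitation) features discussed above. Not the exact RPCC/GLFCC/TPCC
# definitions from the dissertation.
import numpy as np
from scipy.signal import lfilter
from scipy.fftpack import dct

def lpc_coeffs(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_new = a.copy()
        for j in range(1, i):
            a_new[j] = a[j] + k * a[i - j]
        a_new[i] = k
        a = a_new
        err *= (1.0 - k * k)
    return a

def source_cepstrum(frame, order=16, n_ceps=12, n_fft=512):
    """Cepstra of the residual: inverse-filter the frame, then log-spectrum + DCT."""
    windowed = frame * np.hamming(len(frame))
    a = lpc_coeffs(windowed, order)
    residual = lfilter(a, [1.0], windowed)        # remove the vocal-tract envelope
    log_spec = np.log(np.abs(np.fft.rfft(residual, n_fft)) + 1e-10)
    return dct(log_spec, norm="ortho")[:n_ceps]   # keep the first few coefficients
```

    In a full system, residual cepstra of this kind would typically be appended to, or fused with, conventional spectral features such as MFCCs before speaker modelling.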

    Subband spectral features for speaker recognition.

    Tam Yuk Yin. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references. Abstracts in English and Chinese. Contents:
    Chapter 1, Introduction: 1.1 Biometrics for User Authentication; 1.2 Voice-based User Authentication; 1.3 Motivation and Focus of This Work; 1.4 Thesis Outline; References.
    Chapter 2, Fundamentals of Automatic Speaker Recognition: 2.1 Speech Production; 2.2 Features of Speaker's Voice in Speech Signal; 2.3 Basics of Speaker Recognition; 2.4 Existing Approaches of Speaker Recognition; 2.4.1 Feature Extraction; 2.4.1.1 Overview; 2.4.1.2 Mel-Frequency Cepstral Coefficient (MFCC); 2.4.2 Speaker Modeling; 2.4.2.1 Overview; 2.4.2.2 Gaussian Mixture Model (GMM); 2.4.3 Speaker Identification (SID); References.
    Chapter 3, Data Collection and Baseline System: 3.1 Data Collection; 3.2 Baseline System; 3.2.1 Experimental Set-up; 3.2.2 Results and Analysis; References.
    Chapter 4, Subband Spectral Envelope Features: 4.1 Spectral Envelope Features; 4.2 Subband Spectral Envelope Features; 4.3 Feature Extraction Procedures; 4.4 SID Experiments; 4.4.1 Experimental Set-up; 4.4.2 Results and Analysis; References.
    Chapter 5, Fusion of Subband Features: 5.1 Model Level Fusion; 5.1.1 Experimental Set-up; 5.1.2 Results and Analysis; 5.2 Feature Level Fusion; 5.2.1 Experimental Set-up; 5.2.2 Results and Analysis; 5.3 Discussion; References.
    Chapter 6, Utterance-Level SID with Text-Dependent Weights: 6.1 Motivation; 6.2 Utterance-Level SID; 6.3 Baseline System; 6.3.1 Implementation Details; 6.3.2 Results and Analysis; 6.4 Text-Dependent Weights; 6.4.1 Implementation Details; 6.4.2 Results and Analysis; 6.5 Text-Dependent Feature Weights; 6.5.1 Implementation Details; 6.5.2 Results and Analysis; 6.6 Text-Dependent Weights Applied in Score Combination and Subband Features; 6.6.1 Implementation Details; 6.6.2 Results and Analysis; 6.7 Discussion.
    Chapter 7, Conclusions and Suggested Future Work: 7.1 Conclusions; 7.2 Suggested Future Work.
    Appendix: Appendix 1, Speech Content for Data Collection.

    Open-set Speaker Identification

    This study is motivated by the growing need for effective extraction of intelligence and evidence from audio recordings in the fight against crime, a need made ever more apparent by the recent expansion of criminal and terrorist organisations. The main focus is to enhance the open-set speaker identification process in systems that must operate on noisy audio recorded in uncontrolled environments such as streets, restaurants or other places of business. Two investigations are therefore carried out first: the effects of environmental noise on the accuracy of open-set speaker recognition, covering conditions relevant to the intended application areas such as variable training data length, background noise and real-world noise, and the effects of short and varied-duration reference data on open-set speaker recognition. These investigations led to a novel method termed "vowel boosting", which enhances the reliability of speaker identification when operating on speech data of varied duration under uncontrolled conditions. Vowels naturally carry more speaker-specific information; emphasising them therefore enables better identification performance. Traditional state-of-the-art GMM-UBM and i-vector systems are used to evaluate "vowel boosting". The proposed approach boosts the impact of the vowels on the speaker scores, which improves recognition accuracy for the specific case of open-set identification with short and varied-duration speech material.
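    The abstract does not spell out the weighting scheme, but the general idea of "vowel boosting" can be sketched as up-weighting vowel frames when accumulating a GMM-UBM log-likelihood ratio. The boost factor, the vowel/non-vowel mask and the use of scikit-learn GaussianMixture models below are illustrative assumptions, not the thesis's exact implementation.

```python
# Sketch of vowel-weighted GMM-UBM scoring. The vowel mask (e.g. from a phone
# recogniser or a formant-based vowel detector) and the boost factor are assumed
# for illustration; the thesis's exact "vowel boosting" scheme may differ.
import numpy as np
from sklearn.mixture import GaussianMixture

def boosted_llr(frames, vowel_mask, speaker_gmm, ubm, boost=2.0):
    """Frame-weighted average log-likelihood ratio; vowel frames count more."""
    llr = speaker_gmm.score_samples(frames) - ubm.score_samples(frames)
    w = np.where(vowel_mask, boost, 1.0)
    return float(np.sum(w * llr) / np.sum(w))

# Usage sketch: fit a UBM on pooled background features, one GMM per enrolled
# speaker, then make the open-set decision by thresholding the boosted score.
# ubm = GaussianMixture(n_components=512, covariance_type="diag").fit(background_feats)
# spk = GaussianMixture(n_components=512, covariance_type="diag").fit(enrol_feats)
# accept = boosted_llr(test_feats, test_vowel_mask, spk, ubm, boost=2.0) > threshold
```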

    Hierarchical methods for large population speaker identification using telephone speech

    This study focuses on speaker identification (SID). Several problems, such as acoustic noise, channel noise, speaker variability and the large population of known speakers within the system, limit good SID performance. The SID system extracts speaker-specific features from the digitised speech signal for accurate identification. These feature sets are clustered to form the speaker template known as a speaker model. As the number of speakers enrolling into the system grows, more models accumulate and inter-speaker confusion results. This study proposes hierarchical methods that split the large population of enrolled speakers into smaller groups of model databases to minimise inter-speaker confusion.
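    As a rough illustration of the hierarchical idea, the sketch below groups enrolled speakers by clustering a fixed-length representation of each speaker model (for example a GMM mean supervector) and, at test time, scores only the speakers in the closest group(s). The group count, distance measure and supervector representation are assumptions for illustration rather than the thesis's specific method.

```python
# Sketch of two-stage (hierarchical) speaker identification: cluster enrolled
# speaker representations into groups, then search only the closest groups.
# Group count, distance metric and the supervector representation are
# illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def build_groups(speaker_vecs, n_groups=8):
    """Cluster per-speaker vectors (e.g. GMM mean supervectors) into groups."""
    km = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit(speaker_vecs)
    return km.labels_, km.cluster_centers_

def identify(test_vec, speaker_vecs, speaker_ids, labels, centroids, top_groups=2):
    """Stage 1: pick the nearest groups. Stage 2: nearest speaker within them."""
    nearest = np.argsort(np.linalg.norm(centroids - test_vec, axis=1))[:top_groups]
    cand = np.flatnonzero(np.isin(labels, nearest))
    dists = np.linalg.norm(speaker_vecs[cand] - test_vec, axis=1)
    return speaker_ids[cand[np.argmin(dists)]]
```

    Pre-selecting a few candidate groups keeps the number of detailed model comparisons roughly constant as the enrolled population grows, which is the confusion-reduction effect the study targets.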

    Using duration information in HMM-based automatic speech recognition.

    Zhu Yu. Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. Includes bibliographical references (leaves 100-104). Abstracts in English and Chinese. Contents:
    Chapter 1, Introduction: 1.1 Speech and its temporal structure; 1.2 Previous work on the modeling of temporal structure; 1.3 Integrating explicit duration modeling in HMM-based ASR system; 1.4 Thesis outline.
    Chapter 2, Background: 2.1 Automatic speech recognition process; 2.2 HMM for ASR; 2.2.1 HMM for ASR; 2.2.2 HMM-based ASR system; 2.3 General approaches to explicit duration modeling; 2.3.1 Explicit duration modeling; 2.3.2 Training of duration model; 2.3.3 Incorporation of duration model in decoding.
    Chapter 3, Cantonese Connected-Digit Recognition: 3.1 Cantonese connected digit recognition; 3.1.1 Phonetics of Cantonese and Cantonese digit; 3.2 The baseline system; 3.2.1 Speech corpus; 3.2.2 Feature extraction; 3.2.3 HMM models; 3.2.4 HMM decoding; 3.3 Baseline performance and error analysis; 3.3.1 Recognition performance; 3.3.2 Performance for different speaking rates; 3.3.3 Confusion matrix.
    Chapter 4, Duration Modeling for Cantonese Digits: 4.1 Duration features; 4.1.1 Absolute duration feature; 4.1.2 Relative duration feature; 4.2 Parametric distribution for duration modeling; 4.3 Estimation of the model parameters; 4.4 Speaking-rate-dependent duration model.
    Chapter 5, Using Duration Modeling for Cantonese Digit Recognition: 5.1 Baseline decoder; 5.2 Incorporation of state-level duration model; 5.3 Incorporation of word-level duration model; 5.4 Weighted use of duration model.
    Chapter 6, Experiment Results and Analysis: 6.1 Experiments with speaking-rate-independent duration models; 6.1.1 Discussion; 6.1.2 Analysis of the error patterns; 6.1.3 Reduction of deletion, substitution and insertion; 6.1.4 Recognition performance at different speaking rates; 6.2 Experiments with speaking-rate-dependent duration models; 6.2.1 Using true speaking rate; 6.2.2 Using estimated speaking rate; 6.3 Evaluation on another speech database; 6.3.1 Experimental setup; 6.3.2 Experiment results and analysis.
    Chapter 7, Conclusions and Future Work: 7.1 Conclusion and understanding of current work; 7.2 Future work.
    Appendix A. Bibliography.

    Proceedings: Voice Technology for Interactive Real-Time Command/Control Systems Application

    The proceedings cover speech understanding among researchers and managers, current developments in voice technology, and an exchange of information concerning government voice technology efforts.

    Biometrics

    Biometrics uses methods for the unique recognition of humans based upon one or more intrinsic physical or behavioral traits. In computer science in particular, biometrics is used as a form of identity access management and access control; it is also used to identify individuals in groups that are under surveillance. The book consists of 13 chapters, each focusing on a certain aspect of the problem, divided into three sections: physical biometrics, behavioral biometrics and medical biometrics. The key objective of the book is to provide a comprehensive reference and text on human authentication and identity verification from physiological, behavioral and other points of view, and to present new insights into current innovations in computer systems and technology for biometrics development and its applications. The book was reviewed by the editor, Dr. Jucheng Yang, and by guest editors including Dr. Girija Chetty, Dr. Norman Poh, Dr. Loris Nanni, Dr. Jianjiang Feng, Dr. Dongsun Park and Dr. Sook Yoon, who also made significant contributions to the book.