29 research outputs found

    Bayesian Speaker Adaptation Based on a New Hierarchical Probabilistic Model

    In this paper, a new hierarchical Bayesian speaker adaptation method called HMAP is proposed that combines the advantages of three conventional algorithms, maximum a posteriori (MAP), maximum-likelihood linear regression (MLLR), and eigenvoice, resulting in excellent performance across a wide range of adaptation conditions. The new method efficiently utilizes intra-speaker and inter-speaker correlation information by modeling phone and speaker subspaces in a consistent hierarchical Bayesian way. The phone variations for a specific speaker are assumed to lie in a low-dimensional subspace. The phone coordinates, which are shared among different speakers, implicitly contain the intra-speaker correlation information. For a specific speaker, the phone variations, represented by speaker-dependent eigenphones, are concatenated into a supervector. The eigenphone supervector space is also a low-dimensional speaker subspace, which contains inter-speaker correlation information. Using principal component analysis (PCA), a new hierarchical probabilistic model for the generation of the speech observations is obtained. Speaker adaptation based on the new hierarchical model is derived using the maximum a posteriori criterion in a top-down manner. Both batch adaptation and online adaptation schemes are proposed. With tuned parameters, the new method can handle varying amounts of adaptation data automatically and efficiently. Experimental results on a Mandarin Chinese continuous speech recognition task show good performance under all testing conditions.
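    One of the ingredients the abstract combines, MAP re-estimation of a Gaussian mean, can be sketched as follows. This is a minimal textbook illustration, not the paper's HMAP algorithm; the function name, the relevance factor `tau`, and the toy statistics are hypothetical.

```python
import numpy as np

def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP re-estimate of a Gaussian mean: interpolate the prior mean
    with the posterior-weighted average of the adaptation frames.
    tau controls how much data is needed to move away from the prior."""
    frames = np.asarray(frames, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)
    occupancy = posteriors.sum()                        # soft frame count
    weighted_sum = (posteriors[:, None] * frames).sum(axis=0)
    return (tau * np.asarray(prior_mean, dtype=float) + weighted_sum) / (tau + occupancy)

# With little adaptation data the estimate stays near the prior mean;
# with much data it approaches the data mean.
prior = np.zeros(2)
frames = np.array([[1.0, 1.0]] * 4)
gammas = np.ones(4)
adapted = map_adapt_mean(prior, frames, gammas, tau=4.0)
# occupancy = 4, so adapted = (4*[0,0] + [4,4]) / (4+4) = [0.5, 0.5]
```

    The interpolation behaviour is exactly what lets MAP-style methods "handle varying amounts of adaptation data": as occupancy grows, the prior's influence shrinks.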

    Advancing Electromyographic Continuous Speech Recognition: Signal Preprocessing and Modeling

    Speech is the natural medium of human communication, but audible speech can be overheard by bystanders and excludes speech-disabled people. This work presents a speech recognizer based on surface electromyography, where electric potentials of the facial muscles are captured by surface electrodes, allowing speech to be processed nonacoustically. A system which was state-of-the-art at the beginning of this book is substantially improved in terms of accuracy, flexibility, and robustness


    Temporally Varying Weight Regression for Speech Recognition

    Ph.D. thesis (Doctor of Philosophy)

    Unsupervised model adaptation for continuous speech recognition using model-level confidence measures.

    Kwan Ka Yan. Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. Includes bibliographical references; abstracts in English and Chinese.
    Contents: 1. Introduction (Automatic Speech Recognition; Robustness of ASR Systems; Model Adaptation for Robust ASR; Thesis Outline). 2. Fundamentals of Continuous Speech Recognition (Acoustic Front-End; Recognition Module: Acoustic Modeling with HMM, Basic Phonology of Cantonese, Acoustic Modeling for Cantonese, Language Modeling). 3. Unsupervised Model Adaptation (A General Review of Model Adaptation: Supervised and Unsupervised Adaptation, N-Best Adaptation; MAP; MLLR: Adaptation Approach, Estimation of MLLR Regression Matrices, Least Mean Squares Regression, Number of Transformations; Experiment Results: Standard MLLR versus LMS MLLR, Effect of the Number of Transformations, MAP vs. MLLR; Conclusions). 4. Use of Confidence Measure for MLLR-Based Adaptation (Introduction to Confidence Measure; Confidence Measure Based on Word Density; Model-Level Confidence Measure; Integrating Confusion Information into Confidence Measure; Adaptation Data Distributions in Different Confidence Measures). 5. Experimental Results and Analysis (Supervised Adaptation; Cheated Confidence Measure; Confidence Measures of Different Levels; Incorporation of Confusion Matrix; Conclusions). 6. Conclusions (Future Works).

    Robust speech recognition under band-limited channels and other channel distortions

    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, June 200

    Grapheme-based Automatic Speech Recognition using Probabilistic Lexical Modeling

    Automatic speech recognition (ASR) systems incorporate expert knowledge of language, or linguistic expertise, through the use of a phone pronunciation lexicon (or dictionary) in which each word is associated with a sequence of phones. Creating a phone pronunciation lexicon for a new language or domain is costly, as it requires linguistic expertise as well as time and money. In this thesis, we focus on effectively building ASR systems in the absence of linguistic expertise for a new domain or language. In particular, we consider graphemes as alternate subword units for speech recognition. In a grapheme lexicon, the pronunciation of a word is derived from its orthography. However, modeling graphemes for speech recognition is challenging for two reasons. Firstly, the grapheme-to-phoneme (G2P) relationship can be ambiguous, as languages continue to evolve after their spelling has been standardized. Secondly, as elucidated in this thesis, ASR systems typically model the relationship between graphemes and acoustic features directly, and the acoustic features depict the envelope of speech, which is related to phones. In this thesis, a grapheme-based ASR approach is proposed in which the modeling of the relationship between graphemes and acoustic features is factored through a latent variable into two models, namely, an acoustic model and a lexical model. The acoustic model captures the relationship between latent variables and acoustic features, while the lexical model captures a probabilistic relationship between latent variables and graphemes. We refer to the proposed approach as probabilistic lexical modeling based ASR. In the thesis we show that the latent variables can be phones, multilingual phones, or clustered context-dependent subword units, and that the acoustic model can be trained on domain-independent or language-independent resources. The lexical model is trained on transcribed speech data from the target domain or language. In doing so, the parameters of the lexical model capture a probabilistic relationship between graphemes and phones. In the proposed grapheme-based ASR approach, lexicon learning is implicitly integrated as a phase of ASR system training, as opposed to the conventional approach where a phone pronunciation lexicon is first developed and a phone-based ASR system is then trained. The potential and efficacy of the proposed approach are demonstrated through experiments and comparisons with other standard approaches to ASR for resource-rich languages, non-native and accented speech, under-resourced languages, and minority languages. The studies revealed that the proposed framework is particularly suitable when the task is challenged by a lack of both linguistic expertise and transcribed data. Furthermore, our investigations also showed that standard ASR approaches, in which the lexical model is deterministic, are more suitable for phones than graphemes, while the probabilistic lexical modeling based ASR approach is suitable for both. Finally, we show that the captured grapheme-to-phoneme relationship can be exploited to perform acoustic data-driven G2P conversion.
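    The factorization described above, an acoustic model over latent units combined with a probabilistic lexical model from latent units to graphemes, amounts to a marginalization over the latent variable. A minimal numerical sketch, with a hypothetical three-phone latent set and invented probabilities (not values from the thesis):

```python
import numpy as np

# Acoustic model: likelihood of the current feature frame given each
# latent subword unit (here, three phones). Hypothetical values.
p_x_given_phone = np.array([0.6, 0.3, 0.1])   # phones /k/, /s/, /a/

# Lexical model: probabilistic map from grapheme to latent units, the
# part learned from transcribed target-language speech. Rows are the
# graphemes "c" and "a"; columns are the same three phones. Grapheme
# "c" is ambiguous between /k/ and /s/, something a deterministic
# one-to-one lexicon cannot express.
p_phone_given_grapheme = np.array([
    [0.7, 0.3, 0.0],   # grapheme "c"
    [0.0, 0.0, 1.0],   # grapheme "a"
])

# Factored score: p(x | g) = sum_i p(x | phone_i) * P(phone_i | g)
p_x_given_grapheme = p_phone_given_grapheme @ p_x_given_phone
# "c": 0.7*0.6 + 0.3*0.3 = 0.51 ;  "a": 1.0*0.1 = 0.1
```

    Because the grapheme-to-phone map is a probability table rather than a fixed dictionary entry, ambiguous spellings simply spread their mass over several latent units.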

    Acoustic Approaches to Gender and Accent Identification

    There has been considerable research on the problems of speaker and language recognition from samples of speech. A less researched problem is that of accent recognition. Although this is a similar problem to language identification, different accents of a language exhibit more fine-grained differences between classes than different languages do. This presents a tougher problem for traditional classification techniques. In this thesis, we propose and evaluate a number of techniques for gender and accent classification. These techniques are novel modifications and extensions of state-of-the-art algorithms, and they result in enhanced performance on gender and accent recognition. The first part of the thesis focuses on the problem of gender identification, and presents a technique that gives improved performance in situations where training and test conditions are mismatched. The bulk of this thesis is concerned with the application of the i-Vector technique to accent identification; i-Vectors are the most successful approach to acoustic classification to have emerged in recent years. We show that it is possible to achieve high-accuracy accent identification without reliance on transcriptions and without utilising phoneme recognition algorithms. The thesis describes various stages in the development of i-Vector based accent classification that improve on the standard approaches usually applied for speaker or language identification, which are insufficient on their own. We demonstrate that very good accent identification performance is possible with acoustic methods by considering different i-Vector projections, front-end parameters, i-Vector configuration parameters, and an optimised fusion of the resulting i-Vector classifiers obtainable from the same data. We claim to have achieved the best accent identification performance on the test corpus for acoustic methods, with up to a 90% identification rate. This performance is even better than previously reported acoustic-phonotactic systems on the same corpus, and is very close to the performance obtained via transcription-based accent identification. Finally, we demonstrate that utilising our techniques for speech recognition purposes leads to considerably lower word error rates.
    Keywords: Accent Identification, Gender Identification, Speaker Identification, Gaussian Mixture Model, Support Vector Machine, i-Vector, Factor Analysis, Feature Extraction, British English, Prosody, Speech Recognition
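    Once i-Vectors are extracted, a common acoustic back-end scores a test utterance against per-class average i-Vectors using cosine similarity. The sketch below illustrates that generic scoring step, not the thesis's optimised fusion; the accent labels and random vectors are stand-ins for real extracted i-Vectors.

```python
import numpy as np

def cosine_score(w_test, w_model):
    """Cosine similarity between a test i-Vector and a class model i-Vector."""
    return float(w_test @ w_model /
                 (np.linalg.norm(w_test) * np.linalg.norm(w_model)))

def classify(w_test, class_models):
    """Pick the class whose mean i-Vector is closest in angle to the test one."""
    return max(class_models, key=lambda c: cosine_score(w_test, class_models[c]))

# Stand-in "accent model" i-Vectors (100-dimensional random vectors).
rng = np.random.default_rng(0)
accent_models = {a: rng.normal(size=100) for a in ("north", "south", "scottish")}

# A test i-Vector lying close to the "south" model should score highest there.
w_test = accent_models["south"] + 0.1 * rng.normal(size=100)
predicted = classify(w_test, accent_models)
```

    Cosine scoring ignores i-Vector magnitude, which is why length normalisation and projections such as LDA or WCCN are commonly applied before this step.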