325 research outputs found

    Acoustic Modelling for Under-Resourced Languages

    Automatic speech recognition systems have so far been developed for only a small fraction of the world's 4,000-7,000 languages. This thesis examines methods to rapidly create acoustic models for new, possibly under-resourced languages in a time- and cost-effective manner. To this end, it examines the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages; a small sketch of the multilingual-model idea follows.
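    As a hedged illustration of the multilingual-model idea, the Python sketch below seeds phone models for a new language from an IPA-labelled multilingual pool. The inventories, model names, and the crude back-off rule are invented for illustration; they are assumptions, not details taken from the thesis.

        # Hypothetical sketch: bootstrapping acoustic models for a new language
        # by mapping its phone inventory onto a multilingual, IPA-labelled pool.
        # All inventories and model names below are illustrative assumptions.

        # Phone models pooled from several well-resourced languages, keyed by IPA.
        multilingual_models = {"a": "MM_a", "i": "MM_i", "u": "MM_u",
                               "k": "MM_k", "t": "MM_t", "s": "MM_s", "m": "MM_m"}

        # Phone inventory of the (possibly under-resourced) target language, in IPA.
        target_inventory = ["a", "i", "k", "t", "m", "ts"]

        def bootstrap_models(inventory, pool):
            """Seed each target phone with a multilingual model when one exists;
            unseen phones fall back to a coarser symbol (here: the first character)."""
            seeded = {}
            for phone in inventory:
                if phone in pool:
                    seeded[phone] = pool[phone]
                else:
                    # Crude back-off for phones absent from the pool, e.g. /ts/ -> /t/.
                    seeded[phone] = pool.get(phone[0])
            return seeded

        print(bootstrap_models(target_inventory, multilingual_models))

    The seeded models would then be refined with whatever target-language adaptation data is available; the back-off rule is the obvious weak point and is where articulatory knowledge could plausibly help.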

    Articulatory features for conversational speech recognition


    Towards Rapid Language Portability of Speech Processing Systems


    Pronunciation modeling for Cantonese speech recognition.

    Kam Patgi. Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. Includes bibliographical references (leaf 103). Abstracts in English and Chinese.

    Contents:
    Chapter 1. Introduction
        1.1 Automatic Speech Recognition
        1.2 Pronunciation Modeling in ASR
        1.3 Objectives of the Thesis
        1.4 Thesis Outline
    Chapter 2. The Cantonese Dialect
        2.1 Cantonese - A Typical Chinese Dialect
            2.1.1 Cantonese Phonology
            2.1.2 Cantonese Phonetics
        2.2 Pronunciation Variation in Cantonese
            2.2.1 Phone Change and Sound Change
            2.2.2 Notation for Different Sound Units
        2.3 Summary
    Chapter 3. Large-Vocabulary Continuous Speech Recognition for Cantonese
        3.1 Feature Representation of the Speech Signal
        3.2 Probabilistic Framework of ASR
        3.3 Hidden Markov Model for Acoustic Modeling
        3.4 Pronunciation Lexicon
        3.5 Statistical Language Model
        3.6 Decoding
        3.7 The Baseline Cantonese LVCSR System
            3.7.1 System Architecture
            3.7.2 Speech Databases
        3.8 Summary
    Chapter 4. Pronunciation Model
        4.1 Pronunciation Modeling at Different Levels
        4.2 Phone-Level Pronunciation Model and Its Application
            4.2.1 IF Confusion Matrix (CM)
            4.2.2 Decision Tree Pronunciation Model (DTPM)
            4.2.3 Refinement of Confusion Matrix
        4.3 Summary
    Chapter 5. Pronunciation Modeling at Lexical Level
        5.1 Construction of PVD
        5.2 PVD Pruning by Word Unigram
        5.3 Recognition Experiments
            5.3.1 Experiment 1 - Pronunciation Modeling in LVCSR
            5.3.2 Experiment 2 - Pronunciation Modeling in a Domain-Specific Task
            5.3.3 Experiment 3 - PVD Pruning by Word Unigram
        5.4 Summary
    Chapter 6. Pronunciation Modeling at Acoustic Model Level
        6.1 Hierarchy of HMM
        6.2 Sharing of Mixture Components
        6.3 Adaptation of Mixture Components
        6.4 Combination of Mixture Component Sharing and Adaptation
        6.5 Recognition Experiments
        6.6 Result Analysis
            6.6.1 Performance of Sharing Mixture Components
            6.6.2 Performance of Mixture Component Adaptation
        6.7 Summary
    Chapter 7. Pronunciation Modeling at Decoding Level
        7.1 Search Process in Cantonese LVCSR
        7.2 Model-Level Search Space Expansion
        7.3 State-Level Output Probability Modification
        7.4 Recognition Experiments
            7.4.1 Experiment 1 - Model-Level Search Space Expansion
            7.4.2 Experiment 2 - State-Level Output Probability Modification
        7.5 Summary
    Chapter 8. Conclusions and Suggestions for Future Work
        8.1 Conclusions
        8.2 Suggestions for Future Work
    Appendix I. Base Syllable Table
    Appendix II. Cantonese Initials and Finals
    Appendix III. IF Confusion Matrix
    Appendix IV. Phonetic Question Set
    Appendix V. CDDT and PCDT
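    Section 5.2 of the outline names a concrete technique, pruning a pronunciation variant dictionary (PVD) by word unigram probability. A minimal Python sketch of one plausible reading of that step follows; the words, variants, probabilities, threshold, and the rule of always keeping the top variant are invented for illustration and are not taken from the thesis.

        # Hypothetical sketch of PVD pruning by word unigram (cf. Section 5.2).
        # All entries and the threshold below are toy values for illustration.

        pvd = {
            # word: [(pronunciation variant, P(variant | word)), ...]
            "gwong2dung1waa2": [("g w o ng d u ng w aa", 0.80),
                                ("g o ng d u ng w aa", 0.15),
                                ("g w o n d u ng w aa", 0.05)],
        }
        unigram = {"gwong2dung1waa2": 0.002}  # P(word) from a unigram LM

        def prune_pvd(pvd, unigram, threshold=2e-4):
            """Keep a variant only if its joint weight P(word) * P(variant|word)
            clears the threshold; the top variant of each word is always kept."""
            pruned = {}
            for word, variants in pvd.items():
                variants = sorted(variants, key=lambda v: v[1], reverse=True)
                kept = [variants[0]]  # never drop the most likely pronunciation
                for pron, p in variants[1:]:
                    if unigram.get(word, 0.0) * p >= threshold:
                        kept.append((pron, p))
                pruned[word] = kept
            return pruned

        print(prune_pvd(pvd, unigram))  # drops the 0.05 variant at this threshold

    The intuition: rare words cannot support many variants, since each extra variant adds confusability to the decoder while contributing little probability mass.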

    ARTICULATORY INFORMATION FOR ROBUST SPEECH RECOGNITION

    Get PDF
    Current Automatic Speech Recognition (ASR) systems fall well short of human speech recognition performance because they lack robustness against speech variability and noise contamination. The goal of this dissertation is to investigate these critical robustness issues, put forth different ways to address them, and finally present an ASR architecture based upon these robustness criteria. Acoustic variations adversely affect the performance of current phone-based ASR systems, in which speech is modeled as 'beads-on-a-string', where the beads are the individual phone units. While phone units are distinctive in the cognitive domain, they vary in the physical domain, and their variation arises from a combination of factors including speaking style and speaking rate; this phenomenon is commonly known as 'coarticulation'. Traditional ASR systems address such coarticulatory variations by using contextualized phone units such as triphones. Articulatory phonology accounts for coarticulatory variations by modeling speech as a constellation of constricting actions known as articulatory gestures. In such a framework, speech variations such as coarticulation and lenition are accounted for by gestural overlap in time and gestural reduction in space. To realize a gesture-based ASR system, articulatory gestures have to be inferred from the acoustic signal. At the initial stage of this research, a proof-of-concept study using synthetically generated speech showed that articulatory gestures can indeed be recognized from the speech signal. It was observed that using vocal tract constriction trajectories (TVs) as an intermediate representation facilitated the task of recognizing gestures from the speech signal. Since no natural speech database presently contains articulatory gesture annotations, an automated iterative time-warping architecture is proposed that can annotate any natural speech database with articulatory gestures and TVs. Two natural speech databases, X-ray microbeam and Aurora-2, were annotated; the former was used to train a TV estimator and the latter to train a Dynamic Bayesian Network (DBN) based ASR architecture. The DBN architecture used two sets of observations: (a) acoustic features in the form of mel-frequency cepstral coefficients (MFCCs) and (b) TVs estimated from the acoustic speech signal. In this setup the articulatory gestures were modeled as hidden random variables, eliminating the need for explicit gesture recognition. Word recognition results using the DBN architecture indicate that articulatory representations not only help to account for coarticulatory variations but also significantly improve the noise robustness of ASR systems.
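    A minimal sketch of the two-stream observation setup described above, in Python, assuming librosa for MFCC extraction. The TV estimator here is a random stub standing in for the trained network the abstract describes, the input is synthetic noise rather than real speech, and the 8-dimensional TV vector is an assumption for illustration.

        # Minimal sketch: per-frame observations from two streams, MFCCs and TVs.
        # The TV estimator is a placeholder stub; in the dissertation it is a
        # model trained on X-ray microbeam data.
        import numpy as np
        import librosa

        sr = 16000
        y = np.random.default_rng(0).standard_normal(sr)   # 1 s of noise as a stand-in waveform
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape (13, n_frames)

        def estimate_tvs(mfcc_frames):
            """Stub for the TV estimator: maps each acoustic frame to an assumed
            8-dimensional vector of vocal tract constriction variables."""
            rng = np.random.default_rng(1)
            return rng.standard_normal((8, mfcc_frames.shape[1]))

        tvs = estimate_tvs(mfcc)

        # The DBN conditions on both streams per frame; stacking them illustrates
        # the two-stream evidence, while the gestures stay hidden variables.
        observations = np.vstack([mfcc, tvs])               # shape (21, n_frames)
        print(observations.shape)

    The point of the stacked representation is that the latent gesture variables explain both streams jointly, so no separate gesture recognizer has to be run before decoding.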