Pronunciation modeling for ASR - knowledge-based and data-derived methods.
This article focuses on modeling pronunciation variation in two different ways: data-derived and knowledge-based. The knowledge-based approach consists of using phonological rules to generate variants. The data-derived approach consists of performing phone recognition, followed by smoothing using decision trees (D-trees) to alleviate some of the errors in the phone recognition. Using phonological rules led to a small improvement in WER; a data-derived approach in which the phone recognition was smoothed using D-trees prior to lexicon generation led to larger improvements compared to the baseline. The lexicon was employed in two different recognition systems, a hybrid HMM/ANN system and an HMM-based system, to ascertain whether pronunciation variation was truly being modeled. This proved to be the case, as no significant differences were found between the results obtained with the two systems. Furthermore, we found that 10% of variants generated by the phonological rules were also found using phone recognition, and this increased to 28% when the phone recognition output was smoothed by using D-trees. This indicates that the D-trees generalize beyond what has been seen in the training material, whereas when the phone recognition approach is employed directly, unseen pronunciations cannot be predicted. In addition, we propose a metric to measure confusability in the lexicon. Using this confusion metric to prune variants results in roughly the same improvement as using the D-tree method.
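The knowledge-based idea described above (generating pronunciation variants with phonological rules, then checking how many of them also turn up in a data-derived lexicon) can be sketched roughly as follows. This is only an illustration under stated assumptions: the rule, the phone symbols, the example word, and the function names `apply_rule`, `generate_variants`, and `overlap_fraction` are hypothetical and are not taken from the article, which does not list its rules in this abstract.

```python
# Minimal sketch: optional phonological rewrite rules applied to a
# canonical phone sequence. All rules and symbols are illustrative
# assumptions, not the article's actual rule set.

def apply_rule(phones, pattern, replacement):
    """Yield each result of applying the rule at exactly one matching position."""
    n = len(pattern)
    for i in range(len(phones) - n + 1):
        if phones[i:i + n] == pattern:
            yield phones[:i] + replacement + phones[i + n:]


def generate_variants(canonical, rules):
    """Return the canonical form plus variants produced by the optional rules.

    Rules are applied in order; each rule may fire on any variant
    produced so far, at one position per application.
    """
    variants = {canonical}
    for pattern, replacement in rules:
        new = set()
        for v in variants:
            new.update(apply_rule(v, pattern, replacement))
        variants |= new
    return variants


def overlap_fraction(rule_variants, data_variants):
    """Fraction of rule-generated variants also present in the data-derived set."""
    if not rule_variants:
        return 0.0
    return len(rule_variants & data_variants) / len(rule_variants)


# Hypothetical rule: delete /n/ after schwa (written "@" here).
RULES = [(("@", "n"), ("@",))]

canonical = ("l", "o", "p", "@", "n")            # hypothetical lexicon entry
rule_based = generate_variants(canonical, RULES) - {canonical}
data_derived = {("l", "o", "p", "@")}            # stand-in for phone-recognition output
print(overlap_fraction(rule_based, data_derived))
```

The overlap figures quoted in the abstract (10% and 28%) are the kind of quantity `overlap_fraction` would compute over a full lexicon; the toy data here is not meant to reproduce them.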
Automatic Phonetic Transcription of Non-Prompted Speech
A reliable method for automatic phonetic transcription of non-prompted German speech has been developed at th
Non-native children speech recognition through transfer learning
This work deals with non-native children's speech and investigates both
multi-task and transfer learning approaches to adapt a multi-language Deep
Neural Network (DNN) to speakers, specifically children, learning a foreign
language. The application scenario involves young students learning
English and German and reading sentences in these second languages, as well as
in their native language. The paper analyzes and discusses techniques for
training effective DNN-based acoustic models starting from children's native
speech and performing adaptation with limited non-native audio material. A
multi-lingual model is adopted as baseline, where a common phonetic lexicon,
defined in terms of the units of the International Phonetic Alphabet (IPA), is
shared across the three languages at hand (Italian, German and English); DNN
adaptation methods based on transfer learning are evaluated on significant
non-native evaluation sets. Results show that the resulting non-native models
allow a significant improvement with respect to a mono-lingual system adapted
to speakers of the target language.
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.