103 research outputs found

On Joint Modelling of Grapheme and Phoneme Information using KL-HMM for ASR

Authors: Aradilla Guillermo; Bourlard Hervé; Magimai.-Doss Mathew
Publication venue: Idiap
Publication date: 11/02/2010

In this paper, we propose a simple approach to jointly model both grapheme and phoneme information using a Kullback-Leibler divergence based HMM (KL-HMM) system. More specifically, graphemes are used as subword units, and phoneme posterior probabilities estimated at the output of a multilayer perceptron are used as the observation feature vector. Preliminary studies on the DARPA Resource Management corpus show that, although the proposed approach yields lower performance than a KL-HMM system using phonemes as subword units, this performance gap can be bridged via temporal modelling at the observation feature vector level and contextual modelling of early tagged contextual graphemes.
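
As a rough illustration of the idea in this abstract (not the authors' implementation), the sketch below scores a frame of phoneme posteriors against per-grapheme categorical distributions using KL divergence and picks the closest grapheme state. The direction of the divergence, the three-class phoneme set, and all numeric values are assumptions chosen for the example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def best_state(frame, states):
    """Return the name of the state whose distribution is closest to the frame."""
    return min(states, key=lambda name: kl_divergence(states[name], frame))


# Hypothetical grapheme states: each is a categorical distribution over
# three phoneme classes (values invented for illustration).
grapheme_states = {
    "c": [0.70, 0.25, 0.05],
    "a": [0.05, 0.15, 0.80],
}

# Hypothetical phoneme posterior vector for one frame, as an MLP might emit.
frame_posteriors = [0.65, 0.30, 0.05]

closest = best_state(frame_posteriors, grapheme_states)
```

In a full system this local score would be accumulated along a Viterbi path through the HMM; here only the per-frame comparison is shown.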

Infoscience - École polytechnique fédérale de Lausanne

Rule based learning of word pronunciations from training corpora

Author: Molnár Lajos, 1975-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1998

Thesis (M.Eng. and S.B.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (leaves 83-85). This paper describes a text-to-pronunciation system using transformation-based error-driven learning for speech-recognition purposes. Efforts have been made to make the system language independent, automatic, robust and able to generate multiple pronunciations. The learner proposes initial pronunciations for the words and finds transformations that bring the pronunciations closer to the correct pronunciations. The pronunciation generator works by applying the transformations to a similar initial pronunciation. A dynamic aligner is used for the necessary alignment of phonemes and graphemes. The pronunciations are scored using a weighted string edit distance. Optimizations were made to make the learner and the rule applier fast. The system achieves 73.9% exact word accuracy with multiple pronunciations, 82.3% word accuracy with one correct pronunciation, and 95.3% phoneme accuracy for English words. For proper names, it achieves 50.5% exact word accuracy, 69.2% word accuracy, and 92.0% phoneme accuracy, which outperforms the compared neural network approach.
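
The scoring step mentioned in this abstract can be sketched as a weighted string edit distance over phoneme sequences. This is a generic dynamic-programming version, not the thesis's actual scorer: the vowel set, the cost values, and the function names are hypothetical.

```python
def weighted_edit_distance(source, target, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Minimum total cost of edits turning `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,  # delete source[i-1]
                dist[i][j - 1] + ins_cost,  # insert target[j-1]
                dist[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]


# Hypothetical vowel inventory, used only to make some substitutions cheaper.
VOWELS = {"AA", "AE", "AH", "IH", "IY"}


def phoneme_sub_cost(a, b):
    """Assumed costs: 0 for a match, 0.5 within the same class, 1 otherwise."""
    if a == b:
        return 0.0
    if (a in VOWELS) == (b in VOWELS):
        return 0.5
    return 1.0


score = weighted_edit_distance(["K", "AE", "T"], ["K", "AH", "T"], phoneme_sub_cost)
```

Lower scores mean the candidate pronunciation is closer to the reference; a vowel-for-vowel confusion is penalized less than a vowel-for-consonant one under these assumed weights.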

DSpace@MIT

Data preparation and improvement of NLP software modules for parametric speech synthesis

Padua Thesis and Dissertation Archive

A grapheme-based method for automatic alignment of speech and text data

Authors: Bell P.; King S.; Stan A.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

This paper introduces a method for automatic alignment of speech data with unsynchronised, imperfect transcripts, for a domain where no initial acoustic models are available. Using grapheme-based acoustic models, word skip networks and orthographic speech transcripts, we are able to harvest 55% of the speech with 93% utterance-level accuracy and 99% word accuracy for the produced transcriptions. The work is based on the assumption that there is a high degree of correspondence between the speech and text, and that a full transcription of all of the speech is not required. The method is language independent, and the only prior knowledge and resources required are the speech and text transcripts and a few minor user interventions. Index Terms: speech alignment, imperfect transcripts, grapheme-based models, word networks

CiteSeerX

Crossref

Edinburgh Research Explorer

Adapting Prosody in a Text-to-Speech System

Authors: Caglayan Erdem; Janez Stergar
Publication venue: IntechOpen
Publication date: 02/11/2010

IntechOpen

Rapid Generation of Pronunciation Dictionaries for new Domains and Languages

Author: Schlippe Tim
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2014

This dissertation presents innovative strategies and methods for the rapid generation of pronunciation dictionaries for new domains and languages. Solutions are proposed and developed for a range of conditions, from the straightforward scenario, in which the target language is present in written form on the Internet and the mapping between speech and written language is close, to the difficult scenario in which no written form for the target language exists.

KITopen

Proceedings of the EACL 2009 Workshop on Language Technologies for African Languages

Authors: De Pauw Guy; de Schryver Gilles-Maurice; Levin Lori
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2009

Ghent University Academic Bibliography

Grapheme-based Automatic Speech Recognition using Probabilistic Lexical Modeling

Author: Rasipuram Ramya
Publication venue: Lausanne, EPFL
Publication date: 29/09/2014

Automatic speech recognition (ASR) systems incorporate expert knowledge of language or linguistic expertise through the use of a phone pronunciation lexicon (or dictionary) where each word is associated with a sequence of phones. The creation of a phone pronunciation lexicon for a new language or domain is costly, as it requires linguistic expertise as well as time and money. In this thesis, we focus on effective building of ASR systems in the absence of linguistic expertise for a new domain or language. Particularly, we consider graphemes as alternate subword units for speech recognition. In a grapheme lexicon, the pronunciation of a word is derived from its orthography. However, modeling graphemes for speech recognition is a challenging task for two reasons. Firstly, the grapheme-to-phoneme (G2P) relationship can be ambiguous, as languages continue to evolve after their spelling has been standardized. Secondly, as elucidated in this thesis, ASR systems typically model the relationship between graphemes and acoustic features directly, and the acoustic features depict the envelope of speech, which is related to phones. In this thesis, a grapheme-based ASR approach is proposed where the modeling of the relationship between graphemes and acoustic features is factored through a latent variable into two models, namely, an acoustic model and a lexical model. In the acoustic model the relationship between latent variables and acoustic features is modeled, while in the lexical model a probabilistic relationship between latent variables and graphemes is modeled. We refer to the proposed approach as probabilistic lexical modeling based ASR. In the thesis we show that the latent variables can be phones or multilingual phones or clustered context-dependent subword units, and that an acoustic model can be trained on domain-independent or language-independent resources. The lexical model is trained on transcribed speech data from the target domain or language.
In doing so, the parameters of the lexical model capture a probabilistic relationship between graphemes and phones. In the proposed grapheme-based ASR approach, lexicon learning is implicitly integrated as a phase in ASR system training, as opposed to the conventional approach where a phone pronunciation lexicon is first developed and then a phone-based ASR system is trained. The potential and the efficacy of the proposed approach are demonstrated through experiments and comparisons with other standard approaches on ASR for resource-rich languages, nonnative and accented speech, under-resourced languages, and minority languages. The studies revealed that the proposed framework is particularly suitable when the task is challenged by the lack of both linguistic expertise and transcribed data. Furthermore, our investigations also showed that standard ASR approaches in which the lexical model is deterministic are more suitable for phones than graphemes, while the probabilistic lexical model based ASR approach is suitable for both. Finally, we show that the captured grapheme-to-phoneme relationship can be exploited to perform acoustic data-driven G2P conversion.

Infoscience - École polytechnique fédérale de Lausanne

The Social and Cultural Contexts of Historic Writing Practices

Publication venue: Oxbow Books
Publication date: 27/01/2022

Writing is not just a set of systems for transcribing language and communicating meaning, but an important element of human practice, deeply embedded in the cultures where it is present and fundamentally interconnected with all other aspects of human life. The Social and Cultural Contexts of Historic Writing Practices explores these relationships in a number of different cultural contexts and from a range of disciplinary perspectives, including archaeological, anthropological and linguistic. It offers new ways of approaching the study of writing and integrating it into wider debates and discussions about culture, history and archaeology

Directory of Open Access Books (DOAB)

Spelling English Words: Contributions of Phonological, Morphological and Orthographic Knowledge in Speakers of English and Chinese

Author: Zhao Jing

A growing body of literature has provided evidence of the contribution of various metalinguistic skills to children's English literacy development; however, most of the studies focused on reading outcomes while spelling outcomes have been under-researched. Further, very few studies have been conducted to investigate if the results based on native English speakers can be generalized to speakers of other languages who are learning to read and spell in English. In this study, the simultaneous influence of phonological, morphological and orthographic knowledge that may impact English spelling acquisition, among Chinese students learning English as a foreign language in Grade 8 (n = 339) in mainland China and native English-speaking students in Grade 3 (n = 166) in the United States, was investigated. Measures in English tapping into the three aspects of metalinguistic skills—phonological awareness (PA), morphological awareness (MA) and orthographic awareness (OA)—were administered to both groups. Multi-group structural equation models were used to compare models between the Chinese and the American group. Results showed that 1) the overall model of metalinguistic skills predicting spelling outcome was highly similar between the American and the Chinese groups; 2) metalinguistic skills were correlated and worked in concert to compose the linguistic repertoire construct which concurrently predicted the spelling outcome; 3) MA was the major component, compared to PA and OA, of Linguistic Repertoire (LING) across the two groups. Linguistic repertoire explained 64.1 percent and 40.2 percent of the total variance in the spelling outcome for the American and the Chinese groups, respectively; and 4) the contribution of OA was greater in the Chinese group than it was in the American group, whereas the contribution of PA was greater in the American group than it was in the Chinese group. 
This study highlights the important contribution of MA to literacy development among both the American students and the Chinese students. It also sheds light on the influence of first language (L1) orthography on English literacy acquisition. That OA contributed more than PA to the LING construct may reflect that the English learners with L1-Chinese background have enhanced visual-orthographic processing skills. This study challenges phase models of literacy development that claim MA only contributes to literacy acquisition late in the process and offers some empirical evidence to support the emerging "linguistic repertoire" theory of literacy development

Texas A&M Repository