Letter-to-Sound Pronunciation Prediction Using Conditional Random Fields
Pronunciation prediction, or letter-to-sound (LTS) …
Nearest Neighbor-Based Indonesian G2P Conversion
Grapheme-to-phoneme (G2P) conversion, also known as letter-to-sound conversion, is an important module in both speech synthesis and speech recognition. Existing G2P methods achieve varying accuracy across languages, even though they are designed to be language-independent. This paper discusses a new model based on the pseudo nearest neighbor rule (PNNR) for Indonesian G2P. The model introduces a partial orthogonal binary code for graphemes, contextual weighting, and neighborhood weighting. Testing on 9,604 unseen words shows that the model parameters are easy to tune to reach high accuracy. Testing on 123 sentences containing homographs shows that the model can disambiguate homographs if it uses a long graphemic context. Compared to an information gain tree, PNNR gives a slightly higher phoneme error rate, but it can disambiguate homographs.
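The PNNR decision step can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's implementation: plain one-hot codes stand in for the partial orthogonal binary code, the per-position weights in `pos_weights` are hypothetical values illustrating contextual weighting, the 1/j neighbour weights follow the standard pseudo nearest neighbour rule, and `encode_window`, `weighted_distance`, and `pnnr_classify` are illustrative names. The toy words and the labels `X`/`Y` are made up.

```python
import math
from collections import defaultdict

# Assumption: lowercase Latin letters plus "_" as a word-boundary pad symbol.
ALPHABET = "abcdefghijklmnopqrstuvwxyz_"


def encode_window(word, i, context=2):
    """One-hot code for each grapheme in the window word[i-context : i+context+1]."""
    padded = "_" * context + word + "_" * context
    window = padded[i : i + 2 * context + 1]  # centred on word[i]
    return [[1.0 if ch == a else 0.0 for a in ALPHABET] for ch in window]


def weighted_distance(x, y, pos_weights):
    """Euclidean distance in which each window position has its own (hypothetical) weight."""
    total = 0.0
    for w, xa, ya in zip(pos_weights, x, y):
        total += w * sum((p - q) ** 2 for p, q in zip(xa, ya))
    return math.sqrt(total)


def pnnr_classify(query, examples, k, pos_weights):
    """Pseudo nearest neighbour rule: for each phoneme class, take its k nearest
    examples, sum their distances weighted 1/j, and pick the class with the smallest sum."""
    dists_by_class = defaultdict(list)
    for vec, label in examples:
        dists_by_class[label].append(weighted_distance(query, vec, pos_weights))
    best_label, best_score = None, math.inf
    for label, dists in dists_by_class.items():
        if len(dists) < k:
            continue  # too few examples in this class to form k pseudo neighbours
        dists.sort()
        score = sum(d / j for j, d in enumerate(dists[:k], start=1))
        if score < best_score:
            best_label, best_score = label, score
    return best_label


# Toy usage: classify the grapheme at position 1 of "abe".
train = [(encode_window("abc", 1), "X"), (encode_window("abd", 1), "X"),
         (encode_window("zyx", 1), "Y"), (encode_window("zyw", 1), "Y")]
weights = [0.5, 1.0, 2.0, 1.0, 0.5]  # hypothetical: centre grapheme weighted most
print(pnnr_classify(encode_window("abe", 1), train, k=2, pos_weights=weights))  # X
```

Widening `context` (and `pos_weights` to match) is how a longer graphemic context would enter this sketch; the paper's own tuning of these parameters is not reproduced here.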
MUST&P-SRL: Multi-lingual and Unified Syllabification in Text and Phonetic Domains for Speech Representation Learning
In this paper, we present a methodology for linguistic feature extraction that focuses on automatically syllabifying words in multiple languages and is designed to be compatible with a forced-alignment tool, the Montreal Forced Aligner (MFA). From text, our method extracts phonetic transcriptions, stress marks, and a unified automatic syllabification in both the textual and phonetic domains. The system is built from open-source components and resources. Through an ablation study, we demonstrate the efficacy of our approach in automatically syllabifying words in several languages (English, French, and Spanish). Additionally, we apply the technique to the transcriptions of the CMU ARCTIC dataset, generating annotations, available online (https://github.com/noetits/MUST_P-SRL), that are well suited to speech representation learning, speech unit discovery, and disentanglement of speech factors in several speech-related fields.
Comment: Accepted for publication at EMNLP 202
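As an illustration of phonetic-domain syllabification with stress marks, here is a sketch of the maximal onset principle applied to an ARPAbet transcription, where the stress digit (0, 1, or 2) on each vowel marks a syllable nucleus. This is a swapped-in textbook technique, not necessarily the method MUST&P-SRL uses; `LEGAL_ONSETS` is a small, hypothetical, deliberately incomplete English onset inventory, and the function names are illustrative.

```python
# Hypothetical, deliberately incomplete inventory of legal English onsets (ARPAbet).
LEGAL_ONSETS = {
    (),
    ("B",), ("D",), ("F",), ("G",), ("K",), ("L",), ("M",), ("N",),
    ("P",), ("R",), ("S",), ("T",), ("V",),
    ("B", "R"), ("D", "R"), ("G", "R"), ("K", "R"), ("P", "R"), ("T", "R"),
    ("B", "L"), ("K", "L"), ("P", "L"), ("F", "L"), ("F", "R"),
    ("S", "T"), ("S", "P"), ("S", "K"), ("S", "T", "R"),
}


def is_vowel(phone: str) -> bool:
    """In CMUdict-style ARPAbet, vowels end in a stress digit."""
    return phone[-1].isdigit()


def syllabify(phones: list[str]) -> list[list[str]]:
    """Split phones into syllables using the maximal onset principle."""
    nuclei = [i for i, p in enumerate(phones) if is_vowel(p)]
    if not nuclei:
        return [list(phones)]
    syllables, start = [], 0
    for left, right in zip(nuclei, nuclei[1:]):
        cluster = phones[left + 1 : right]
        # Smallest j whose suffix cluster[j:] is a legal onset gives the longest onset;
        # j == len(cluster) (empty onset) always matches.
        j = next(j for j in range(len(cluster) + 1) if tuple(cluster[j:]) in LEGAL_ONSETS)
        boundary = left + 1 + j
        syllables.append(phones[start:boundary])
        start = boundary
    syllables.append(phones[start:])
    return syllables


def stress_pattern(syllables: list[list[str]]) -> list[str]:
    """Stress digit of each syllable's nucleus (1 primary, 2 secondary, 0 unstressed)."""
    return [next(p[-1] for p in syl if is_vowel(p)) for syl in syllables]


sylls = syllabify("EH1 K S T R AH0".split())  # an ARPAbet transcription of "extra"
print(sylls)                  # [['EH1', 'K'], ['S', 'T', 'R', 'AH0']]
print(stress_pattern(sylls))  # ['1', '0']
```

Mapping these phonetic syllable boundaries back onto the spelling (the textual domain) would need a grapheme-phoneme alignment, which this sketch does not attempt.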
New Grapheme Generation Rules for Two-Stage Model-Based Grapheme-to-Phoneme Conversion
The precise conversion of arbitrary text into its corresponding phoneme sequence (grapheme-to-phoneme or G2P conversion) is used in speech synthesis and recognition, pronunciation learning software, spoken term detection, and spoken document retrieval systems. Because the quality of this module plays an important role in the performance of such systems, and many problems with G2P conversion have been reported, we propose a novel two-stage model-based approach, implemented using an existing weighted finite-state transducer (WFST)-based G2P conversion framework, to improve the performance of the G2P conversion model. The first-stage model automatically converts words to phonemes, while the second-stage model uses the input graphemes and the output phonemes from the first stage to determine the best final output phoneme sequence. Additionally, we designed new grapheme generation rules that add extra detail to the vowel and consonant graphemes appearing within a word. Compared with previous approaches, our approach using rules focused on vowel graphemes slightly improved accuracy on the out-of-vocabulary dataset and consistently increased accuracy on the in-vocabulary dataset.
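The data flow of the two-stage approach can be sketched as follows. This is only an illustration under assumptions: the WFST models are replaced by plain callables, and both the grapheme generation rule (tagging each vowel grapheme with whether the next grapheme is a vowel `V`, a consonant `C`, or the word end `#`) and the `|` separator between graphemes and first-stage phonemes are hypothetical, not the rules defined in the paper. All function names are illustrative.

```python
from typing import Callable, List

VOWELS = set("aeiou")  # assumption: plain Latin vowel letters


def vowel_context_graphemes(word: str) -> List[str]:
    """Hypothetical rule: tag each vowel grapheme with the class of the following grapheme."""
    tokens = []
    for i, ch in enumerate(word):
        if ch in VOWELS:
            nxt = word[i + 1] if i + 1 < len(word) else None
            tag = "#" if nxt is None else ("V" if nxt in VOWELS else "C")
            tokens.append(f"{ch}_{tag}")
        else:
            tokens.append(ch)
    return tokens


def second_stage_input(word: str, first_stage: Callable[[str], List[str]]) -> List[str]:
    """Stage-2 input: rule-expanded graphemes, a separator, then the stage-1 phonemes."""
    return vowel_context_graphemes(word) + ["|"] + first_stage(word)


def two_stage_g2p(word: str,
                  first_stage: Callable[[str], List[str]],
                  second_stage: Callable[[List[str]], List[str]]) -> List[str]:
    """Stage 1 proposes phonemes; stage 2 sees graphemes plus that proposal and decides."""
    return second_stage(second_stage_input(word, first_stage))


def stage1(word: str) -> List[str]:
    return ["T", "EY", "P"]  # stand-in for a trained first-stage model


def stage2(tokens: List[str]) -> List[str]:
    return tokens[tokens.index("|") + 1:]  # stand-in that simply echoes stage 1


print(second_stage_input("tape", stage1))  # ['t', 'a_C', 'p', 'e_#', '|', 'T', 'EY', 'P']
```

In a real system, `stage2` would be a second trained model whose input vocabulary includes the tagged grapheme symbols, so it can correct stage-1 errors using the extra vowel and consonant detail.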
Evaluating grapheme-to-phoneme converters in automatic speech recognition context
This paper deals with the evaluation of grapheme-to-phoneme (G2P) converters in a speech recognition context. Precision and recall rates are investigated as potential measures of the quality of the multiple generated pronunciation variants. Very different results are obtained depending on whether or not the frequency of occurrence of the words is taken into account. Since G2P systems are rarely evaluated in terms of speech recognition performance, the originality of this paper lies in using a speech recognition system to evaluate the G2P pronunciation variants. The results show that the training process is quite robust to some errors in the pronunciation lexicon, whereas pronunciation lexicon errors are harmful in the decoding process. Noticeable speech recognition performance improvements are achieved by combining two different G2P converters, one based on conditional random fields and the other on joint multigram models, and by checking the pronunciation variants of the most frequent words.
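Precision and recall over pronunciation variants, with and without frequency weighting, can be computed as in the following sketch. The exact definitions used in the paper are not given here, so this assumes set-based matching of variant strings for each reference word, weights each word by an optional corpus count, and models the combination of two converters as a per-word union of their variants. The function names and the toy lexicons (`x`, `y`, `z`, `q` as variant strings) are illustrative only.

```python
from typing import Dict, Optional, Set, Tuple

Lexicon = Dict[str, Set[str]]  # word -> set of pronunciation variants


def variant_precision_recall(generated: Lexicon, reference: Lexicon,
                             freq: Optional[Dict[str, int]] = None) -> Tuple[float, float]:
    """Precision/recall of generated variants against reference variants.
    Only reference words are scored; with `freq`, each word counts freq[word] times."""
    tp = gen_total = ref_total = 0.0
    for word, ref_vars in reference.items():
        gen_vars = generated.get(word, set())
        weight = 1.0 if freq is None else freq.get(word, 0)
        tp += weight * len(gen_vars & ref_vars)
        gen_total += weight * len(gen_vars)
        ref_total += weight * len(ref_vars)
    precision = tp / gen_total if gen_total else 0.0
    recall = tp / ref_total if ref_total else 0.0
    return precision, recall


def combine(a: Lexicon, b: Lexicon) -> Lexicon:
    """Merge two converters' outputs by taking the union of variants per word."""
    return {w: a.get(w, set()) | b.get(w, set()) for w in a.keys() | b.keys()}


reference = {"a": {"x", "y"}, "b": {"z"}}
g2p_1 = {"a": {"x"}, "b": {"z", "q"}}
g2p_2 = {"a": {"y"}}
print(variant_precision_recall(g2p_1, reference))                            # P = R = 2/3
print(variant_precision_recall(g2p_1, reference, freq={"a": 10, "b": 1}))    # P = 11/12, R = 11/21
print(variant_precision_recall(combine(g2p_1, g2p_2), reference))            # (0.75, 1.0)
```

With these toy numbers, weighting by frequency shifts both scores toward the frequent word, which mirrors the abstract's point that the choice of weighting changes the picture; the union raises recall at some cost in precision.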
- …