3,507 research outputs found

Letter-to-Sound Pronunciation Prediction Using Conditional Random Fields

Authors: King Simon; Wang Dong
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 09/12/2010

Pronunciation prediction, or letter-to-sound (LTS

CiteSeerX

Crossref

Edinburgh Research Explorer

EURECOM Repository

Nearest Neighbor-Based Indonesian G2P Conversion

Authors: Harjoko Agus; Suyanto Suyanto
Publication venue: Universitas Ahmad Dahlan
Publication date: 01/06/2014

Grapheme-to-phoneme conversion (G2P), also known as letter-to-sound conversion, is an important module in both speech synthesis and speech recognition. G2P methods give varying accuracies for different languages although they are designed to be language independent. This paper discusses a new model based on the pseudo nearest neighbor rule (PNNR) for Indonesian G2P. The model introduces a partial orthogonal binary code for graphemes, contextual weighting, and neighborhood weighting. Testing on 9,604 unseen words shows that the model parameters are easy to tune to reach high accuracy. Testing on 123 sentences containing homographs shows that the model can disambiguate homographs if it uses a long graphemic context. Compared to an information gain tree, PNNR gives a slightly higher phoneme error rate, but it can disambiguate homographs.

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System
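
The abstract above compares models by phoneme error rate. As a rough illustration only, the sketch below computes one common definition of that metric: the total edit distance between predicted and reference phoneme sequences, divided by the total number of reference phonemes. The abstract does not state the exact formula the authors used, so this definition, the function names, and the sample transcriptions are all assumptions.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(
    pairs: Iterable[Tuple[Sequence[str], Sequence[str]]]
) -> float:
    """Total edit errors divided by total reference phonemes (assumed definition)."""
    pairs = list(pairs)
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    if total == 0:
        raise ValueError("no reference phonemes to score against")
    return errors / total


# Hypothetical (reference, predicted) transcriptions, for illustration only.
sample = [
    (["b", "u", "k", "u"], ["b", "u", "k", "u"]),
    (["m", "a", "k", "a", "n"], ["m", "a", "k", "a"]),
]
per = phoneme_error_rate(sample)
```

With these made-up pairs there is one error (a dropped final phoneme) over nine reference phonemes.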

The acquisition and generalisation of orthography-phonology correspondences

Author: Lawrence Rebecca
Publication date: 01/01/2023

Royal Holloway - Pure

MUST&P-SRL: Multi-lingual and Unified Syllabification in Text and Phonetic Domains for Speech Representation Learning

Author: Tits Noé
Publication date: 17/10/2023

In this paper, we present a methodology for linguistic feature extraction, focusing particularly on automatically syllabifying words in multiple languages, with a design to be compatible with a forced-alignment tool, the Montreal Forced Aligner (MFA). In both the textual and phonetic domains, our method focuses on the extraction of phonetic transcriptions from text, stress marks, and a unified automatic syllabification (in text and phonetic domains). The system was built with open-source components and resources. Through an ablation study, we demonstrate the efficacy of our approach in automatically syllabifying words from several languages (English, French and Spanish). Additionally, we apply the technique to the transcriptions of the CMU ARCTIC dataset, generating valuable annotations available online (https://github.com/noetits/MUST_P-SRL) that are ideal for speech representation learning, speech unit discovery, and disentanglement of speech factors in several speech-related fields. Comment: Accepted for publication at EMNLP 202

arXiv.org e-Print Archive

New Grapheme Generation Rules for Two-Stage Model-based Grapheme-to-Phoneme Conversion

Authors: Iribe Yurie; Katsurada Kouichi; Kheang Seng; Nitta Tsuneo
Publication venue: LPPM ITBis Lembah Dempo
Publication date: 20/12/2014

The precise conversion of arbitrary text into its corresponding phoneme sequence (grapheme-to-phoneme or G2P conversion) is implemented in speech synthesis and recognition, pronunciation learning software, spoken term detection and spoken document retrieval systems. Because the quality of this module plays an important role in the performance of such systems and many problems regarding G2P conversion have been reported, we propose a novel two-stage model-based approach, which is implemented using an existing weighted finite-state transducer-based G2P conversion framework, to improve the performance of the G2P conversion model. The first-stage model is built for automatic conversion of words to phonemes, while the second-stage model utilizes the input graphemes and output phonemes obtained from the first stage to determine the best final output phoneme sequence. Additionally, we designed new grapheme generation rules, which enable extra detail for the vowel and consonant graphemes appearing within a word. When compared with previous approaches, the evaluation results indicate that our approach using rules focusing on the vowel graphemes slightly improved the accuracy of the out-of-vocabulary dataset and consistently increased the accuracy of the in-vocabulary dataset.

Journal of ICT Research and Applications

Directory of Open Access Journals

ITB Journal

Posterior-Based Multi-Stream Formulation To Combine Multiple Grapheme-to-Phoneme Conversion Techniques

Authors: Magimai.-Doss Mathew; Razavi Marzieh
Publication venue: Idiap
Publication date: 19/11/2015

Infoscience - École polytechnique fédérale de Lausanne

Evaluating grapheme-to-phoneme converters in automatic speech recognition context

Authors: Fohr Dominique; Illina Irina; Jouvet Denis
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 25/03/2012

This paper deals with the evaluation of grapheme-to-phoneme (G2P) converters in a speech recognition context. The precision and recall rates are investigated as potential measures of the quality of the multiple generated pronunciation variants. Very different results are obtained depending on whether or not the frequency of occurrence of the words is taken into account. Since G2P systems are rarely evaluated on a speech recognition performance basis, the originality of this paper consists in using a speech recognition system to evaluate the G2P pronunciation variants. The results show that the training process is quite robust to some errors in the pronunciation lexicon, whereas pronunciation lexicon errors are harmful in the decoding process. Noticeable speech recognition performance improvements are achieved by combining two different G2P converters, one based on conditional random fields and the other on joint multigram models, as well as by checking the pronunciation variants of the most frequent words.

INRIA a CCSD electronic archive server
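
The last abstract scores generated pronunciation variants by precision and recall, with and without word frequencies. The sketch below shows one plausible way to compute such scores. It assumes precision is the share of generated variants found in a reference lexicon, recall is the share of reference variants that were generated, and frequency weighting simply multiplies each word's counts by its frequency. The abstract does not give the paper's exact definitions, and the function name, lexicon entries, and frequencies here are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Pron = Tuple[str, ...]  # one pronunciation variant as a tuple of phonemes


def variant_precision_recall(
    generated: Dict[str, Set[Pron]],
    reference: Dict[str, Set[Pron]],
    freq: Optional[Dict[str, float]] = None,
) -> Tuple[float, float]:
    """Precision and recall of generated variants against a reference lexicon.

    If freq is given, each word's counts are weighted by its frequency
    (an assumed weighting scheme, not taken from the paper).
    """
    hits = n_gen = n_ref = 0.0
    for word, ref_variants in reference.items():
        weight = 1.0 if freq is None else freq.get(word, 0.0)
        gen_variants = generated.get(word, set())
        hits += weight * len(gen_variants & ref_variants)
        n_gen += weight * len(gen_variants)
        n_ref += weight * len(ref_variants)
    precision = hits / n_gen if n_gen else 0.0
    recall = hits / n_ref if n_ref else 0.0
    return precision, recall


# Hypothetical lexicon entries and word frequencies, for illustration only.
reference = {
    "read": {("r", "iy", "d"), ("r", "eh", "d")},
    "cat": {("k", "ae", "t")},
}
generated = {
    "read": {("r", "iy", "d")},
    "cat": {("k", "ae", "t"), ("k", "aa", "t")},
}
unweighted = variant_precision_recall(generated, reference)
weighted = variant_precision_recall(generated, reference, {"read": 3.0, "cat": 1.0})
```

On this toy data the unweighted and frequency-weighted scores differ, which is the kind of gap the abstract describes.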