
    Text Preprocessing for Speech Synthesis

    In this paper we describe our text preprocessing modules for English text-to-speech synthesis. These modules comprise rule-based text normalization, subsuming sentence segmentation and normalization of non-standard words; statistical part-of-speech tagging; and statistical syllabification, grapheme-to-phoneme conversion, and word stress assignment, relying in part on rule-based morphological analysis.
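    One of the steps described above, normalization of non-standard words, can be illustrated with a minimal rule-based sketch. The abbreviation table and the digit-by-digit number expansion below are illustrative assumptions, not the authors' actual rules.

```python
import re

# Toy rule-based normalization of non-standard words: expand a few
# abbreviations and spell out digits. A real TTS front end would verbalize
# whole numbers, dates, currency amounts, and much more.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "etc.": "et cetera"}

ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]

def expand_number(match):
    # Spell out each digit of the matched number.
    return " ".join(ONES[int(d)] for d in match.group(0))

def normalize(text):
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return re.sub(r"\d+", expand_number, text)

print(normalize("Dr. Smith lives at 221 Baker St."))
# -> Doctor Smith lives at two two one Baker Street
```

    In a full pipeline this output would then feed the tagging, syllabification, and grapheme-to-phoneme stages.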

    Nearest Neighbor-Based Indonesian G2P Conversion

    Grapheme-to-phoneme conversion (G2P), also known as letter-to-sound conversion, is an important module in both speech synthesis and speech recognition. G2P methods yield varying accuracies across languages even though they are designed to be language-independent. This paper discusses a new model based on the pseudo nearest neighbor rule (PNNR) for Indonesian G2P. The model introduces a partial orthogonal binary code for graphemes, contextual weighting, and neighborhood weighting. Testing on 9,604 unseen words shows that the model parameters are easy to tune to reach high accuracy. Testing on 123 sentences containing homographs shows that the model can disambiguate homographs when it uses long graphemic context. Compared with an information-gain tree, PNNR gives a slightly higher phoneme error rate, but it can disambiguate homographs.
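    The core idea of nearest-neighbor G2P with contextual weighting can be sketched in a few lines. The toy Indonesian-style lexicon (aligned one phoneme per letter, with "@" standing for schwa), the context window size, and the position-weighted overlap similarity below are illustrative assumptions, not the paper's exact PNNR formulation.

```python
# Toy context-based nearest-neighbour G2P: each letter is converted by
# finding the most similar letter-in-context among aligned training words.
ALIGNED = [("enam", "enam"), ("empat", "@mpat"), ("besar", "b@sar")]
WINDOW = 2                      # letters of context on each side
WEIGHTS = [1, 2, 4, 2, 1]       # positions nearer the target count more

def context(word, i):
    padded = "#" * WINDOW + word + "#" * WINDOW
    return padded[i:i + 2 * WINDOW + 1]

def similarity(a, b):
    # Weighted count of matching positions between two context windows.
    return sum(w for w, x, y in zip(WEIGHTS, a, b) if x == y)

def g2p(word):
    phones = []
    for i, letter in enumerate(word):
        ctx = context(word, i)
        # Nearest neighbour among training positions with the same letter.
        best = max(
            ((similarity(ctx, context(w, j)), p[j])
             for w, p in ALIGNED
             for j in range(len(w)) if w[j] == letter),
            default=(0, letter),
        )
        phones.append(best[1])
    return "".join(phones)

print(g2p("emas"))   # the ambiguous initial "e" resolves to schwa: @mas
```

    Widening the context window is what lets such a model separate homograph readings, at the cost of sparser neighbour matches.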

    Comparison between rule-based and data-driven natural language processing algorithms for Brazilian Portuguese speech synthesis

    Due to the exponential growth in the use of computers, personal digital assistants, and smartphones, the development of Text-to-Speech (TTS) systems has been in high demand in recent years. An important part of these systems is the text analysis block, which converts the input text into the linguistic specifications used to generate the final speech waveform. The Natural Language Processing (NLP) algorithms in this block are crucial to the quality of the speech generated by synthesizers. They are responsible for important tasks such as grapheme-to-phoneme conversion, syllabification, and stress determination. For Brazilian Portuguese (BP), solutions for the algorithms in the text analysis block have focused on rule-based approaches. These algorithms perform well for BP but have many disadvantages. On the other hand, there has been no research evaluating and analyzing the performance of data-driven approaches, which reach state-of-the-art results for complex languages such as English. In this work, we therefore compare different data-driven and rule-based approaches to the NLP algorithms in a TTS system. Moreover, as a novel application, we propose the use of sequence-to-sequence models as a solution for the syllabification and stress determination problems. In summary, we show that data-driven algorithms can achieve state-of-the-art performance for the NLP algorithms in the text analysis block of a BP TTS system.
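    The sequence-to-sequence framing proposed in the abstract above can be illustrated by how training pairs might be encoded: the source sequence is the plain letter string and the target repeats it with boundary markers inserted (a stress marker could be added the same way). The example words, the "-" boundary token, and the symbol-indexing scheme are assumptions for illustration, not the authors' exact setup.

```python
# Cast syllabification as sequence transduction: letters in, letters plus
# syllable-boundary symbols out.
def encode_pair(word, syllables):
    """Source: letters of the word. Target: letters with '-' boundaries."""
    return list(word), list("-".join(syllables))

pairs = [encode_pair("falar", ["fa", "lar"]),
         encode_pair("casa", ["ca", "sa"])]

# Map symbols to integer ids, as a seq2seq model would consume them.
symbols = sorted({s for src, tgt in pairs for s in src + tgt})
sym2id = {s: i for i, s in enumerate(symbols)}

print(pairs[0][1])   # ['f', 'a', '-', 'l', 'a', 'r']
```

    An encoder-decoder model trained on such pairs learns to emit the boundary symbol in context, with no hand-written syllabification rules.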

    The effect of morphology on spelling and reading accuracy: A study on Italian children

    In opaque orthographies, knowledge of morphological information helps in achieving reading and spelling accuracy. In transparent orthographies with regular print-to-sound correspondences, such as Italian, the mappings of orthography onto phonology and phonology onto orthography are in principle sufficient to read and spell most words. The present study aimed to investigate the role of morphology in the reading and spelling accuracy of Italian children as a function of school experience, to determine whether morphological facilitation was present in children learning a transparent orthography. The reading and spelling performances of 15 third-grade and 15 fifth-grade typically developing children were analyzed. Children read aloud and spelled both low-frequency words and pseudowords. Low-frequency words were manipulated for the presence of morphological structure (morphemic words vs. non-derived words). Morphemic words could also vary in the frequency (high vs. low) of roots and suffixes. Pseudowords either were made up of a real root and a real derivational suffix in a combination that does not exist in the Italian language or had no morphological constituents. Results showed that, in Italian, morphological information is a useful resource for both reading and spelling. Typically developing children benefitted from the presence of morphological structure when they read and spelled pseudowords; however, in processing low-frequency words, morphology facilitated reading but not spelling. These findings are discussed in terms of morpho-lexical access and successful cooperation between lexical and sublexical processes in reading and spelling.

    TooLiP: a development tool for linguistic rules


    Cross-lingual transfer of phonological features for low-resource speech synthesis


    Learning to Read Bilingually Modulates the Manifestations of Dyslexia in Adults

    Published online: 28 Mar 2018.
    According to the Grain Size Accommodation hypothesis (Lallier & Carreiras, 2017), learning to read in two languages differing in orthographic consistency leads to a cross-linguistic modulation of reading and spelling processes. Here, we test the prediction that bilingualism may influence the manifestations of dyslexia. We compared the deficits of English monolingual and early Welsh–English bilingual dyslexic adults in reading and spelling irregular English words and English-like pseudowords. As predicted, monolinguals were relatively more impaired in reading pseudowords than irregular words, whereas the opposite was true for bilinguals. Moreover, monolinguals showed stronger sublexical processing deficits than bilinguals and were poorer spellers overall. This study shows that early bilingual reading experience has long-lasting effects on the manifestations of dyslexia in adulthood. It demonstrates that learning to read in a consistent language like Welsh, in addition to English, gives bilingual dyslexic adults an advantage in English literacy tasks that rely strongly on phonological processing.
    This research was funded by the Fyssen Foundation; the European Commission (FP7-PEOPLE-2010-IEF, Proposal N°274352, BIRD, to M.L.); the European Research Council (ERC advanced grant BILITERACY to M.C., and ERC-209704 to G.T.); the Spanish government (PSI2015-65338-P to M.L. and PSI2015-67353-R to M.C.); and the Economic and Social Research Council UK (RES-E024556-1 to G.T.). BCBL acknowledges funding from Ayuda Centro de Excelencia Severo Ochoa SEV-2015-0490.

    Conversational Arabic Automatic Speech Recognition

    Colloquial Arabic (CA) is the set of spoken variants of modern Arabic that exist in the form of regional dialects and are generally considered mother tongues in those regions. CA has limited textual resources because it exists only as a spoken language, without a standardised written form. Normally the Modern Standard Arabic (MSA) writing convention is employed, which has limitations in phonetically representing CA. Without phonetic dictionaries the pronunciation of CA words is ambiguous and can only be obtained through word and/or sentence context. Moreover, CA inherits the complex MSA word structure, in which words can be created by attaching affixes to a word. In automatic speech recognition (ASR), commonly used approaches to model acoustic, pronunciation, and word variability are language-independent. However, one can observe significant differences in performance between English and CA, with the latter yielding up to three times higher error rates. This thesis investigates the main issues behind the under-performance of CA ASR systems. The work focuses on two directions: first, the impact on language modelling of limited lexical coverage and insufficient training data for written CA is investigated; second, better models for the acoustics and pronunciations are obtained by learning to transfer between written and spoken forms. Several original contributions result from each direction. Using data-driven classes from decomposed text is shown to reduce the out-of-vocabulary rate; a novel colloquialisation system to import additional data is introduced; automatic diacritisation to restore the missing short vowels was found to yield good performance; and a new acoustic set for describing CA was defined. Using the proposed methods improved ASR performance, in terms of word error rate, in a CA conversational telephone speech ASR task.
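    The benefit of decomposed text for lexical coverage can be shown with a toy example: stripping common affixes before building the vocabulary lets unseen surface forms be covered by known sub-word units. The transliterated words and the prefix list below are illustrative assumptions, not the thesis's actual segmentation scheme.

```python
# Toy affix decomposition for an Arabic-like vocabulary.
PREFIXES = ["wa", "al", "bi"]   # "and", "the", "with" (transliterated)

def decompose(word):
    parts = []
    stripped = True
    while stripped:
        stripped = False
        for p in PREFIXES:
            # Keep a minimum stem length so short words stay intact.
            if word.startswith(p) and len(word) > len(p) + 2:
                parts.append(p + "+")
                word = word[len(p):]
                stripped = True
    return parts + [word]

corpus = ["waalkitab", "alkitab", "kitab", "bikitab"]
full_vocab = set(corpus)                                # whole-word units
dec_vocab = {t for w in corpus for t in decompose(w)}   # sub-word units

unseen = "bialkitab"  # surface form never observed in the corpus
print(unseen in full_vocab)                             # False: OOV
print(all(t in dec_vocab for t in decompose(unseen)))   # True: covered
```

    The same vocabulary of stems and affixes thus covers many more surface forms, which is the intuition behind using data-driven classes over decomposed text to cut the OOV rate.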