76 research outputs found

    Results of the Second SIGMORPHON Shared Task on Multilingual Grapheme-to-Phoneme Conversion

    Grapheme-to-phoneme conversion is an important component in many speech technologies, but until recently there were no multilingual benchmarks for this task. The second iteration of the SIGMORPHON shared task on multilingual grapheme-to-phoneme conversion features many improvements over the previous year's task (Gorman et al. 2020), including additional languages, a stronger baseline, three subtasks varying the amount of available resources, extensive quality assurance procedures, and automated error analyses. Four teams submitted a total of thirteen systems; the best of these achieved relative word error rate reductions of 11% in the high-resource subtask and 4% in the low-resource subtask.
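
    The "relative reduction of word error rate" figures above compare a system's error rate with a baseline's. The sketch below assumes that word error rate is computed at the word level, as the percentage of words whose predicted pronunciation does not exactly match the reference; the function names and the numbers in the usage lines are hypothetical and are not taken from the shared task results.

```python
from typing import Sequence


def word_error_rate(hyps: Sequence[str], refs: Sequence[str]) -> float:
    """Percentage of words whose predicted pronunciation differs from the reference."""
    if len(hyps) != len(refs):
        raise ValueError("hypotheses and references must be aligned")
    if not refs:
        raise ValueError("empty evaluation set")
    errors = sum(1 for h, r in zip(hyps, refs) if h != r)
    return 100.0 * errors / len(refs)


def relative_reduction(baseline: float, system: float) -> float:
    """Relative reduction (in %) of an error rate with respect to a baseline."""
    return 100.0 * (baseline - system) / baseline


# Hypothetical values, chosen only to illustrate the arithmetic.
baseline_wer = 20.0
system_wer = 17.8
print(f"{relative_reduction(baseline_wer, system_wer):.1f}%")  # 11.0%
```

    A relative reduction of 11% therefore does not mean the error rate dropped by 11 points; under these made-up numbers it fell from 20.0% to 17.8%.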

    SMaTTS: standard malay text to speech system

    This paper presents SMaTTS, a rule-based text-to-speech (TTS) synthesis system for Standard Malay (SM). The system uses a sinusoidal method together with a set of pre-recorded wave files to generate speech. Using a phone database significantly reduces the memory the system requires, making it lightweight and embeddable. The system comprises two phases. The first, Natural Language Processing (NLP), performs the high-level processing: text analysis, phonetic analysis, text normalization, and a morphophonemic module. This module was designed specifically for SM to resolve several problems in defining rules for the SM orthographic system before the text is passed to the DSP module. The second phase, Digital Signal Processing (DSP), performs the low-level generation of the speech waveform. The result is an intelligible and adequately natural-sounding formant-based speech synthesis system with a lightweight, user-friendly Graphical User Interface (GUI). An SM phoneme set and a comprehensive phone database were carefully constructed for this phone-based synthesizer. By applying generative phonology, a comprehensive set of letter-to-sound (LTS) rules and a pronunciation lexicon were developed for SMaTTS. For evaluation, a Diagnostic Rhyme Test (DRT) word list was compiled, and several experiments were performed to assess the quality of the synthesized speech through the resulting Mean Opinion Scores (MOS). The overall performance of the system and the room for improvement are discussed in detail.
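
    A letter-to-sound component of this kind can be pictured as a lexicon lookup followed by ordered rewrite rules. The sketch below is not SMaTTS's rule set: it assumes a greedy longest-match strategy, and both the rule table (a hypothetical subset of Malay grapheme-to-phoneme correspondences) and the one-entry exception lexicon are illustrative only.

```python
# Hypothetical subset of SM letter-to-sound rules (grapheme -> phoneme symbol).
# Digraphs are tried before single letters by the longest-match loop below.
RULES = {
    "ng": "ŋ", "ny": "ɲ", "sy": "ʃ", "kh": "x", "gh": "ɣ",
    "a": "a", "b": "b", "c": "tʃ", "d": "d", "e": "ə", "f": "f", "g": "g",
    "h": "h", "i": "i", "j": "dʒ", "k": "k", "l": "l", "m": "m", "n": "n",
    "o": "o", "p": "p", "r": "r", "s": "s", "t": "t", "u": "u", "w": "w",
    "y": "j", "z": "z",
}
MAX_LEN = max(len(g) for g in RULES)

# Hypothetical exception lexicon, consulted before the rules; e.g. words in
# which orthographic <e> is /e/ rather than the default schwa.
LEXICON = {"meja": ["m", "e", "dʒ", "a"]}


def letter_to_sound(word: str) -> list[str]:
    """Convert a word to phonemes: lexicon first, then greedy longest-match rules."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    phones = []
    i = 0
    while i < len(word):
        # Try the longest grapheme chunk first, then shorter ones.
        for size in range(min(MAX_LEN, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in RULES:
                phones.append(RULES[chunk])
                i += size
                break
        else:
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones


print(letter_to_sound("nyanyi"))  # ['ɲ', 'a', 'ɲ', 'i']
print(letter_to_sound("meja"))    # ['m', 'e', 'dʒ', 'a']
```

    Greedy matching ignores morpheme boundaries, which is one reason a morphophonemic step such as the one described above is needed before rules like these are applied.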

    Statistical morphological disambiguation with application to disambiguation of pronunciations in Turkish

    Statistical morphological disambiguation of agglutinative languages suffers from data sparseness. In this study, we introduce the notion of distinguishing tag sets (DTS) to overcome this problem. The morphological analyses of words are modeled with DTS and the major part-of-speech tags of the roots. A disambiguator based on these representations performs statistical morphological disambiguation of Turkish with a recall as high as 95.69 percent. Text-to-speech systems, as well as the preparation of transcriptions for acoustic speech data, require disambiguating the pronunciation of a token in context, so that the correct pronunciation is produced or the transcription uses the correct set of phonemes. We apply the morphological disambiguator to this pronunciation disambiguation problem and achieve 99.54 percent recall with 97.95 percent precision. Most text-to-speech systems perform phrase-level accentuation based on the distinction between content words and function words. This approach seems simple and adequate for some right-headed languages such as English, but it is not suitable for languages such as Turkish. We therefore use a heuristic approach, based on dependency parsing, to mark phrase boundaries as the basis for phrase-level accentuation in Turkish TTS synthesizers.
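
    The link between morphological and pronunciation disambiguation is that each candidate analysis implies its own pronunciation, so choosing the analysis in context also fixes the pronunciation. The sketch below does not implement the DTS model; it assumes a toy scorer (hypothetical bigram probabilities over root part-of-speech tags) and uses the Turkish token "kazma" ('pickaxe' vs. 'do not dig'), whose readings differ in stress. The analysis strings and probabilities are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    morph: str          # morphological analysis (tag format is illustrative only)
    pos: str            # major part-of-speech tag of the root
    pronunciation: str  # stress-marked pronunciation implied by this analysis


# Toy candidate analyses for the ambiguous token "kazma".
CANDIDATES = {
    "kazma": [
        Analysis("kazma+Noun+A3sg", "Noun", "kazˈma"),        # 'pickaxe'
        Analysis("kaz+Verb+Neg+Imp+A2sg", "Verb", "ˈkazma"),  # 'do not dig'
    ],
}

# Hypothetical P(pos | previous pos) values, not estimated from any corpus.
BIGRAM = {
    ("Adj", "Noun"): 0.60, ("Adj", "Verb"): 0.05,
    ("Noun", "Noun"): 0.30, ("Noun", "Verb"): 0.40,
}


def pronounce(token: str, prev_pos: str) -> str:
    """Pick the highest-scoring analysis in context and return its pronunciation."""
    best = max(CANDIDATES[token],
               key=lambda a: BIGRAM.get((prev_pos, a.pos), 1e-6))
    return best.pronunciation


print(pronounce("kazma", "Adj"))   # kazˈma
print(pronounce("kazma", "Noun"))  # ˈkazma
```

    A real disambiguator would score whole sentences and richer tag representations; this token-by-token choice only shows how the selected analysis determines the output pronunciation.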

    Conversational Arabic Automatic Speech Recognition

    Colloquial Arabic (CA) is the set of spoken variants of modern Arabic that exist as regional dialects and are generally considered mother tongues in those regions. CA has limited textual resources because it exists only as a spoken language, without a standardised written form. Writers normally use the modern standard Arabic (MSA) writing conventions, which are limited in their ability to represent CA phonetically. Without phonetic dictionaries, the pronunciation of CA words is ambiguous and can only be inferred from word and/or sentence context. Moreover, CA inherits the complex word structure of MSA, in which words can be created by attaching affixes to a word. In automatic speech recognition (ASR), the approaches commonly used to model acoustic, pronunciation and word variability are language independent. However, significant performance differences are observed between English and CA, with the latter yielding up to three times higher error rates. This thesis investigates the main causes of the underperformance of CA ASR systems. The work follows two directions: first, it investigates the impact of limited lexical coverage and insufficient written CA training data on language modelling; second, it seeks better acoustic and pronunciation models by learning to transfer between written and spoken forms. Several original contributions result from each direction. Data-driven classes derived from decomposed text are shown to reduce the out-of-vocabulary rate. A novel colloquialisation system for importing additional data is introduced; automatic diacritisation to restore the missing short vowels was found to perform well; and a new acoustic set for describing CA was defined. The proposed methods reduced the word error rate on a CA conversational telephone speech ASR task.
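
    Why decomposing text lowers the out-of-vocabulary (OOV) rate can be shown with a toy example: affixed forms unseen as whole words are often built from a known stem plus a known affix. The sketch below covers only the decomposition idea, not the thesis's data-driven classes; it assumes a hand-written list of romanized clitic prefixes, and the vocabularies and test tokens are hypothetical.

```python
# Hypothetical romanized clitic prefixes; the thesis derives its classes from data instead.
PREFIXES = ("wa", "bi", "al")


def decompose(word: str, stems: set[str]) -> list[str]:
    """Greedily strip known prefixes until a known stem remains or none applies."""
    pieces = []
    while word not in stems:
        prefix = next((p for p in PREFIXES
                       if word.startswith(p) and len(word) > len(p)), None)
        if prefix is None:
            break
        pieces.append(prefix + "+")
        word = word[len(prefix):]
    pieces.append(word)
    return pieces


def oov_rate(tokens: list[str], vocab: set[str]) -> float:
    """Percentage of tokens not covered by the vocabulary."""
    return 100.0 * sum(t not in vocab for t in tokens) / len(tokens)


train_stems = {"kitab", "bayt", "qalam"}
word_vocab = {"kitab", "bayt", "qalam", "alqalam"}  # full-word vocabulary
test_tokens = ["alkitab", "wakitab", "albayt", "qalam"]

unit_vocab = train_stems | {p + "+" for p in PREFIXES}
units = [u for t in test_tokens for u in decompose(t, train_stems)]

print(oov_rate(test_tokens, word_vocab))  # 75.0
print(oov_rate(units, unit_vocab))        # 0.0
```

    The two rates are measured over different units (words vs. decomposed pieces), and greedy stripping would wrongly split unknown stems that happen to begin with a prefix string, which is where data-driven approaches help.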

    Differences in effective connectivity between dyslexic children and normal-reading children during a pseudoword reading task: an fMRI study

    Purpose: This fMRI study investigated phonological and lexico-semantic processing during a pseudoword reading task in dyslexic children and in children matched to them for chronological age and for reading level.
    Materials and methods: The effective connectivity network was compared across the three groups using a structural model including the supramarginal cortex (BA 40; BA: Brodmann area), the fusiform cortex (BA 37) and the inferior frontal cortex (BA 44/45) of the left hemisphere.
    Results: The results revealed differences in connectivity patterns. In dyslexic patients, unlike the chronological age-matched and reading level-matched groups, no causal relationship was demonstrated between BA 40 and BA 44/45, the nodes of the phonological assembly circuit. However, a significant causal relationship was demonstrated between BA 37 and BA 44/45, within the lexico-semantic addressing circuit, both in dyslexic children and in the reading level-matched group.
    Conclusions: These findings were interpreted as evidence for a phonological deficit in developmental dyslexia.