683 research outputs found

    LPC-based diphone synthesis for the PolyGlot text-to-speech system

    Rule extraction for allophone synthesis: final report ALLODIF

    SMaTTS: standard Malay text to speech system

    This paper presents a rule-based text-to-speech (TTS) synthesis system for Standard Malay, namely SMaTTS. The proposed system uses a sinusoidal method and some pre-recorded wave files to generate speech. The use of a phone database significantly decreases the amount of computer memory used, making the system very light and embeddable. The overall system comprises two phases. The first is the Natural Language Processing (NLP) phase, which consists of the high-level processing of text analysis, phonetic analysis, text normalization and a morphophonemic module; this module was designed specially for Standard Malay (SM) to overcome several problems in defining the rules for the SM orthography system before the text can be passed to the DSP module. The second phase is the Digital Signal Processing (DSP) phase, which performs the low-level process of speech waveform generation. An intelligible and adequately natural-sounding formant-based speech synthesis system with a light and user-friendly Graphical User Interface (GUI) is introduced. A Standard Malay (SM) phoneme set and an inclusive phone database have been constructed carefully for this phone-based speech synthesizer. By applying generative phonology, a comprehensive set of letter-to-sound (LTS) rules and a pronunciation lexicon have been developed for SMaTTS. For the evaluation tests, a Diagnostic Rhyme Test (DRT) word list was compiled and several experiments were performed to evaluate the quality of the synthesized speech by analyzing the Mean Opinion Score (MOS) obtained. The overall performance of the system, as well as the room for improvement, is thoroughly discussed.
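
    The letter-to-sound stage mentioned above can be pictured as a small rule interpreter. The Python sketch below shows a generic greedy, longest-match LTS pass; the grapheme-to-phone rules and phoneme symbols in it are illustrative placeholders, not SMaTTS's actual rule set or phoneme inventory.

        # Illustrative grapheme-to-phone rules; NOT the SMaTTS rule set.
        RULES = [
            ("ng", "N"),    # hypothetical: digraph 'ng' -> velar nasal
            ("ny", "J"),    # hypothetical: digraph 'ny' -> palatal nasal
            ("sy", "S"),    # hypothetical: digraph 'sy' -> postalveolar fricative
            ("c",  "tS"),
            ("a",  "a"),
            ("i",  "i"),
            ("u",  "u"),
        ]

        def letter_to_sound(word):
            """Greedy left-to-right matching, longest graphemes first."""
            rules = sorted(RULES, key=lambda r: -len(r[0]))
            phones, i = [], 0
            while i < len(word):
                for grapheme, phone in rules:
                    if word.startswith(grapheme, i):
                        phones.append(phone)
                        i += len(grapheme)
                        break
                else:                       # no rule matched: pass the letter through
                    phones.append(word[i])
                    i += 1
            return phones

        print(letter_to_sound("nyanyi"))    # ['J', 'a', 'J', 'i']

    A full system would condition each rule on its context (neighbouring letters, syllable position) and fall back to the pronunciation lexicon for exceptions.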

    Close Copy Speech Synthesis for Speech Perception Testing

    The present study is concerned with developing a speech synthesis subcomponent for perception testing in the context of evaluating cochlear implants in children. We provide a detailed requirements analysis and develop a strategy for maximally high quality speech synthesis using Close Copy Speech synthesis techniques with a diphone-based speech synthesiser, MBROLA. The close copy concept used in this work defines a close copy as a function from a pair consisting of a speech signal recording and a phonemic annotation aligned with the recording into the pronunciation specification interface of the speech synthesiser. The design procedure has three phases: Manual Close Copy Speech (MCCS) synthesis as a "best case" gold standard, in which the function is implemented manually as a preliminary step; Automatic Close Copy Speech (ACCS) synthesis, in which the steps taken in the manual transformation are emulated by software; and finally Parametric Close Copy Speech (PCCS) synthesis, in which prosodic parameters are modifiable while retaining the diphones. This contribution reports on the MCCS and ACCS synthesis phases.
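
    As an informal illustration of the close copy function, the sketch below converts a time-aligned phonemic annotation into MBROLA's textual pronunciation specification, where each line gives a phone symbol, its duration in milliseconds and optional pairs of position-percent and F0 targets. The segment times, SAMPA-style symbols and F0 values are invented for the example; an ACCS pipeline would read them from the aligned annotation and a pitch tracker.

        # Invented time-aligned annotation: (phone, start_s, end_s, F0_at_midpoint_Hz)
        segments = [
            ("h", 0.00, 0.06, None),
            ("E", 0.06, 0.18, 190.0),
            ("l", 0.18, 0.24, 185.0),
            ("o", 0.24, 0.42, 170.0),
            ("_", 0.42, 0.60, None),    # "_" is MBROLA's silence symbol
        ]

        def to_pho(segments):
            """Emit one .pho line per phone: symbol, duration (ms), pitch targets."""
            lines = []
            for phone, start, end, f0 in segments:
                dur_ms = round((end - start) * 1000)
                if f0 is None:
                    lines.append(f"{phone} {dur_ms}")
                else:
                    # a single pitch target at 50% of the phone's duration
                    lines.append(f"{phone} {dur_ms} 50 {f0:.0f}")
            return "\n".join(lines)

        # save the result as input.pho and run: mbrola <voice> input.pho output.wav
        print(to_pho(segments))

    Exposing durations and F0 targets at this interface is what makes the later PCCS phase, with modifiable prosodic parameters, possible.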

    English Speech Synthesizer with Speech Error Processing Features: Elision and Assimilation

    Speech synthesis is one of the major tasks in Natural Language Processing (NLP). NLP is a subfield of artificial intelligence and linguistics; it studies the problem of processing and manipulating natural language so that computers can understand human language. NLP covers many major tasks, such as text-to-speech (speech synthesis), speech recognition, machine translation, information retrieval, and many more. In this project, the system relies on an extensive set of rules to syllabify words into their respective syllables and to check for any applicable English elision and assimilation rules before the correct output sound is produced. Elision is the omission of one or more sounds; a sound subject to elision would sound unfamiliar to the speaker if fully pronounced. Assimilation, by contrast, is concerned with one sound becoming phonetically the same as an adjacent sound. In this project, I demonstrate the syllabification approach introduced to me by Norshuhani and also adopt the English elision and assimilation rules in the speech synthesizer.
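
    To make the rule-based idea concrete, the toy sketch below applies two textbook connected-speech rules to a phone sequence: alveolar place assimilation (/n/ becoming /m/ before a bilabial) and /t, d/ elision inside consonant clusters. These two rules are standard examples chosen for illustration, not the project's full rule set, and the phone notation is ad hoc.

        def is_vowel(p):
            # crude vowel test over an ad hoc phone notation
            return p[:1] in set("aeiouAEIOU@")

        def assimilate(phones):
            """Alveolar place assimilation: /n/ -> /m/ before a bilabial (p, b, m)."""
            out = list(phones)
            for i in range(len(out) - 1):
                if out[i] == "n" and out[i + 1] in ("p", "b", "m"):
                    out[i] = "m"
            return out

        def elide(phones):
            """Drop /t/ or /d/ when sandwiched between two consonants."""
            out = []
            for i, p in enumerate(phones):
                inside_cluster = (
                    0 < i < len(phones) - 1
                    and not is_vowel(phones[i - 1])
                    and not is_vowel(phones[i + 1])
                )
                if p in ("t", "d") and inside_cluster:
                    continue
                out.append(p)
            return out

        print(assimilate(list("tenboiz")))                    # ['t','e','m','b','o','i','z']  ("ten boys")
        print(elide(["n", "E", "k", "s", "t", "d", "ei"]))    # the /t/ of "next" is dropped before "day"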

    Improving Phoneme to Viseme Mapping for Indonesian Language

    The lip synchronization technology of animation can run automatically through a phoneme-to-viseme map. Since the complexity of the facial muscles causes the shape of the mouth to vary greatly, phoneme-to-viseme mapping always faces challenging problems. One of them is the allophone vowel problem: because of their resemblance, many researchers cluster allophone vowels into one class. This paper examines whether vowel allophones should be treated as distinct variables of the phoneme-to-viseme map. Vowel allophone pre-processing, the proposed method, is carried out through formant frequency feature extraction followed by a t-test comparison to determine the significance of the difference. The results of the pre-processing are then used as reference data when building the phoneme-to-viseme maps. This research was conducted on maps and allophones of the Indonesian language. The maps that were built are then compared with other maps using the HMM method, in terms of word correctness and accuracy. The results show that viseme mapping preceded by allophone pre-processing makes the map perform more accurately than other maps.
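
    The pre-processing step described here amounts to a significance test on formant measurements. A minimal sketch, assuming SciPy and fabricated F2 values for two vowel allophones, might look as follows; real values would come from formant tracking of recorded speech.

        from scipy import stats

        # Fabricated F2 measurements (Hz) for two vowel allophones.
        f2_allophone_a = [2250, 2310, 2190, 2280, 2340, 2230]
        f2_allophone_b = [2050, 1980, 2110, 2020, 2090, 1960]

        t_stat, p_value = stats.ttest_ind(f2_allophone_a, f2_allophone_b)

        if p_value < 0.05:
            print(f"p = {p_value:.4f}: keep separate viseme classes for the allophones")
        else:
            print(f"p = {p_value:.4f}: merge the allophones into one viseme class")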

    Ukrainian vowel phones in the IPA context

    Acoustic and articulatory properties of Ukrainian vowels are investigated in this study, and a full set of relevant IPA notations is proposed. The notations are shown in a vowel diagram and a table. The results of earlier acoustic invariant speech analysis based on special software, together with auditory and spectrum analysis, were used, and the findings are discussed in the context of general and Ukrainian phonetic laws governing language evolution and of the acoustic properties of non-stressed vowels in relation to their stressed cognates. Such a combined approach resulted in a more detailed vowel inventory than proposed heretofore. The findings of this research contribute to a better understanding of the Ukrainian language and its special features in comparison with other world languages, which may have substantial practical use in various phonetic and translation studies, as well as in modern linguistic technologies aimed at artificial intelligence development, machine translation incorporating text-to-speech conversion, automatic speech analysis, recognition and synthesis, and other areas of applied linguistics.

    A Tutorial on Acoustic Phonetic Feature Extraction for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) Applications in African Languages

    At present, Siri, Dragon Dictate, Google Voice, and Alexa-like functionalities are not available in any indigenous African language. Yet a 2015 Pew Research study found that between 2002 and 2014, mobile phone usage in Africa increased tenfold, from 8% to 83%.[1] The Acoustic Phonetic Approach (APA) discussed in this paper lays the foundation that will make Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) applications possible in African languages. The paper is written as a tutorial so that others can use the information therein to help digitalize many of the continent’s indigenous languages. [1] http://www.pewglobal.org/2015/04/15/cell-phones-in-africa-communication-lifeline/. Retrieved on November 10, 2017.
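
    As a generic starting point for the kind of front-end feature extraction such ASR/TTS applications need (not the paper's specific Acoustic Phonetic Approach), a short librosa-based sketch is shown below; "speech.wav" is a placeholder filename.

        import librosa

        # Load a mono speech recording at 16 kHz; "speech.wav" is a placeholder.
        y, sr = librosa.load("speech.wav", sr=16000)

        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)     # (13, n_frames) cepstral features
        f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)          # frame-level pitch estimate (Hz)
        energy = librosa.feature.rms(y=y)                      # frame-level energy

        print(mfcc.shape, f0.shape, energy.shape)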

    A speech interface for air traffic control terminals

    Several issues concerning the current use of speech interfaces are discussed, and the design and development of a speech interface that enables air traffic controllers to command and control their terminals by voice is presented. Special emphasis is placed on the comparison between laboratory experiments and field experiments, in which a set of ergonomics-related effects is detected that cannot be observed in controlled laboratory experiments. The paper presents both the objective and the subjective performance obtained in a field evaluation of the system with student controllers at an air traffic control (ATC) training facility. The system exhibits high word recognition rates (0.4% word error in Spanish and 1.5% in English) and low command error (6% error in Spanish and 10.6% error in English in the field tests). The subjective impression has also been positive, encouraging future development and integration phases in the Spanish ATC terminals designed by Aeropuertos Españoles y Navegación Aérea (AENA).
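
    Figures such as "0.4% word error" are normally obtained with the standard edit-distance word error rate. The sketch below shows that computation on made-up ATC-style phrases; it is the usual Levenshtein formulation, not necessarily the exact scoring protocol used in the paper.

        def wer(reference, hypothesis):
            """Word error rate = word-level Levenshtein distance / reference length."""
            ref, hyp = reference.split(), hypothesis.split()
            d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
            for i in range(len(ref) + 1):
                d[i][0] = i                               # deletions only
            for j in range(len(hyp) + 1):
                d[0][j] = j                               # insertions only
            for i in range(1, len(ref) + 1):
                for j in range(1, len(hyp) + 1):
                    cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                    d[i][j] = min(d[i - 1][j] + 1,        # deletion
                                  d[i][j - 1] + 1,        # insertion
                                  d[i - 1][j - 1] + cost) # substitution
            return d[len(ref)][len(hyp)] / max(len(ref), 1)

        print(wer("turn left heading two seven zero",
                  "turn left heading two seven zero"))    # 0.0
        print(wer("turn left heading two seven zero",
                  "turn left heading two seven five"))    # ~0.17 (one substitution in six words)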