22 research outputs found

    Syllable Segmentation, Normalization, and Lexicographic Collation of Myanmar-Language Text Using Formal Methods

    National University Corporation Nagaoka University of Technology

    Segmenting Indonesian Base Words Using a Syllabification Algorithm

    Abstract: This paper presents a study of Indonesian syllable structure and an algorithm for identifying syllables in Indonesian words. The algorithm implements a set of syllabification rules based on Regulation of the Minister of Education and Culture of the Republic of Indonesia Number 50 Year 2015 on General Guidelines for Indonesian Spelling. In an experiment on a random sample of 300 Indonesian words, the algorithm achieved 100% word accuracy. A weakness of the algorithm is that it cannot handle Indonesian words containing 'ng', 'ny', 'kh', 'sy', or 'ks', or words containing three consecutive consonant letters. Keywords: text-to-speech system, speech synthesis, syllabification algorithm, rule-based method.

    Survey on Publicly Available Sinhala Natural Language Processing Tools and Research

    Sinhala is the native language of the Sinhalese people, who make up the largest ethnic group of Sri Lanka. The language belongs to the globe-spanning Indo-European language tree. However, due to poverty in both linguistic and economic capital, Sinhala remains a resource-poor language from the perspective of Natural Language Processing tools and research: it has neither the economic drive of its cousin English nor the sheer weight of numbers of a language such as Chinese. A number of research groups from Sri Lanka have noticed this dearth and the resultant dire need for proper tools and research for Sinhala natural language processing. However, for various reasons, these attempts seem to lack coordination and awareness of each other. The objective of this paper is to fill that gap with a comprehensive literature survey of the publicly available Sinhala natural language tools and research, so that researchers working in this field can better utilize the contributions of their peers. To that end, we will upload this paper to arXiv and update it periodically to reflect advances made in the field.

    Greek prosodies and the nature of syllabification

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Linguistics and Philosophy, 1982. By Donca Steriade. Microfiche copy available in Archives and Humanities. Bibliography: leaves 380-385.

    Implementing and Improving a Speech Synthesis System

    This work deals with text-to-speech (TTS) synthesis. A general theoretical introduction to TTS is given. The work is built on the MARY TTS system, which allows existing modules to be reused to create a custom text-to-speech system, and on speech synthesis using hidden Markov models trained on a purpose-built speech database. Several simple programs were written to ease database creation, and the process of adding a new language and voice to the MARY TTS system was demonstrated. A Czech language module and voice for the MARY TTS system were created and published. An algorithm for grapheme-to-phoneme transcription was described and implemented.

    The mora and the syllable in KiMvita (Mombasa Swahili) and Japanese.

    This thesis deals mainly with aspects of the phonology of KiMvita, the Swahili dialect spoken in Mombasa, with special reference to moraic nasals. The KiMvita analysis is then compared to that of Standard Japanese. The framework of moraic theory that is employed is based on Hyman's (1985) "Weight Theory". The theories of Feature Geometry (FG) and Lexical Phonology (LP) are also employed in the analysis. Nasal+Consonant (N+C) sequences occur in two ways in KiMvita: (i) a sequence of a moraic nasal and a consonant; (ii) a prenasalized obstruent. The analysis of the varying expressions of nasality, either as a moraic segment or as an element of a complex segment, shows considerable dependence upon the morphology concerned. In addition to N+C sequences, the analysis of Consonant+Glide (C+G) sequences turns out to be of great relevance; these two different types of composite segment differ in underlying representation as well as in surface syllabification. Here too LP enables us to distinguish two distinct surface forms (light diphthongs and complex consonants) in terms of lexical vs. post-lexical levels. Syllable construction in this study crucially requires both an onset and a nucleus. Processes of syllabification are discussed on the basis of this theoretical requirement together with the following two assumptions: (i) strictly left-to-right syllabification; (ii) priority of the Onset Creation Rule. This study proposes that the accent bearer in both KiMvita and Japanese is not the syllable, as is generally claimed in the literature, but the mora, though this may be associated with a syllable node. Moraic nasals are generally associated with the second mora of a bimoraic syllable word-internally in both KiMvita and Japanese. However, there is one significant difference in the status of the second mora in these two languages: it may bear accent in KiMvita, while it may not in Japanese. As far as these two languages are concerned, the phonetic evidence suggests that actual segment duration could explain why such a difference occurs.

    ACOUSTIC CORRELATES OF LEXICAL STRESS IN NATIVE SPEAKERS OF UYGHUR AND L2 LEARNERS

    Some syllables are louder, longer and stronger than other syllables at the lexical level. These prominent prosodic characteristics of certain syllables are captured by suprasegmental features including fundamental frequency, duration and intensity. A language like English uses fundamental frequency, duration and intensity to distinguish stressed syllables from unstressed syllables; however, a language like Japanese uses only fundamental frequency to make this distinction. This study investigates the stress pattern of Uyghur, a Turkic language, as produced by native and non-native speakers. The first three experiments provide a detailed phonetic analysis in order to determine the acoustic cues to stress in Uyghur. In Experiment 1, six disyllabic minimal pairs (e.g., A-cha, a-CHA), contrasting in location of stress, were produced by five native Uyghur speakers with three repetitions in a fixed sentence context. In order to generalize the results from the small set of minimal pairs in the first experiment, Experiment 2 examined the initial syllable of disyllabic nouns that contrasted in first-syllable stress (e.g., DA-ka, da-LA), while syllabic structure (CV versus CVC) was also manipulated. In both experiments, average fundamental frequency, syllable duration, and average intensity were collected for accented and unaccented syllables. The results from both experiments showed significant differences in duration and intensity between stressed and unstressed syllables, with the intensity differences moderated by syllable structure. No difference was found in fundamental frequency. Experiment 3 investigated the role of F0 in lexical stress, focusing on the interaction between sentential intonation and lexical stress by using declarative assertion sentences (falling F0) and declarative question sentences (rising F0). The results confirmed the previous experiments. The absence of an interaction between sentential intonation and lexical stress indicated that the obtained duration effect was due to lexical stress. There were no effects of fundamental frequency or intensity in terms of stress. While previous studies have classified Uyghur as a pitch-accent and a stress-accent language, the present acoustic data suggest that native speakers make no use of pitch cues to signal stress in Uyghur. Previous research has focused on the acquisition of lexical stress by non-native speakers of English. This study also examined the acquisition of lexical stress by English learners of Uyghur. Five highly advanced English learners of Uyghur produced the six minimal pairs and the disyllabic nouns contrasting in the first syllable. The stimuli produced by the L2 learners were the same as in Experiment 1 and Experiment 2. The highly advanced learners of Uyghur used duration as a cue and did not use fundamental frequency or intensity as stress cues. The results indicated that native-like lexical stress can be acquired at a highly advanced level.

    Computational Etymology: Word Formation and Origins

    While there are over seven thousand languages in the world, substantial language technologies exist for only a small percentage of them. The large majority of world languages do not have enough bilingual or even monolingual data for developing technologies like machine translation using current approaches. The computational study and modeling of word origins and word formation is a key step in developing comprehensive translation dictionaries for low-resource languages. This dissertation presents novel foundational work in computational etymology, a promising field which this work is pioneering. The dissertation also includes novel models of core vocabulary, of dictionary information distillation, and of the diverse linguistic processes of word formation and concept realization between languages, including compounding, derivation, sense-extension, borrowing, and historical cognate relationships, utilizing statistical and neural models trained on the unprecedented scale of thousands of languages. Collectively these are important components in tackling the grand challenges of universal translation, endangered language documentation and revitalization, and supporting technologies for speakers of thousands of underserved languages.

    The production and perception of Libyan Arabic stress patterns by English speaking learners: A comparison with native speakers

    This dissertation examines the production and perception of selected stress patterns in Libyan Arabic by English-speaking learners and compares them to the production and perception of native speakers. Two tasks were used to investigate the participants' performance: a picture naming task and an identification task. Word patterns covered potentially problematic and non-problematic areas. An optimality theoretic approach is adopted in the discussion of the results of the perception and production of stress by the participants (Chapters 5 & 7), while a metrical approach is referred to in the discussion of the Libyan Arabic stress system in Chapter 3. It is found that structural effects (e.g. syllable structure, vowel quality, syllable position or class) affect how the learners perceive and produce stress and how they use this information in assigning stress. The study found that if the stress patterns match in the L1 and L2, and they follow regular phonological conditions, the learners get these patterns right simply by applying the predictable patterns. If the stress patterns are similar but applied differently and they contradict predictable conditions, these unpredictable and/or marked patterns are not accessible in the L2 despite their partial availability in the L1. If a particular stress pattern does not exist in the L1, then a negative L1 transfer effect may appear in the L2. The misperception of stress is not restricted to L2 learners: native speakers also fail to perceive the stress location in certain patterns. The learners use grammatical class and syllable structure as stress indicators, but they deviate from the native speakers in their use of the vowel length cue. The native speakers are more sensitive to vowel length; the absence of vowel length or syllable closure in the stressed syllable in certain patterns prevents the native speakers from perceiving stress accurately.