226 research outputs found

    Tone labelling algorithm for Sesotho

    M.Sc., Faculty of Science, University of the Witwatersrand, 2011. Studies have shown that text-to-speech systems need detailed prosodic models of a language in order to sound natural to native speakers of that language. A text-to-speech system developed for Sesotho needs to implement tone, since Sesotho is a tonal language that uses pitch variations to distinguish lexical and/or grammatical meaning. To implement tone for a language such as Sesotho, a tone modeling algorithm must receive as input the tone labels of the syllables of a word. This allows the algorithm to predict the appropriate intonation of the word. The aim of our study is to improve a basic tone labeling algorithm that predicts tone labels using three Sesotho tonal rules. The application of this basic algorithm is restricted to polysyllabic verb stems. The research study involves implementing an extended tone labeling algorithm that implements four additional Sesotho tonal rules and extends its application to all the other parts of speech. The results of our study show that the extended tone labeling algorithm significantly improves on the basic algorithm by increasing the number of matched tone labels. Furthermore, our study provides the basic step toward tone modeling for languages such as Sesotho, which do not mark tone in their orthography.

    Phonological issues in the production of prosody by francophone and sinophone learners of English as a second language

    A non-native accent can lead to misunderstanding or to the perception of varying degrees of foreign accent. Prosody, now recognized as an important component of perceived foreignness, receives relatively little attention in research on foreign language acquisition. This contrasts with the growing interest in prosody as an element of the native language. In this thesis, phonological research is evaluated for its relevance to research on foreign language prosody. Two aspects of phonological theory are examined: typology and phonological organization. This choice is justified by the general assumption that prosodic foreignness arises either from a typological difference between the native language (L1) and the foreign language (L2) or from a transfer of prosodic features from the L1. The review of research in phonological typology concludes that, at this stage, no model of prosodic classification is applicable to L2 acquisition. In particular, the study shows that certain typologies, notably Pike's stress-timing/syllable-timing theory, should be excluded because they hinder progress in research on the acquisition and production of foreign language prosody. The second aspect of phonological theory examined in this thesis is phonological organization. The premise is that underlying differences in prosodic organization, rather than surface phonological differences, are transferred from L1 to L2. In-depth analyses of North American English, French and Standard Chinese reveal important phonological differences between North American English and the other two languages. Four experiments evaluate some of these differences.
The English prosody produced by native speakers of French is analyzed in rhythmically simple sentences and in rhythmically more complex sentences. The results show that lexical stress is less problematic than supra-lexical prosodic accentuation. In particular, the fundamental frequency (F0) rises at the beginning and end of the accentual phrase (AP), which are typical of French, are shown to be a source of error in the prosody of English as a second language. However, this error, although noticed by native speakers of English, is shown not to affect their perception of stress placement. The English prosody produced by native speakers of Chinese is analyzed in terms of tone transfer and F0 peak alignment. The results indicate that Chinese speakers use Chinese tones when producing English pitch accents; more specifically, the majority of speakers use tone 2 (rising tone) when producing a rising pitch accent. The final experiment reveals that native speakers of Chinese align the pitch accent with its corresponding stressed syllable more strictly than native speakers of North American English do. The results of this thesis offer insight into how performance in foreign language prosody progresses. The conclusions have implications for the pedagogical content and format of pronunciation teaching. AUTHOR KEYWORDS: Phonology, Phonetics, Prosodic phonology, Prosody, Rhythm, ESL, Quebec French, France French, Chinese

    Tone classification of syllable-segmented Thai speech based on multilayer perceptron

    Thai is a monosyllabic and tonal language. Thai makes use of tone to convey lexical information about the meaning of a syllable. Thai has five distinctive tones, and each tone is well represented by a single F0 contour pattern. In general, a Thai syllable with a different tone has a different lexical meaning. Thus, to completely recognize a spoken Thai syllable, a speech recognition system has not only to recognize the base syllable but also to correctly identify its tone. Hence, tone classification of Thai speech is an essential part of a Thai speech recognition system. In this study, a tone classifier for syllable-segmented Thai speech that incorporates the effects of tonal coarticulation, stress and intonation was developed. Automatic syllable segmentation, which segments the training and test utterances into syllable units, was also developed. Acoustic features including fundamental frequency (F0), duration, and energy, extracted from the syllable being processed and its neighboring syllables, were used as the main discriminating features. A multilayer perceptron (MLP) trained by the backpropagation method was employed to classify these features. The proposed system was evaluated on 920 test utterances spoken by five male and three female Thai speakers who also uttered the training speech. The proposed system achieved an average accuracy rate of 91.36%.

    Cross-regional word duration patterns in Mandarin

    Duration contrasts can convey many types of information, including language background, word structure, word frequency, speech genre, intention, and emotions. An understanding of duration lays the foundation for many aspects of speech technology, since duration plays a major role in speech production and perception. This dissertation explores the duration patterns of Mandarin words in three Mandarin dialectal regions: Beijing, Taiwan, and Malaysia. It brings together diverse methodologies for speech data collection, annotation, and corpus construction to investigate linguistic patterns. Three speech production studies are conducted to explore the duration patterns of words with different lengths and internal structures. These studies reveal the general duration patterns of Mandarin words. First, all the multi-syllabic words demonstrate the disyllabic long-short metrical form. Second, linguistic factors (syllable structure; syllable, word, and sentence position; word frequency; word category; word internal structure; particle attachment; and sentence speech rate) have significant effects on syllable duration. Third, the social factor of region interacts with multiple linguistic factors (word structure, syllable position, and particle attachment) and plays an important role in duration prediction. Quantitative data from these studies reveal regional differences in rhythmic contrast among the Mandarin-speaking regions. Beijing Mandarin speakers are more sensitive to changes in the length of a linguistic unit and show stronger rhythmic contrast than speakers of Taiwan and Malaysia Mandarin. The results also show that Malaysia Mandarin speakers have a rhythmic pattern similar to that of Beijing Mandarin speakers. The investigation of duration patterns in this dissertation provides a detailed description of word duration in Mandarin.
This dissertation also provides the foundation for further research on suprasegmental features related to duration patterns. A comprehensive understanding of how duration patterns vary with linguistic and social factors can help improve the quality of the durational models used in speech technology.

    How Movie Dubbing Can Help Native Chinese Speakers’ English Pronunciation

    The purpose of this study was to determine if the use of English movie scripts and movie dubbing activities can help native Chinese speakers improve their awareness of prosodic features in English, specifically, sentence stress. The literature review explores Chinese and English prosody, movie dubbing and ideal pronunciation standards. The qualitative research paradigm was implemented to explore the hypothesis that hearing and mimicking the natural speech patterns of native speakers can help native Chinese speakers improve their awareness of sentence stress in English. After three cycles of language instruction and language discrimination activities, seven students were chosen for a case study. Data collected from their responses to activities and questionnaires were analyzed. The results indicate that these students’ actual ability to hear sentence stress is greater than their theoretical awareness of sentence stress rules. The author concludes with recommendations for adapting movie dubbing activities and suggestions for future research

    Analyzing Prosody with Legendre Polynomial Coefficients

    This investigation demonstrates the effectiveness of Legendre polynomial coefficients for representing prosodic contours in two different tasks: nativeness classification and sarcasm detection. By making use of accurate representations of prosodic contours to answer fundamental linguistic questions, we contribute significantly to the body of research focused on analyzing prosody in linguistics as well as modeling prosody for machine learning tasks. Using Legendre polynomial coefficient representations of prosodic contours, we answer questions about differences in prosody between native English speakers and non-native English speakers whose first language is Mandarin. We also learn more about the prosodic qualities of sarcastic speech. We additionally perform machine learning classification for both tasks, achieving an accuracy of 72.3% for nativeness classification and 81.57% for sarcasm detection. We recommend that linguists looking to analyze prosodic contours make use of Legendre polynomial coefficient modeling; the accuracy and quality of the resulting prosodic contour representations make them highly interpretable for linguistic analysis.
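
    To illustrate what a Legendre coefficient representation of a contour looks like, the following is a minimal sketch and not the authors' implementation. It assumes the contour is a list of uniformly spaced F0 samples, maps them onto [-1, 1], and approximates each coefficient by midpoint-rule projection onto the Legendre polynomials. The function names and the default order are hypothetical.

```python
def legendre_values(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence."""
    values = [1.0, x]
    for k in range(1, order):
        values.append(((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1))
    return values[: order + 1]


def legendre_coefficients(contour, order=3):
    """Approximate Legendre coefficients of a uniformly sampled contour.

    Assumption: sample i sits at the midpoint of the i-th of n equal
    sub-intervals of [-1, 1]. The coefficient c_k approximates
    (2k + 1) / 2 * integral of f(x) * P_k(x) over [-1, 1].
    """
    n = len(contour)
    sums = [0.0] * (order + 1)
    for i, f0 in enumerate(contour):
        x = -1.0 + (2 * i + 1) / n
        for k, p in enumerate(legendre_values(x, order)):
            sums[k] += f0 * p
    return [(2 * k + 1) / n * s for k, s in enumerate(sums)]


if __name__ == "__main__":
    # Synthetic rising contour in Hz (illustrative values, not real data).
    contour = [180.0 + 0.5 * i for i in range(100)]
    print([round(c, 2) for c in legendre_coefficients(contour)])
```

    In this representation the order-0 coefficient tracks the mean level of the contour, the order-1 coefficient its overall slope, and higher orders its curvature, which is one reason such coefficients are easy to read off as prosodic descriptors.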

    Functional timing or rhythmical timing, or both? A corpus study of English and Mandarin duration

    It has long been held that the languages of the world fall into rhythm classes, so that each is stress-timed, syllable-timed or mora-timed. It has also long been known that duration serves various informational functions in speech. But it is unclear whether these two uses of duration are complementary or are actually one and the same. Much empirical research has raised questions about the rhythm class hypothesis because of the lack of evidence for the suggested isochrony in any language. Yet the alleged cross-language rhythm classification is still widely taken for granted and continues to be researched. Here we conducted a corpus study of English, an archetype of a stress-timed language, and Mandarin, an alleged syllable-timed language, to look for evidence of at least a tendency toward isochrony when much of the informational use of duration is controlled for. We examined the relationship between segment and syllable duration and the relationship between syllable and phrase duration in the two languages. The results show that in English, syllables are largely incompressible, which rules out stress-timing, because segment duration is too inflexible to let syllable duration vary beyond its functional use. Surprisingly, Mandarin does show a small tendency toward both equal syllable duration and equal phrase duration. Additionally, the duration of pre-boundary syllables in English increases linearly with break index, whereas in Mandarin the duration increase stops after break index 2, which is accompanied by the insertion of silent pauses. We therefore conclude that timing and duration in speech are predominantly used for encoding information rather than being controlled by a rhythmic principle, and that the residual equal-duration tendencies in the two languages examined here show exactly the opposite pattern from the predictions of the rhythm class hypothesis.

    Linguistic dimensions of second language accent and comprehensibility: Nonnative listeners' perspectives

    The current study investigated the effect of listener status (native, nonnative) and language background (French, Mandarin) on global ratings of second language speech. Twenty-six nonnative English listeners representing the two language backgrounds (n = 13 each) rated the comprehensibility and accentedness of 40 French speakers of English. These same speakers were previously rated by native listeners and coded for 19 linguistic measures of speech (e.g., segmental errors, word stress errors, grammar accuracy) in Trofimovich and Isaacs (2012). Analyses indicated no difference in global ratings between nonnative and native listeners, or between the two nonnative listener groups. Similarly, no major differences in the linguistic dimensions associated with each group’s ratings existed. However, analyses of verbal reports for a subset of nonnative listeners (n = 5 per group) demonstrated that each group attributed their ratings to somewhat different linguistic cues

    Prosody and Wavelets: Towards a natural speaking style conversion

    Speech is the basis of human communication: in everyday life we automatically decode speech into language regardless of who speaks. In a similar way, we have the ability to recognize different speakers regardless of the linguistic content of the speech. In addition to the individuality of the speaker's voice, the particular prosody of speech carries relevant information about the identity, age, social group or economic status of the speaker, helping us identify the person to whom we are talking without seeing them. Voice conversion systems convert a speech signal so that it sounds as if it had been uttered by another speaker. There has been a considerable amount of work on converting the timbre of the voice, that is, the spectral features, whereas the conversion of pitch and the way it evolves over time, which models speaker-dependent prosody, is mostly achieved by just controlling its level and range. This thesis focuses on prosody conversion, proposing an approach based on a wavelet transformation of the pitch contours. A study of the wavelet domain was performed to distinguish the different time scales of the prosodic events, thus allowing improved modeling of them. Consequently, prosody conversion is achieved in the wavelet domain, using regression techniques originally developed for spectral feature conversion in voice conversion systems.
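
    As a rough illustration of how a wavelet transform separates a pitch contour into time scales, here is a minimal sketch and not the method used in the thesis. The abstract does not name a wavelet family, so the Haar wavelet is assumed here purely for simplicity. The function names are hypothetical, and the regression-based conversion step is not shown.

```python
def haar_decompose(contour, levels):
    """Split a contour into a coarse approximation plus per-level details.

    Assumption: an unnormalized Haar transform (pairwise mean and half
    difference). Details are returned from the finest scale to the coarsest.
    """
    approx = list(contour)
    details = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("length must be divisible by 2 at every level")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])
        approx = [(a + b) / 2 for a, b in pairs]
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose by re-expanding from the coarsest level down."""
    signal = list(approx)
    for detail in reversed(details):
        signal = [v for s, d in zip(signal, detail) for v in (s + d, s - d)]
    return signal


if __name__ == "__main__":
    # Synthetic F0 samples in Hz (illustrative values, not real data).
    f0 = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0]
    approx, details = haar_decompose(f0, levels=3)
    print(approx, details)
    print(haar_reconstruct(approx, details))
```

    In a sketch like this, the coarse approximation corresponds to the overall pitch level, while the detail coefficients at each level capture faster or slower pitch movements. A conversion model could then map each scale separately before reconstruction.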