468 research outputs found

    Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis

    The spontaneous behavior that often occurs in conversations makes speech sound more human-like than reading-style speech. However, synthesizing spontaneous-style speech is challenging due to the lack of high-quality spontaneous datasets and the high cost of labeling spontaneous behavior. In this paper, we propose a semi-supervised pre-training method to increase the amount of spontaneous-style speech and spontaneous behavior labels. During semi-supervised learning, both text and speech information are used to detect spontaneous behavior labels in speech. Moreover, a linguistic-aware encoder is used to model the relationship between the sentences in a conversation. Experimental results indicate that the proposed method achieves superior expressive speech synthesis performance, with the ability to model spontaneous behavior in spontaneous-style speech and to predict reasonable spontaneous behavior from text.
    Comment: Accepted by INTERSPEECH 202

    The Prosodic System of Southern Bobo Madare

    This dissertation describes the word-level and phrase-level prosodic system of Southern Bobo Madare (Bobo), a Mande language of Burkina Faso. I examine tonal aspects of Bobo’s prosodic system and provide an extensive phonetic description of the use of non-modal phonation and final lengthening to mark utterance type. The data examined include both elicitation tasks and spontaneous speech tasks. The work is conducted within the framework of autosegmental-metrical theory (Pierrehumbert 1980). Several aspects of the word-level prosodic system are discussed. Previous studies of Bobo (Morse, 1976; Le Bris & Prost, 1981; Sanou, 1993) disagree on the inventory of contour tones and the existence of word stress. I present an analysis in support of three contour tones: High-Low, Low-High, and Low-Mid. I do not find clear phonetic evidence of word stress. Phonological analysis supports the existence of stress, however: the distribution of reduced vowels supports the existence of iambic prosodic feet, which are common in Mande languages. Furthermore, the distribution of tone melodies is best explained by assuming that tone melodies are assigned to the foot rather than to the word or morpheme, similar to Leben’s (2001) proposal for tonal feet in Bamana. While both word-level and phrase-level prosody are discussed, most attention is given to phrase-level prosodic phenomena. In recent years, there has been increased interest in the phrase-level prosody of African tone languages (Downing & Rialland, 2016). However, detailed descriptions of the phrase-level prosody of Mande languages remain extremely rare. This is the first such description of a Mande language with three tone levels. Bobo makes relatively little use of intonational tones. Declarative statements are marked only through final lengthening and, in some cases, non-modal vowel phonation. Polar questions show some characteristics of the areal “lax question prosody” described by Rialland (2009): an L% boundary tone, which is concatenated onto the string of lexical tones; extreme lengthening of the phrase-final segment (always a vowel in Bobo); and breathy utterance termination. This L% boundary tone is the only clear case of an intonational tone in Bobo. Wh-questions can (but typically do not) have an L% boundary tone, and they show a lesser degree of phrase-final lengthening than polar questions. Negated statements do not have special prosodic characteristics. The phrase-level prosodic hierarchy of Bobo is relatively flat, consisting of only the intonational phrase. In addition to investigating the prosodic marking of utterance type, I present an investigation into focus marking in Bobo. I examine the responses to wh-questions and corrections, two contexts in which focus marking is typically found cross-linguistically. I find no evidence of morphosyntactic or prosodic focus marking in these contexts. Bobo is therefore an additional example of an African tone language without obligatory focus marking in these contexts. The relevance of these results to our current understanding of prosodic typology is discussed throughout.
    PhD, Linguistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163107/1/ksher_1.pd

    Acoustic measure of fundamental frequency during three speech tasks in vocally healthy children

    The present study examined the fundamental frequency (F0) during three speech tasks in a group of vocally healthy children. The study also compared the reliability of different speech tasks for eliciting F0. Fifty-six vocally healthy children (31 boys and 25 girls) between the ages of 7.0 and 10.11 years participated in this study. Each child completed three speech tasks used to elicit a voice sample for subsequent F0 analysis: (a) sustained vowel /a/ prolongation, (b) repeating a sentence, and (c) reading a passage aloud. Two types of reliability, between-trial and between-day, were compared across speech tasks. Results revealed a significant difference in F0 between the three speech tasks (p = 0.01). Post hoc comparisons revealed that the vowel task elicited significantly higher F0 values than the passage task. The passage reading task yielded the highest intra-class correlation coefficient values for both between-trial and between-day reliability. The results provide some empirical data for standardizing voice assessment protocols for school-age children.
    Speech and Hearing Sciences. Bachelor of Science in Speech and Hearing Science

    Segment Prolongation in Hungarian

    Segment prolongation (PR) has been shown to be one of the most common forms of non-pathological speech disfluency (Eklund, 2001). The distribution of PRs in the word (initial, medial, or final segment) seems to vary between languages of different syllable-structure complexity, making it interesting to study segment prolongation in languages that exhibit different syllable-structure characteristics. Previous studies have examined languages with complex syllable structure, such as English and Swedish (Eklund & Shriberg, 1998; Eklund, 2001, 2004), where affixation creates complex consonant clusters, and languages with very simple syllable structure, such as Japanese (Den, 2003) or Tok Pisin (Eklund, 2001, 2004), as well as Mandarin Chinese (Lee et al., 2004). In this paper we study PRs in Hungarian. Our results indicate that PRs in Hungarian are more similar to those in English and Swedish than to those in Japanese, Tok Pisin or Mandarin Chinese, which lends support to the notion that underlying morphology plays a role in how PRs are realised.
    Also TMH-QPSR volume 58(1)

    Prolongation in German

    Betz S, Eklund R, Wagner P. Prolongation in German. In: Eklund R, Rose R, eds. Proceedings of DiSS 2017, Disfluency in Spontaneous Speech. TMH-QPSR. Vol 58. Stockholm: Royal Institute of Technology, Sweden; 2017: 13-16

    Relationship of speech rhythm, stuttering frequency and discourse type

    The present study aimed to compare the speech rhythm of reading and conversation in Cantonese and to investigate the relationship between stuttering frequency and speech rhythm across the two types of discourse. Eight native Cantonese-speaking adults diagnosed with stuttering participated in the study. Each participant read a non-emotion-provoking expository passage in the reading task and engaged in conversation on casual topics with the investigator in the conversation task. Speech rhythm and stuttering frequency of the collected speech samples were analyzed. Acoustic analysis showed that the speech pattern in reading was more syllable-timed than in conversation. However, results showed no significant difference in stuttering frequency between reading and conversation. The relationship between differences in speech rhythm and stuttering frequency in reading and conversation in Cantonese is discussed with reference to the current model of the causes of stuttering and the linguistic features of Cantonese. The findings provide insight into the appropriate use of reading and conversation tasks in the clinical assessment and treatment of stuttering.
    Speech and Hearing Sciences. Bachelor of Science in Speech and Hearing Science

    Using the ToBI transcription to record the intonation of Slovene

    The paper presents ToBI, a transcription method for prosodic annotation. ToBI is an acronym for Tones and Break Indices, which first denoted an intonation system developed in the 1990s for annotating intonation and prosody in a database of spoken Mainstream American English. The MAE_ToBI transcription originally consists of six parts: the audio recording of the utterance, the fundamental frequency contour, and four parallel tiers for the transcription of the tone sequence, the orthographic transcription, the break indices between words, and additional observations. The core of the transcription, i.e. of the phonological analysis of the intonation pattern, is the tone tier, where tonal variation is transcribed using labels for high tone and low tone, and where a tone can appear as a pitch accent, phrase accent, or boundary tone. Due to its simplicity and flexibility, the system soon began to be used for the prosodic annotation of other varieties of English and many other languages, as well as in various non-linguistic fields, leading to the creation of many new ToBI systems adapted to individual languages and dialects. The author is the first to use this method for Slovene, more precisely for the intonational transcription and analysis of a corpus of spontaneous speech from Slovene Istria, in order to investigate whether the ToBI system is useful for the annotation of Slovene and its regional variants.

    Classification of Types of Stuttering Symptoms Based on Brain Activity

    Among the non-fluencies seen in speech, some are more typical (MT) of stuttering speakers, whereas others are less typical (LT) and are common to both stuttering and fluent speakers. No neuroimaging work has evaluated the neural basis for grouping these symptom types. Another long-debated issue is which type (LT or MT) whole-word repetitions (WWR) should be placed in. In this study, twenty stuttering patients performed a sentence completion task while being scanned using an event-related design. This task elicited stuttering in these patients. Each stuttered trial from each patient was sorted into the MT or LT type, with WWR put aside. Pattern classification was employed to train a patient-specific single-trial model to automatically classify each trial as MT or LT using the corresponding fMRI data. This model was then validated using test data that were independent of the training data. In a subsequent analysis, the resulting classification model was used to determine which type the WWR should be placed in. The results showed that the LT and the MT could be separated with high accuracy based on their brain activity. The brain regions that contributed most to the separation of the types were the left inferior frontal cortex and bilateral precuneus, both of which showed higher activity in the MT than in the LT, and the left putamen and right cerebellum, which showed the opposite activity pattern. The results also showed that the brain activity for WWR was more similar to that of the LT and fluent speech than to that of the MT. These findings provide a neurological basis for separating the MT and the LT types, and support the widely used MT/LT symptom grouping scheme. In addition, WWR play a role similar to that of the LT, and thus should be placed in the LT type.