
468 research outputs found

Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis

Author: Huang Qiaochu
Kang Shiyin
Lei Shun
Li Weiqin
Meng Helen
Wu Zhiyong
Zhou Yixuan
Publication date: 31/08/2023

The spontaneous behavior that often occurs in conversations makes speech more human-like than reading-style speech. However, synthesizing spontaneous-style speech is challenging due to the lack of high-quality spontaneous datasets and the high cost of labeling spontaneous behavior. In this paper, we propose a semi-supervised pre-training method to increase the amount of spontaneous-style speech and spontaneous behavior labels. During semi-supervised learning, both text and speech information are used to detect spontaneous behavior labels in speech. Moreover, a linguistic-aware encoder is used to model the relationships between sentences in the conversation. Experimental results indicate that our proposed method achieves superior expressive speech synthesis performance, with the ability to model spontaneous behavior in spontaneous-style speech and to predict reasonable spontaneous behavior from text.
Comment: Accepted by INTERSPEECH 202

arXiv.org e-Print Archive

An Application of Comparative Corpora of Interactional Data - toward the Sound Profiles of Sites of Initiation in French and Mandarin Recycling Repair

Author: Chen Helen Kai-yun
Publication venue: Department of English, National Chengchi University
Publication date: 01/01/2013

Waseda University Repository

The Prosodic System of Southern Bobo Madare

Author: Sherwood Kate
Publication date: 01/01/2020

This dissertation describes the word-level and phrase-level prosodic system of Southern Bobo Madare (Bobo), a Mande language of Burkina Faso. I examine tonal aspects of Bobo’s prosodic system and provide an extensive phonetic description of the use of non-modal phonation and final lengthening to mark utterance type. The data examined include both elicitation tasks and spontaneous speech tasks. The work is conducted within the framework of autosegmental-metrical theory (Pierrehumbert 1980). Several aspects of the word-level prosodic system are discussed. Previous work on Bobo (Morse, 1976; Le Bris & Prost, 1981; Sanou, 1993) disagrees on the inventory of contour tones and on the existence of word stress. I present an analysis in support of three contour tones: High-Low, Low-High, and Low-Mid. I do not find clear phonetic evidence of word stress. Phonological analysis does, however, support the existence of stress: the distribution of reduced vowels supports the existence of iambic prosodic feet, which are common in Mande languages. Furthermore, the distribution of tone melodies is best explained by assuming that tone melodies are assigned to the foot rather than to the word or morpheme, similar to Leben’s (2001) proposal for tonal feet in Bamana. While both word-level and phrase-level prosody are discussed, most attention is given to phrase-level prosodic phenomena. In recent years, there has been increased interest in the phrase-level prosody of African tone languages (Downing & Rialland, 2016). However, detailed descriptions of the phrase-level prosody of Mande languages remain extremely rare. This is the first such description of a Mande language with three tone levels. Bobo makes relatively little use of intonational tones. Declarative statements are marked only through final lengthening and, in some cases, non-modal vowel phonation.
Polar questions show some characteristics of the areal “lax question prosody” described by Rialland (2009): an L% boundary tone concatenated onto the string of lexical tones, extreme lengthening of the phrase-final segment (always a vowel in Bobo), and breathy utterance termination. This L% boundary tone is the only clear case of an intonational tone in Bobo. Wh-questions can (but typically do not) have an L% boundary tone and show less phrase-final lengthening than polar questions. Negated statements have no special prosodic characteristics. The phrase-level prosodic hierarchy of Bobo is relatively flat, consisting only of the intonational phrase. In addition to investigating the prosodic marking of utterance type, I investigate focus marking in Bobo. I examine responses to wh-questions and corrections, two contexts in which focus marking is typically found cross-linguistically. I find no evidence of morphosyntactic or prosodic focus marking in these contexts. Bobo is therefore an additional example of an African tone language without obligatory focus marking in these contexts. The relevance of these results to our current understanding of prosodic typology is discussed throughout.
PhD, Linguistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163107/1/ksher_1.pd

Deep Blue Documents at the University of Michigan

Acoustic measure of fundamental frequency during three speech tasks in vocally healthy children

Author: Lam Lai-na
林麗娜
Publication venue: The University of Hong Kong (Pokfulam, Hong Kong)
Publication date: 01/01/2011

The present study examined the fundamental frequency (F0) during three speech tasks in a group of vocally healthy children. The study also compared the reliability of different speech tasks for eliciting F0. Fifty-six vocally healthy children (31 boys and 25 girls) aged between 7 years 0 months and 10 years 11 months participated in this study. Each child completed three speech tasks used to elicit a voice sample for subsequent analysis of F0. The tasks were: (a) sustained prolongation of the vowel /a/, (b) repeating a sentence, and (c) reading a passage aloud. Two types of reliability, between-trial and between-day, were compared across speech tasks. Results revealed a significant difference in F0 among the three speech tasks (p = 0.01). Post hoc comparisons revealed that the vowel task elicited significantly higher F0 values than the passage task. The passage-reading task yielded the highest intraclass correlation coefficient values for both between-trial and between-day reliability. The results provide empirical data for standardizing voice assessment protocols for school-age children.
published_or_final_version; Speech and Hearing Sciences; Bachelor; Bachelor of Science in Speech and Hearing Science

HKU Scholars Hub

Segment Prolongation in Hungarian

Author: Eklund Robert
Gósy Mária
Publication date: 01/01/2017

Segment prolongation (PR) has been shown to be one of the most common forms of non-pathological speech disfluency (Eklund, 2001). The distribution of PRs within the word (initial, medial, or final segment) seems to vary between languages of different syllable-structure complexity, making it interesting to study segment prolongation in languages with different syllable-structure characteristics. Previous studies have examined languages with complex syllable structure, such as English and Swedish (Eklund & Shriberg, 1998; Eklund, 2001, 2004), where affixation creates complex consonant clusters, and languages with very simple syllable structure, such as Japanese (Den, 2003) or Tok Pisin (Eklund, 2001, 2004), as well as Mandarin Chinese (Lee et al., 2004). In this paper we study PRs in Hungarian. Our results indicate that PRs in Hungarian are more similar to those in English and Swedish than to those in Japanese, Tok Pisin, or Mandarin Chinese, which lends support to the notion that underlying morphology plays a role in how PRs are realised.
Also in TMH-QPSR volume 58(1).

Publikationer från Linköpings universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Repository of the Academy's Library

Prolongation in German

Author: Betz Simon
Eklund Robert
Rose Ralph
Wagner Petra
Publication venue: Royal Institute of Technology, Sweden
Publication date: 01/01/2017

Betz S, Eklund R, Wagner P. Prolongation in German. In: Eklund R, Rose R, eds. Proceedings of DiSS 2017, Disfluency in Spontaneous Speech. TMH-QPSR. Vol 58. Stockholm: Royal Institute of Technology, Sweden; 2017: 13-16

Publications at Bielefeld University

Relationship of speech rhythm, stuttering frequency and discourse type

Author: Lam Hiu-fung, Stephen
林曉峰
Publication venue: The University of Hong Kong (Pokfulam, Hong Kong)
Publication date: 01/01/2013

The present study aimed to compare the speech rhythm of reading and conversation in Cantonese and to investigate the relationship between stuttering frequency and speech rhythm across the two types of discourse. Eight native Cantonese-speaking adults diagnosed with stuttering participated in the study. Each participant read a non-emotion-provoking expository passage in the reading task and conversed with the investigator on casual topics in the conversation task. Speech rhythm and stuttering frequency of the collected speech samples were analyzed. Acoustic analysis showed that speech was more syllable-timed in reading than in conversation. However, there was no significant difference in stuttering frequency between reading and conversation. The relationship between the difference in speech rhythm and stuttering frequency in reading and conversation in Cantonese was discussed with reference to the current model of the causes of stuttering and the linguistic features of Cantonese. The findings provide insight into the appropriate use of reading and conversation tasks in the clinical assessment and treatment of stuttering.
published_or_final_version; Speech and Hearing Sciences; Bachelor; Bachelor of Science in Speech and Hearing Science

HKU Scholars Hub

Recommended from our members

Rabbit models of heart disease.

Author: Bers Donald M
Pogwizd Steven M
Publication venue: eScholarship, University of California
Publication date: 01/01/2008

Human heart disease is a major cause of death and disability. A variety of animal models of cardiac disease have been developed to better understand the etiology and the cellular and molecular mechanisms of cardiac dysfunction, and to develop novel therapeutic strategies. The animal models have included large animals (e.g. pig and dog) and small rodents (e.g. mouse and rat), and the advantages of genetic manipulation in mice have appropriately encouraged the development of novel mouse models of cardiac disease. However, there are major differences between rodent and human hearts that raise cautions about extrapolating results from mouse to human. The rabbit is a medium-sized animal with many cellular and molecular characteristics very similar to those of humans, and is a practical alternative to larger mammals. Numerous rabbit models of cardiac disease are discussed, including pressure or volume overload, ischemia, rapid pacing, doxorubicin, drug-induced arrhythmias, transgenesis, and infection. These models also enable the assessment of therapeutic strategies that may prove beneficial in human cardiac disease.

eScholarship - University of California

Using the ToBI transcription to record the intonation of Slovene

Author: Volk Jana
Publication venue: Znanstvena založba Filozofske fakultete
Publication date: 30/12/2015

The paper presents ToBI, a transcription method for prosodic annotation. ToBI is an acronym for Tones and Break Indices, which first denoted an intonation system developed in the 1990s for annotating intonation and prosody in a database of spoken Mainstream American English. The MAE_ToBI transcription originally consisted of six parts: the audio recording of the utterance, the fundamental frequency contour, and four parallel tiers for the transcription of the tone sequence, the orthographic transcription, the break indices between words, and additional observations. The core of the transcription, i.e. the phonological analysis of the intonation pattern, is the tone tier, where tonal variation is transcribed using labels for high and low tones; a tone can appear as a pitch accent, a phrase accent, or a boundary tone. Due to its simplicity and flexibility, the system soon began to be used for the prosodic annotation of other varieties of English and many other languages, as well as in various non-linguistic fields, leading to the creation of many new ToBI systems adapted to individual languages and dialects. The author is the first to use this method for Slovene, specifically for the intonational transcription and analysis of a corpus of spontaneous speech from Slovene Istria, in order to investigate whether the ToBI system is useful for the annotation of Slovene and its regional varieties.
The article presents ToBI, a transcription method for recording prosodic events. ToBI is an abbreviation of Tones and Break Indices, which originally named an intonation system developed in the 1990s and built for labeling intonation and prosody in a database of spoken American English (Mainstream American English). By the original convention, the MAE_ToBI transcription consists of six parts: the audio recording of the utterance, a record of the fundamental frequency contour, and four parallel, aligned tiers intended for transcribing the tonal contour, the orthographic transcription of the utterance, labeling the strength of boundaries between words, and recording additional observations. The core of the transcription, i.e. of the phonological analysis of the intonation pattern, is the tone tier, in which distinctive tonal movements are transcribed with labels for high and low tone. Owing to its simplicity and flexibility, the system quickly spread to the prosodic annotation of other varieties of English and many other languages, as well as to various non-linguistic fields, and numerous new ToBI systems adapted to individual languages or dialects were created. The method was used for the first time to transcribe and analyze intonation in a corpus of spontaneous speech from speakers in Slovene Istria, in order to test to what extent ToBI is suitable for describing the intonation of Slovene and its regional varieties.

Repository of the University of Ljubljana

Repository of University of Primorska

Classification of Types of Stuttering Symptoms Based on Brain Activity

Author: A Mestres-Misse
A Toyomura
A Ujihira
AL Foundas
AL Giraud
Angela Sirigu
AR Braun
B Haslinger
C Ecker
C Lu
CA Kell
Chaozhe Zhu
Chunming Lu
CZ Zhu
DA Handwerker
Danling Peng
DD Cox
DS Beal
DS Margulies
E Yairi
EG Conture
EP Paden
F De Martino
F Hoeft
F Van Opstal
FH Guenther
GD Riley
H Ackermann
H Kolk
I Guyon
I Molnar-Szakacs
J Doyon
J Hulvershorn
J Mourao-Miranda
J Xiong
JC Wu
JD Anderson
Jing Jiang
JL Preston
JR Binder
JS Yaruss
K Kadi-Hanifi
K Neumann
K Specht
KA Norman
KE Watkins
LF De Nil
M Groussard
M Papoutsi
M Sommer
MD Cykowski
ME Wingate
N Bernstein Ratner
O Bloodstein
P Hagoort
P Howell
P Indefrey
P Lieberman
Peter Howell
PL Strick
PT Fox
R Cunnington
R Kohavi
R Salmelin
RC Oldfield
RJ Ingham
RV Watkins
RW Cox
S Brown
S Kloppel
S LaConte
SD Forman
SE Chang
SG Costafreda
TT Schnur
W Johnson
Y Deng
Y Kikuchi
Publication venue: Public Library of Science
Publication date: 25/06/2012

Among the non-fluencies seen in speech, some are more typical (MT) of stuttering speakers, whereas others are less typical (LT) and are common to both stuttering and fluent speakers. No neuroimaging work has evaluated the neural basis for grouping these symptom types. Another long-debated issue is the type (LT or MT) to which whole-word repetitions (WWR) belong. In this study, twenty patients who stutter performed a sentence completion task while being scanned with an event-related design. This task elicited stuttering in these patients. Each stuttered trial from each patient was sorted into the MT or LT type, with WWR trials set aside. Pattern classification was employed to train a patient-specific single-trial model to automatically classify each trial as MT or LT using the corresponding fMRI data. This model was then validated on test data that were independent of the training data. In a subsequent analysis, the established classification model was used to determine the type to which WWR belong. The results showed that the LT and the MT could be separated with high accuracy based on their brain activity. The brain regions that contributed most to separating the types were the left inferior frontal cortex and bilateral precuneus, both of which showed higher activity in the MT than in the LT, and the left putamen and right cerebellum, which showed the opposite pattern. The results also showed that brain activity for WWR was more similar to that of the LT and fluent speech than to that of the MT. These findings provide a neurological basis for separating the MT and LT types and support the widely used MT/LT symptom grouping scheme. In addition, WWR play a role similar to that of the LT and thus should be placed in the LT type.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

UCL Discovery

MPG.PuRe