    Advances in the neurocognition of music and language

    Identifying prosodic prominence patterns for English text-to-speech synthesis

    This thesis proposes to improve and enrich the expressiveness of English Text-to-Speech (TTS) synthesis by identifying and generating natural patterns of prosodic prominence. In most state-of-the-art TTS systems, the prediction from text of the prosodic prominence relations between words in an utterance relies on features that only loosely account for the combined effects of syntax, semantics, word informativeness and salience on prosodic prominence. To improve prosodic prominence prediction, we first follow the classic approach in which prosodic prominence patterns are flattened into binary sequences of pitch-accented and unaccented words. We propose and motivate statistical and syntactic-dependency-based features that are complementary to the most predictive features proposed in previous work on automatic pitch accent prediction, and show their utility on both read and spontaneous speech. Different accentuation patterns can be associated with the same sentence. Such variability raises the question of how to evaluate pitch accent predictors when more than one pattern is acceptable. We carry out a study of the variability of prosodic symbols in a speech corpus in which different speakers read the same text, and propose an information-theoretic definition of the optionality of symbolic prosodic events. This definition leads to a novel evaluation metric in which prosodic variability is incorporated as a factor affecting prediction accuracy. We additionally propose a method to take advantage of the optionality of prosodic events in unit-selection speech synthesis. To better account for the tight links between the prosodic prominence of a word and its discourse and sentence context, part of this thesis goes beyond the accent/no-accent dichotomy and is devoted to a novel task, the automatic detection of contrast, where contrast is understood as an Information Structure relation that ties two words that explicitly contrast with each other.
This task is mainly motivated by the fact that contrastive words tend to be prosodically marked with particularly prominent pitch accents. Contrastive word pairs are identified by combining lexical information, syntactic information (mainly aimed at identifying the syntactic parallelism that often triggers contrast) and semantic information (mainly drawn from the WordNet semantic lexicon) within a Support Vector Machine classifier. Once patterns of prosodic prominence have been identified, we propose methods to incorporate this information into TTS synthesis and test its impact on the naturalness of synthetic speech through large-scale perceptual experiments. The results of these experiments cast some doubt on the utility of a simple accent/no-accent distinction in Hidden Markov Model-based speech synthesis, while highlighting the importance of contrastive accents.
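The abstract does not spell out its information-theoretic definition of optionality or the resulting metric. As a rough illustration only, one plausible reading is sketched below: the optionality of an accent on a given word is measured as the binary entropy of the accent labels that several speakers produced for that word, and a prediction is scored by the fraction of speakers who agree with it. Both functions (`accent_entropy`, `agreement_accuracy`) and the labeling scheme (1 = accented, 0 = unaccented) are hypothetical and are not the thesis's own definitions.

```python
import math
from typing import Sequence


def accent_entropy(labels: Sequence[int]) -> float:
    """Hypothetical optionality measure: binary entropy (in bits) of the
    accent labels (1 = accented, 0 = unaccented) that different speakers
    gave to the same word. 0 means all speakers agree; 1 means a 50/50 split."""
    if not labels:
        raise ValueError("need at least one speaker label")
    p = sum(labels) / len(labels)
    if p == 0.0 or p == 1.0:
        return 0.0  # unanimous: the accent (or its absence) is not optional
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def agreement_accuracy(predicted: Sequence[int],
                       speaker_labels: Sequence[Sequence[int]]) -> float:
    """Hypothetical variability-aware score: for each word, the fraction of
    speakers whose label matches the prediction, averaged over all words.
    speaker_labels[i] holds every speaker's label for word i."""
    if not predicted or len(predicted) != len(speaker_labels):
        raise ValueError("need one non-empty label list per predicted word")
    scores = []
    for pred, labels in zip(predicted, speaker_labels):
        agree = sum(1 for label in labels if label == pred)
        scores.append(agree / len(labels))
    return sum(scores) / len(scores)


# Four speakers reading two words: the first word is optionally accented,
# the second is never accented.
readings = [[1, 1, 0, 0], [0, 0, 0, 0]]
print([accent_entropy(r) for r in readings])
print(agreement_accuracy([1, 0], readings))
```

Under this scoring, predicting an accent on a word that only half the speakers accented costs half a point rather than a full error, which is one simple way to let speaker variability soften the penalty.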

    Quantifying Speech Rhythms: Perception and Production Data in the Case of Spanish, Portuguese, and English

    This dissertation addresses the methodology used in classifying speech rhythms in order to resolve a long-standing linguistic conundrum: whether languages differ rhythmically. There is a widespread perception, among linguists and the general population alike, that some languages are stress-timed and others are syllable-timed. Stress-timed languages are described as having less regular rhythms, as syllable durations vary according to the placement of stress in the phrase. Syllable-timed languages, meanwhile, are described as displaying less rhythmic variation, with more regular syllable durations. This dissertation quantitatively evaluates these described rhythmic differences in Spanish, Portuguese, and English. The first chapter introduces speech rhythms and reviews past literature on their perception and production. The second chapter evaluates a widely used metric of speech rhythm, the Pairwise Variability Index (PVI), and finds that it does not effectively distinguish between two dialects of Spanish. The third chapter compares the speech rhythms of Mexican and Chicano Spanish. This chapter concludes that Chicano Spanish is more restricted in its vowel duration variability, while Mexican Spanish employs both highly variable durations (i.e., stress-timed) and highly uniform durations (i.e., syllable-timed). The fourth chapter describes a perception study used to compare the speech rhythms of Spanish, English, and Portuguese, and shows that these rhythms do not always group according to language. In the fifth chapter, I describe a study of the production of the same utterances used in the perception experiment; this allows an analysis of what prompts the perceptual differences in speech rhythm described in Chapter Four. The sixth and final chapter discusses the implications and applications of these findings and gives directions for further investigation.
Although both production and perception studies of speech rhythm have been performed in the past, my dissertation expands these methodologies by combining production and perception data in a single analysis. I use perception data, collected with low-pass filtered speech, to classify the rhythms of utterances relative to one another, and then analyze the production of the same data computationally to provide a more complete picture of what prompts differences in speech rhythm and of how Spanish, Portuguese, and English relate rhythmically. Thus, my dissertation is thorough while still addressing traditional rhythm metrics and employing current computational methodology. It seeks to challenge linguists' methodologies for quantitatively addressing speech rhythm, and to further clarify the position of Spanish, Portuguese, and English on the speech rhythm continuum.
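The abstract names the PVI but not which variant is evaluated. A minimal sketch of the two forms commonly found in the rhythm literature is shown below, assuming a list of successive interval durations (for example, vocalic intervals in milliseconds): the raw PVI averages the absolute differences between neighbouring intervals, and the normalized PVI divides each difference by the mean of the pair and scales by 100 so that speech rate matters less. The function names are illustrative, and nothing here reproduces the dissertation's own measurement procedure.

```python
from typing import Sequence


def raw_pvi(durations: Sequence[float]) -> float:
    """rPVI: mean absolute difference between successive interval durations."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def normalized_pvi(durations: Sequence[float]) -> float:
    """nPVI: each successive difference is divided by the mean of the pair,
    then the terms are averaged and multiplied by 100."""
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be positive")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)


# Perfectly even intervals score 0; strongly alternating long/short intervals
# (the pattern associated with stress timing) score high.
even = [100, 100, 100, 100]
alternating = [50, 150, 50, 150]
print(normalized_pvi(even), normalized_pvi(alternating))
```

Because both indices only compare neighbouring intervals, two utterances with very different overall variability can still receive similar scores, which is one reason a single PVI value may fail to separate closely related varieties.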