
    An investigation of speaker independent phrase break models in End-to-End TTS systems

    This paper presents our work on phrase break prediction in the context of end-to-end TTS systems, motivated by the following questions: (i) Is there any utility in incorporating an explicit phrasing model in an end-to-end TTS system? and (ii) How do you evaluate the effectiveness of a phrasing model in an end-to-end TTS system? In particular, the utility and effectiveness of phrase break prediction models are evaluated in the context of children's story synthesis, using listener comprehension. We show by means of perceptual listening evaluations that there is a clear preference for stories synthesized after predicting the location of phrase breaks using a trained phrasing model, over stories directly synthesized without predicting the location of phrase breaks. Comment: Submitted for review to IEEE Access.

    Studies in the linguistic sciences. 17-18 (1987-1988)


    Metalinguistic awareness in literate and illiterate children and adults: a psycholinguistic study

    One of the major goals of psycholinguistic research is to account for the mental operations which enable native speakers not only to exercise basic linguistic capacities, such as comprehending and producing an unlimited number of utterances, but also to exercise metalinguistic abilities such as judging utterances, segmenting words, identifying sounds and detecting ambiguities. The primary concern of this thesis was to elucidate the processes underlying certain aspects of metalinguistic awareness and to trace their relationship to advances in maturation and acquisition of literacy. The guiding principle has been to determine how much of what has been considered normal cognitive development is in fact an age-bound developmental phenomenon, and to what extent it reflects the result of experiences associated with the degree and extent of literacy. The need for this is apparent on examining previous research which, as we demonstrate, has confounded such theoretically important variables as Age, Literacy and peculiarities of the native language. The aim of the methodology employed here was to deconfound such variables and add more insight as to the nature of metalinguistic abilities. First, by employing literate and illiterate children and adults, the design optimizes the likelihood of tapping a precise relationship between maturation, literacy and metalinguistic awareness. Second, by using native speakers of Arabic, the general design offers the opportunity to add insight from yet another language typologically different from English, in which most previous research was conducted. Third, by employing more than one type of linguistic measure for the same population, the design also hopes to answer one empirical question, namely, whether metalinguistic awareness can be conceptualised as either multidimensional or unitary in nature.
    The subjects who participated in the study were 120 Moroccan Arabic speaking literate and illiterate children and adults drawn from a relatively homogeneous socio-economic background. A total of seven experiments, some with subtasks, were used. Six chapters make up the study. In Chapter 1 we have tried to provide an introduction to the theoretical issues which we think are of central importance to the topic under investigation. Because our approach is essentially psycholinguistic, Chapter 2 describes and discusses the methodology employed to gather the necessary data for the study. It is also concerned with the procedures used to evaluate these data. Chapters 3, 4, and 5 form the main bulk of the research. Using various experiments, they examine the extent to which Ss deploy their metalinguistic knowledge in the process of attending to and manipulating the following linguistic units: (i) words (Chapter 3); (ii) syllables (Chapter 4); (iii) segments (Chapter 5). Typically, each one of these chapters considers various hypotheses and research questions which concern the specific linguistic unit. Finally, Chapter 6 draws general conclusions from the study and addresses some implications for linguistic theory, psycholinguistic research and, although not extensively, education research.

    Unsupervised learning for text-to-speech synthesis

    This thesis introduces a general method for incorporating the distributional analysis of textual and linguistic objects into text-to-speech (TTS) conversion systems. Conventional TTS conversion uses intermediate layers of representation to bridge the gap between text and speech. Collecting the annotated data needed to produce these intermediate layers is a far from trivial task, possibly prohibitively so for languages in which no such resources exist. Distributional analysis, in contrast, proceeds in an unsupervised manner, and so enables the creation of systems using textual data that are not annotated. The method therefore aids the building of systems for languages in which conventional linguistic resources are scarce, but is not restricted to these languages. The distributional analysis proposed here places the textual objects analysed in a continuous-valued space, rather than specifying a hard categorisation of those objects. This space is then partitioned during the training of acoustic models for synthesis, so that the models generalise over objects' surface forms in a way that is acoustically relevant. The method is applied to three levels of textual analysis: to the characterisation of sub-syllabic units, word units and utterances. Entire systems for three languages (English, Finnish and Romanian) are built with no reliance on manually labelled data or language-specific expertise. Results of a subjective evaluation are presented.

    Function of intonation in task-oriented dialogue

    This thesis addresses the question of how intonation functions in conversation. It examines the intonation and discourse function of single-word utterances in spontaneous and read-aloud task-oriented dialogue (HCRC Map Task Corpus containing Scottish English; see Anderson et al., 1991). To avoid some of the pitfalls of previous studies, in which such comparisons of intonation and discourse structure tend to lack balance and focus more heavily on one analysis at the expense of the other, it employs independently developed analyses. They are the Conversational Games Analysis (as introduced in Kowtko, Isard and Doherty, 1992) and a simple target level representation of intonation. Correlations between categories of intonation and of discourse function in spontaneous dialogue suggest that intonation reflects the function of an utterance. Contrary to what one might expect from reading the literature, these categories are in some cases categories of exclusion rather than inclusion. Similar patterns result from the study of read-aloud dialogue. Discourse function and intonation categories show a measure of correlation. One difference that does appear between patterns across speech modes is that, in many instances of discourse function, intonation categories shift toward tunes ending low in the speaker's pitch range (e.g. a falling tune) for the read-aloud version. This result is in accord with other contemporary studies (e.g. Blaauw, 1995). The difference between spontaneous and read results suggests that read-aloud dialogue, even that based on scripts which include hesitations and false starts, is not a substitute for eliciting the same intonation strategies that are found in spontaneous dialogue.

    Aspects of prosody in English and Swahili


    Austronesian and other languages of the Pacific and South-east Asia : an annotated catalogue of theses and dissertations


    Prosody and speech perception

    The major concern of this thesis is with models of speech perception. Following Gibson's (1966) work on visual perception, it seeks to establish whether there are sources of information in the speech signal which can be responded to directly and which specify the units of information of speech. The treatment of intonation follows that of Halliday (1967) and rhythm that of Abercrombie (1967). "Prosody" is taken to mean both the intonational and the rhythmic aspects of speech. Experiments one to four show the interdependence of prosody and grammar in the perception of speech, although they leave open the question of which sort of information is responded to first. Experiments five and six, employing a short-term memory paradigm and Morton's (1970) "suffix effect" explanation, demonstrate that prosody could well be responded to before grammar. Since the previous experiments suggested a close connection between the two, these results suggest that information about grammatical structures may well be given directly by prosody. In the final two experiments the amount of prosodic information in fluent speech that can be perceived independently of grammar and meaning is investigated. Although tone-group division seems to be given clearly enough by acoustic cues, there are problems of interpretation with the data on syllable stress assignments. In the concluding chapter, a three-stage model of speech perception is proposed, following Bever (1970), but incorporating prosodic analysis as an integral part of the processing. The obtained experimental results are integrated within this model.