88 research outputs found

    Can you believe that? The prosody of non-genuine polar questions in English

    This thesis addresses the general question of how, and to what degree, functional-pragmatic aspects of human language are reflected in speech prosody. In an exploratory corpus study I analysed and compared different kinds of non-genuine polar questions in English. Polar questions are questions that can typically be answered by yes or no. Non-genuine polar questions, however, do not primarily seek an answer at all, but perform other conversational functions. The kinds of non-genuine polar questions under investigation can be described as requests for action, rhetorical questions and topic introductions. Approximately 100 instances of each of these three utterance types were collected from American television programmes and analysed using ProPer (Albert et al. 2023), a method that takes into account both fundamental frequency and periodic energy. While there is no one-to-one correspondence between discourse function and prosodic form, small but significant differences between the utterance types could be observed. These differences, however, are differences in degree rather than categorical in nature.

    Methods in prosody

    This book presents a collection of pioneering papers reflecting current methods in prosody research with a focus on Romance languages. The rapid expansion of the field of prosody research in the last decades has given rise to a proliferation of methods that has left little room for the critical assessment of these methods. The aim of this volume is to bridge this gap by embracing original contributions, in which experts in the field assess, reflect, and discuss different methods of data gathering and analysis. The book might thus be of interest to scholars and established researchers as well as to students and young academics who wish to explore the topic of prosody, an expanding and promising area of study

    VOICE BASED FOR BANKING SYSTEM

    Traditional banking services suffer from difficulties, latency and low quality of service, are not suitable for disabled people, and require extra manpower to perform simple bank activities. The goal of this project is to build a voice-recognition-based system that focuses on banking activities and uses voice as the medium for carrying out bank activities via a telephony network system. Three fundamental objectives were addressed in the study. The first is to develop a two-way interactive banking program that uses voice as the main mechanism for receiving instructions from, and responding to, the user. The second, which supports the first, is to develop a user-friendly and high-security voice banking system that requires the user to log on by furnishing the assigned customer identification number and personal identification number before proceeding to further actions. The application therefore needs a strong database structure that maintains the integrity of the stored data and responds to authorized users only. The third objective is to determine the best programming approach for implementation in a telephony network system. To this end, the study presents an architecture showing how voice can be accepted, manipulated and generated using a combination of two types of programming, Cold Fusion and VoiceXML. The functions of the system are reported to be proven and in demand among users, as it provides convenient and easy services that need only voice to transmit instructions. Hence, this strategy is expected to attract a large number of customers and at the same time generate substantial profit for the banking institution that applies the system. It is hoped that the system will serve as a platform that future developers can host and that can be used by a large number of customers simultaneously and efficiently.
Keywords: voice-based, telephony, combination of programming, architecture

    A Romance language perspective

    This book presents a collection of pioneering papers reflecting current methods in prosody research with a focus on Romance languages. The rapid expansion of the field of prosody research in the last decades has given rise to a proliferation of methods that has left little room for the critical assessment of these methods. The aim of this volume is to bridge this gap by embracing original contributions, in which experts in the field assess, reflect, and discuss different methods of data gathering and analysis. The book might thus be of interest to scholars and established researchers as well as to students and young academics who wish to explore the topic of prosody, an expanding and promising area of study

    Intonation in Language Acquisition - Evidence from German

    This dissertation studies the role of intonation in language acquisition. After a general introduction to the phonetic and phonological aspects of intonation and its different forms and functions within language, two different models of language acquisition and the role of intonation within these two models will be presented. Following this, I will present and discuss empirical data on the question of whether young German-learning children use intonation in order to acquire language. Two comprehension studies will be presented. Here, I concentrate on the question of whether children understand the referential function of intonation and whether they can use this knowledge in order to learn new words. Additionally, I will present empirical evidence that focuses on the question of whether children use intonation in resolving participant roles in complex syntactic constructions as well as in resolving syntactic ambiguities development. Finally, I will present two production studies that investigate the prosodic realization of target referents that have different informational statuses within a discourse, produced by both young children and parents talking to their children. Overall, the data from these studies suggest that language-learning children do use the intonational form of an utterance from early on in order to understand another's intention. Young language-learning children do understand that a certain intonational form conveys a function. Additionally, the studies presented in this thesis suggest that children also use intonation in order to convey their own communicative intentions. Thus, intonation is an important instrument for young children's language acquisition, as they use the information that is provided by intonation not only to learn words and to combine them into syntactic constructions, but also to understand paralinguistic properties of language.
The findings of the studies presented in this thesis are discussed with regard to different theories of language acquisition. Additionally, I will give insight into the understanding of the development of young children's use of intonation.

    Suprasegmental representations for the modeling of fundamental frequency in statistical parametric speech synthesis

    Statistical parametric speech synthesis (SPSS) has seen improvements over recent years, especially in terms of intelligibility. Synthetic speech is often clear and understandable, but it can also be bland and monotonous. Proper generation of natural speech prosody is still a largely unsolved problem. This is relevant especially in the context of expressive audiobook speech synthesis, where speech is expected to be fluid and captivating. In general, prosody can be seen as a layer that is superimposed on the segmental (phone) sequence. Listeners can perceive the same melody or rhythm in different utterances, and the same segmental sequence can be uttered with a different prosodic layer to convey a different message. For this reason, prosody is commonly accepted to be inherently suprasegmental. It is governed by longer units within the utterance (e.g. syllables, words, phrases) and beyond the utterance (e.g. discourse). However, common techniques for the modeling of speech prosody - and speech in general - operate mainly on very short intervals, either at the state or frame level, in both hidden Markov model (HMM) and deep neural network (DNN) based speech synthesis. This thesis presents contributions supporting the claim that stronger representations of suprasegmental variation are essential for the natural generation of fundamental frequency for statistical parametric speech synthesis. We conceptualize the problem by dividing it into three sub-problems: (1) representations of acoustic signals, (2) representations of linguistic contexts, and (3) the mapping of one representation to another. The contributions of this thesis provide novel methods and insights relating to these three sub-problems. In terms of sub-problem 1, we propose a multi-level representation of f0 using the continuous wavelet transform and the discrete cosine transform, as well as a wavelet-based decomposition strategy that is linguistically and perceptually motivated. 
In terms of sub-problem 2, we investigate additional linguistic features such as text-derived word embeddings and syllable bag-of-phones and we propose a novel method for learning word vector representations based on acoustic counts. Finally, considering sub-problem 3, insights are given regarding hierarchical models such as parallel and cascaded deep neural networks

    A computational memory and processing model for prosody

    Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts & Sciences, 1999. Includes bibliographical references (p. 209-226). This thesis links processing in working memory to prosody in speech, and links different working memory capacities to different prosodic styles. It provides a causal account of prosodic differences and an architecture for reproducing them in synthesized speech. The implemented system mediates text-based information through a model of attention and working memory. The main simulation parameter of the memory model quantifies recall. Changing its value changes what counts as given and new information in a text, and therefore determines the intonation with which the text is uttered. Other aspects of search and storage in the memory model are mapped to the remainder of the continuous and categorical features of pitch and timing, producing prosody in three different styles: for small recall values, the exaggerated and sing-song melodies of children's speech; for mid-range values, an adult expressive style; for the largest values, the prosody of a speaker who is familiar with the text, and at times sounds bored or irritated. In addition, because the storage procedure is stochastic, the prosody from simulation to simulation varies, even for identical control parameters. As with human speech, no two renditions are alike. Informal feedback indicates that the stylistic differences are recognizable and that the prosody is improved over current offerings. A comparison with natural data shows clear and predictable trends although not at significance. However, a comparison within the natural data also did not produce results at significance. One practical contribution of this work is a text mark-up schema consisting of relational annotations to grammatical structures. Another is the product - varied and plausible prosody in synthesized speech.
The main theoretical contribution is to show that resource-bound cognitive activity has prosodic correlates, thus providing a rationale for the individual and stylistic differences in melody and rhythm that are ubiquitous in human speech. By Janet Elizabeth Cahn. Ph.D.

    On marked declaratives, exclamatives, and discourse particles in Castilian Spanish

    This book provides a new perspective on prosodically marked declaratives, wh-exclamatives, and discourse particles in the Madrid variety of Spanish. It argues that some marked forms differ from unmarked forms in that they encode modal evaluations of the at-issue meaning. Two epistemic evaluations that can be shown to be encoded by intonation in Spanish are linguistically encoded surprise, or mirativity, and obviousness. An empirical investigation via an audio-enhanced production experiment finds that mirativity and obviousness are associated with distinct intonational features under constant focus scope, with stances of (dis)agreement showing an impact on obvious declaratives. Wh-exclamatives are found not to differ significantly in intonational marking from neutral declaratives, showing that they need not be miratives. Moreover, we find that intonational marking on different discourse particles in natural dialogue correlates with their meaning contribution without being fully determined by it. In part, these findings quantitatively confirm previous qualitative findings on the meaning of intonational configurations in Madrid Spanish. But they also add new insights on the role intonation plays in the negotiation of commitments and expectations between interlocutors