thesis

A computational memory and processing model for prosody

Abstract

Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts & Sciences, 1999.Includes bibliographical references (p. 209-226).This thesis links processing in working memory to prosody in speech, and links different working memory capacities to different prosodic styles. It provides a causal account of prosodic differences and an architecture for reproducing them in synthesized speech. The implemented system mediates text-based information through a model of attention and working memory. The main simulation parameter of the memory model quantifies recall. Changing its value changes what counts as given and new information in a text, and therefore determines the intonation with which the text is uttered. Other aspects of search and storage in the memory model are mapped to the remainder of the continuous and categorical features of pitch and timing, producing prosody in three different styles: for small recall values, the exaggerated and sing-song melodies of children's speech; for mid-range values, an adult expressive style; for the largest values, the prosody of a speaker who is familiar with the text, and at times sounds bored or irritated. In addition, because the storage procedure is stochastic, the prosody from simulation to simulation varies, even for identical control parameters. As with with human speech, no two renditions are alike. Informal feedback indicates that the stylistic differences are recognizable and that the prosody is improved over current offerings. A comparison with natural data shows clear and predictable trends although not at significance. However, a comparison within the natural data also did not produce results at significance. One practical contribution of this work is a text mark-up schema consisting of relational annotations to grammatical structures. Another is the product - varied and plausible prosody in synthesized speech. The main theoretical contribution is to show that resource-bound cognitive activity has prosodic correlates, thus providing a rationale for the individual and stylistic differences in melody and rhythm that are ubiquitous in human speech.by Janet Elizabeth Cahn.Ph.D

    Similar works