
    Prosodic Representations of Prominence Classification Neural Networks and Autoencoders Using Bottleneck Features

    Prominence perception is known to correlate with a complex interplay of the acoustic features of energy, fundamental frequency, spectral tilt, and duration. The contribution and importance of each of these features in distinguishing between prominent and non-prominent units in speech is not always easy to determine, and the prosodic representations that humans and automatic classifiers learn have been even more difficult to interpret. This work examines the acoustic prosodic representations that binary prominence classification neural networks and autoencoders learn for prominence. We investigate the complex features learned at different layers of the network, as well as the 10-dimensional bottleneck features (BNFs), for the standard acoustic prosodic correlates of prominence, both separately and in combination. We analyze and visualize the BNFs obtained from the prominence classification neural networks, as well as their network activations. The experiments are conducted on a corpus of Dutch continuous speech with manually annotated prominence labels. Our results show that the prosodic representations obtained from the BNFs and the higher-dimensional non-BNFs provide good separation of the two prominence categories, although the BNF space is partitioned differently for the different features, and the best overall separation is obtained for F0. Peer reviewed.
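The abstract refers to 10-dimensional bottleneck features extracted from a prominence classification network. As a rough illustration of where such features come from, the following pure-Python sketch passes a vector of acoustic correlates through a small feed-forward network whose penultimate layer has 10 units, and reads that layer's activations out as the BNFs. Everything beyond the 10-unit bottleneck (the hypothetical class name `BottleneckClassifier`, the 32-unit hidden layer, the tanh/sigmoid activations, the four-value input) is an assumption, not the architecture used in the paper, and the weights are random rather than trained.

```python
import math
import random

# Hypothetical sketch: layer sizes (other than the 10-unit bottleneck), the
# activations, and the four-feature input are assumptions for illustration.

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for a fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, weights, biases, activation):
    """Apply activation(W x + b) to a single input vector x."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class BottleneckClassifier:
    """Binary prominence classifier: input -> hidden -> 10-d bottleneck -> P(prominent)."""

    def __init__(self, n_in=4, n_hidden=32, n_bottleneck=10, seed=0):
        rng = random.Random(seed)
        self.hidden = init_layer(n_in, n_hidden, rng)
        self.bottleneck = init_layer(n_hidden, n_bottleneck, rng)
        self.output = init_layer(n_bottleneck, 1, rng)

    def bottleneck_features(self, x):
        """Return the bottleneck-layer activations (the BNFs) for one input vector."""
        w_h, b_h = self.hidden
        w_b, b_b = self.bottleneck
        h = dense(x, w_h, b_h, math.tanh)
        return dense(h, w_b, b_b, math.tanh)

    def predict(self, x):
        """Return P(prominent) for one input vector (untrained weights here)."""
        w_o, b_o = self.output
        return dense(self.bottleneck_features(x), w_o, b_o, sigmoid)[0]

# Hypothetical normalized per-word values: energy, F0, spectral tilt, duration.
word = [0.8, 1.2, -0.3, 0.5]
model = BottleneckClassifier()
bnf = model.bottleneck_features(word)
print(len(bnf))  # 10
print(model.predict(word))
```

In a real setup the weights would first be trained on the annotated prominence labels; the BNFs of many words could then be collected and visualized (for example, projected to two dimensions) to inspect how the two prominence categories separate.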

    Prosodic boundary phenomena

    Synopsis: In spoken language comprehension, the hearer is faced with a more or less continuous stream of auditory information. Prosodic cues, such as pitch movement, pre-boundary lengthening, and pauses, incrementally help to organize the incoming stream of information into prosodic phrases, which often coincide with syntactic units. Prosody is hence central to spoken language comprehension, and some models assume that the speaker produces prosody in a consistent and hierarchical fashion. While there is abundant empirical evidence that prosodic boundary cues are reliably and robustly produced and effectively guide spoken sentence comprehension across different populations and languages, the underlying mechanisms and the nature of the prosody-syntax interface have still not been sufficiently identified. This is also reflected in the fact that most models of sentence processing lack prosodic information entirely. This edited volume grew out of a workshop held in 2021 at the annual conference of the Deutsche Gesellschaft für Sprachwissenschaft (DGfS). Its five chapters cover selected topics on the production and comprehension of prosodic cues in various populations and languages, all focusing in particular on the processing of prosody at structurally relevant prosodic boundaries. Specifically, the book comprises cross-linguistic evidence as well as evidence from non-native listeners, infants, adults, and elderly speakers, highlighting the important role of prosody in both language production and comprehension.

    Universal and language-specific processing : the case of prosody

    A key question in the science of language is how speech processing can be influenced by both language-universal and language-specific mechanisms (Cutler, Klein, & Levinson, 2005). My graduate research aimed to address this question by adopting a crosslanguage approach to compare languages with different phonological systems. Of all components of linguistic structure, prosody is often considered one of the most language-specific dimensions of speech. This can have significant implications for our understanding of language use, because much of speech processing is specifically tailored to the structure and requirements of the native language. However, it is still unclear whether prosody may also play a universal role across languages, and few comparative attempts have been made to explore this possibility. In this thesis, I examined both the production and perception of prosodic cues to prominence and phrasing in native speakers of English and Mandarin Chinese. In focus production, our research revealed that English and Mandarin speakers were alike in how they used prosody to encode prominence, but there were also systematic language-specific differences in the exact degree to which they enhanced the different prosodic cues (Chapter 2). This, however, was not the case in focus perception, where English and Mandarin listeners were alike in the degree to which they used prosody to predict upcoming prominence, even though the precise cues in the preceding prosody could differ (Chapter 3). Further experiments examining prosodic focus prediction in the speech of different talkers demonstrated functional cue equivalence in prosodic focus detection (Chapter 4). Likewise, our experiments also revealed both crosslanguage similarities and differences in the production and perception of juncture cues (Chapter 5). Overall, prosodic processing is the result of a complex but subtle interplay of universal and language-specific structure.

    Individual Differences in Speech Production and Perception

    Inter-individual variation in speech is a topic of increasing interest in both the human sciences and speech technology. It can yield important insights into biological, cognitive, communicative, and social aspects of language. Written by specialists in psycholinguistics, phonetics, speech development, speech perception, and speech technology, this volume presents experimental and modeling studies that provide the reader with a deep understanding of interspeaker variability and its role in speech processing, speech development, and interspeaker interactions. It discusses how theoretical models take individual behavior into account, explains why interspeaker variability enriches speech communication, and summarizes the limitations of using speaker information in forensics.

    The Processing of Emotional Sentences by Young and Older Adults: A Visual World Eye-movement Study

    Carminati MN, Knoeferle P. The Processing of Emotional Sentences by Young and Older Adults: A Visual World Eye-movement Study. Presented at Architectures and Mechanisms for Language Processing (AMLaP), Riva del Garda, Italy.

    Cross-Linguistic Perception and Learning of Japanese Lexical Prosody by English Listeners

    xviii, 216 p. : ill. (some col.)
    The focus of this dissertation is on how language experience shapes perception of a non-native prosodic contrast. In Tokyo Japanese, fundamental frequency (F0) peak and fall are acoustic cues to lexically contrastive pitch patterns, in which a word may be accented on a particular syllable or unaccented (e.g., tsúru 'a crane', tsurú 'a vine', tsuru 'to fish'). In English, lexical stress is obligatory, and it may be reinforced by F0 in higher-level prosodic groupings. Here I investigate whether English listeners can attend to F0 peaks as well as falls in contrastive pitch patterns, and whether training can facilitate the learning of prosodic categories. In a series of categorization and discrimination experiments in which F0 peak and fall were manipulated in one-word utterances, the prominence judgments of naïve English listeners and native Japanese listeners were compared. The results indicated that while English listeners had phonetic sensitivity to F0 fall in a same-different discrimination task, they could not consistently use the F0 fall to categorize F0 patterns. The effects of F0 peak location and F0 fall on prominence judgments were always larger for Japanese listeners than for English listeners. Furthermore, the interaction between these acoustic cues affected perception of the contrast by Japanese, but not English, listeners. This result suggests that native, but not non-native, listeners process these cues in a complex and integrated way. The training experiment assessed improvement in the categorization of Japanese pitch patterns with exposure and feedback. The results suggested that training improved identification of the accented patterns, and that this improvement generalized to new words and new contexts. Identification of the unaccented pattern, on the other hand, showed no improvement. Error analysis indicated that native English listeners did not learn to attend specifically to the lack of an F0 fall.
To conclude, language experience influences perception of prosodic categories. Although non-native listeners show some sensitivity to F0 fall, they rely mostly on F0 peak location in language-like tasks such as the categorization of pitch patterns. Learning new prosodic categories is possible. However, not all categories are learned equally well, which suggests that first-language attentional biases affect second-language acquisition in the prosodic domain.
Committee in charge: Susan Guion Anderson, Chairperson; Melissa A. Redford, Member; Vsevolod Kapatsinski, Member; Kaori Idemaru, Outside Member

    The Development of Speech Rhythm and Fluency of Advanced English Learners: A Mixed Methods Study of the Correlation between Native Speaker Evaluations and Acoustic Measures

    This thesis examined changes in the speech rhythm and fluency of advanced English learners during a pronunciation course. The research sought to answer the following questions: ‘Do the speech rhythm and fluency of advanced English learners change after a pronunciation course, according to native speaker ratings?’, ‘If so, which acoustic measures do these changes correlate with?’, and ‘What is the correlation between perceived speech rhythm, fluency, accentedness, and comprehensibility?’. I approached these issues with mixed methods, both quantitative and qualitative. First, 20 advanced Finnish learners of English were selected from the 45 available first-year English majors. The number of participants was limited to keep the native-speaker questionnaire to a manageable length. The questionnaire consisted of background questions and 42 speech samples, excerpted from the learner recordings and from the recording of one native speaker, each to be rated on a 9-point Likert scale. After collecting responses from 31 native speakers of English, I conducted a statistical analysis whose results were then used for extreme case sampling. The speech of the four learners with the largest changes in speech rhythm and fluency was then analyzed acoustically to identify the contributing factors. The results showed both positive and negative changes at the individual level, but the differences were not statistically significant at the group level. The acoustic analysis showed that higher fluency scores correlated with a faster articulation rate, fewer unfilled pauses, pauses located at phrase or clause boundaries, and fewer repairs. Rhythm measures revealed that in the posttest speech, pitch and amplitude peaks generally aligned better, and the use of durational cues, vowel reduction, and linking increased.
All four rated aspects correlated significantly, particularly the speech rhythm and fluency scores in both pre- and posttest samples (r = 0.98). These two can thus be said to be closely intertwined. Based on the results, speech rhythm should not be neglected in pronunciation instruction, as it strongly influences perceptions of fluency, accentedness, and comprehensibility. It is suggested that further research on rhythm focus on its nature as both a perceived and a produced phenomenon, and on defining its relationship to fluency.
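The thesis reports r = 0.98 between perceived rhythm and fluency. Assuming r here is Pearson's product-moment coefficient (the abstract does not say so explicitly, and rank-based measures such as Spearman's rho are also common for Likert ratings), the following sketch shows how such a coefficient is computed from paired rating scores. The function name `pearson_r` and the ratings are hypothetical and are not data from the thesis.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances share the same 1/n factor, so it cancels and is omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical mean ratings (9-point scale) for five speakers; not data from the thesis.
rhythm = [3, 5, 6, 7, 8]
fluency = [3, 4, 6, 7, 9]
print(round(pearson_r(rhythm, fluency), 2))  # 0.97
```

A value near 1 means that speakers rated higher on one scale were consistently rated higher on the other; it says nothing by itself about which dimension drives the other.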

    Acoustic correlates of encoded prosody in written conversation

    This thesis presents an analysis of certain punctuation devices, such as parenthesis, italics, and emphatic spellings, with respect to their acoustic correlates in read speech. The class of punctuation devices under investigation is referred to as prosodic markers. The thesis therefore presents an analysis of features of spoken language that are represented symbolically in text: aspects of spoken language that have been transcribed or symbolized in the written medium and then translated back into spoken form by a reader. The thesis focuses in particular on the analysis of parenthesis and the examination of encoded prominence and emphasis, and also addresses the use of paralinguistic markers that signal attitude or emotion.
In an effort to avoid self-constructed or artificial material containing arbitrary symbolic or prosodic encodings, all material used for empirical analysis was taken from electronic written exchanges on the Internet, such as electronic mail messages and articles posted to electronic newsgroups and news bulletins. This medium, referred to here as written conversation, provides a rich source of material containing encoded prosodic markers. These occur in the form of 'smiley faces' expressing attitudes or feelings; words highlighted by means such as capitalization, italics, underscore characters, or asterisks; and dashes or parentheses, which suggest how the information in a text or sentence may be structured with regard to its informational content.
Chapter 2 investigates in detail the genre of written conversation with respect to its place on an emerging continuum between written and spoken language, concentrating on transcriptional devices and their function as indicators of prosody.
The implications these symbolic representations have for the task of reading, by humans as well as machines, are then examined.
Chapters 3 and 4 turn to the acoustic analysis of parentheticals and emphasis markers, respectively. The experimental work in this thesis is based on readings of a corpus of selected materials from written conversation, with the acoustic analysis concentrating on the differences between readings of texts with prosodic markers and readings of the same texts with the prosodic markers removed. Finally, the effect of prosodic markers is tested in perception experiments involving both human and resynthesized utterances.