3 research outputs found
Recommended from our members
Inferring Speaker Affect in Spoken Natural Language Communication
The ļ¬eld of spoken language processing is concerned with creating computer programs that can understand human speech and produce human-like speech. Regarding the problem of understanding human speech, there is currently growing interest in moving beyond speech recognition (the task of transcribing the words in an audio stream) and towards machine listeningāinterpreting the full spectrum of information in an audio stream. One part of machine listening, the problem that this thesis focuses on, is the task of using information in the speech signal to infer a personās emotional or mental state. In this dissertation, our approach is to assess the utility of prosody, or manner of speaking, in classifying speaker affect. Prosody refers to the acoustic features of natural speech: rhythm, stress, intonation, and energy. Affect refers to a personās emotions and attitudes such as happiness, frustration, or uncertainty. We focus on one specific dimension of affect: level of certainty. Our goal is to automatically infer whether a person is conļ¬dent or uncertain based on the prosody of his or her speech. Potential applications include conversational dialogue systems (e.g., in educational technology) and voice search (e.g., smartphone personal assistants). There are three main contributions of this thesis. The first contribution is a method for eliciting uncertain speech that binds a speakerās uncertainty to a single phrase within the larger utterance, allowing us to compare the utility of contextually-based prosodic features. Second, we devise a technique for computing prosodic features from utterance segments that both improves uncertainty classification and can be used to determine which phrase a speaker is uncertain about. The level of certainty classifier achieves an accuracy of 75%. Third, we examine the differences between perceived, self-reported, and internal level of certainty, concluding that perceived certainty is aligned with internal certainty for some but not all speakers and that self-reports are a good proxy for internal certainty.Engineering and Applied Science
Recommended from our members
The Importance of Sub-Utterance Prosody in Predicting Level of Certainty
We present an experiment aimed at understanding how to optimally use acoustic and prosodic information to predict a speaker's level of certainty. With a corpus of utterances where we can isolate a single word or phrase that is responsible for the speaker's level of certainty we use different sets of sub-utterance prosodic features to train models for predicting an utterance's perceived level of certainty. Our results suggest that using prosodic features of the word or phrase responsible for the level of certainty and of its surrounding context improves the prediction accuracy without increasing the total number of features when compared to using only features taken from the utterance as a whole.Engineering and Applied Science
Recommended from our members
Identifying Uncertain Words within an Utterance via Prosodic Features
We describe an experiment that investigates whether sub-utterance prosodic features can be used to detect uncertainty at the wordlevel. That is, given an utterance that is classified as uncertain, we want to determine which word or phrase the speaker is uncertain about. We have a corpus of utterances spoken under varying degrees of certainty. Using combinations of sub-utterance prosodic features we train models to predict the level of certainty of an utterance. On a set of utterances that were perceived to be uncertain, we compare the predictions of our models for two candidate target word segmentations: (a) one with the actual word causing uncertainty as the proposed target word, and (b) one with a control word as the proposed target word. Our best model correctly identifies the word causing the uncertainty rather than the control word 91% of the time.Engineering and Applied Science