65 research outputs found
Recommended from our members
Empirical Studies on the Disambiguation of Cue Phrases
Cue phrases are linguistic expressions such as now and well that function as explicit indicators of the structure of a discourse. For example, now may signal the beginning of a subtopic or a return to a previous topic, while well may mark subsequent material as a response to prior material, or as an explanatory comment. However, while cue phrases may convey discourse structure, each also has one or more alternate uses. While incidentally may be used sententially as an adverbial, for example, the discourse use initiates a digression. Although distinguishing discourse and sentential uses of cue phrases is critical to the interpretation and generation of discourse, the question of how speakers and hearers accomplish this disambiguation is rarely addressed. This paper reports results of empirical studies on discourse and sentential uses of cue phrases, in which both text-based and prosodic features were examined for disambiguating power. Based on these studies, it is proposed that discourse versus sentential usage may be distinguished by intonational features, specifically, pitch accent and prosodic phrasing. A prosodic model that characterizes these distinctions is identified. This model is associated with features identifiable from text analysis, including orthography and part of speech, to permit the application of the results of the prosodic analysis to the generation of appropriate intonational features for discourse and sentential uses of cue phrases in synthetic speech
On the Correlation between Energy and Pitch Accent in Read English Speech
In this paper, we describe a set of experiments that examine the correlation between energy and pitch accent. We tested the discriminative power of the energy component of frequency sub-bands with a variety of frequencies and bandwidths on read speech spoken by four native speakers of Standard American English, using an analysis by classification approach. We found that the frequency region most robust to speaker differences is between 2 and 20 bark. Across all speakers, using only energy features we were able to predict pitch accent in read speech with accuracy of 81.9%
A Framework for Eliciting Emotional Speech: Capitalizing on the Actor's Process
This paper offers an approach and a theoretical framework for eliciting emotional speech using actors. The framework is developed by connecting the goal-based model of emotion proposed by Abelson [1], the work of appraisal theorists, and an approach to the actor's technical process widely used in the professional theater and taught in modern conservatories. In doing so, we hope to address some of the difficulties currently encountered in the use of acted speech in emotion research
Using Prosody and Phonotactics in Arabic Dialect Identification
While Modern Standard Arabic is the formal spoken and written language of the Arab world, dialects are the major communication mode for everyday life; identifying a speaker's dialect is thus critical to speech processing tasks such as automatic speech recognition, as well as speaker identification. We examine the role of prosodic features (intonation and rhythm) across four Arabic dialects: Gulf, Iraqi, Levantine, and Egyptian, for the purpose of automatic dialect identification. We show that prosodic features can significantly improve identification over a purely phonotactic-based approach, with an identification accuracy of 86.33% for 2m utterances
The Meaning of Intonational Contours in the Interpretation of Discourse
Recent investigations of the contribution that intonation makes to overall utterance and discourse interpretation promise new sources of information for the investigation of long-time concerns in NLP. In Hirschberg & Pierrehumbert 1986 we proposed that intonational features such as phrasing, accent placement, pitch range, and tune represent important sources of information about the attentional and intentional structures of discourse. In this paper we examine the particular contribution of choice of tune, or intonational contour, to discourse interpretation
Acoustic/Prosodic and Lexical Correlates of Charismatic Speech
Charisma, the ability to command authority on the basis of personal qualities, is more difficult to define than to identify. How do charismatic leaders such as Fidel Castro or Pope John Paul II attract and retain their followers? We present results of an analysis of subjective ratings of charisma from a corpus of American political speech. We identify the associations between charisma ratings and ratings of other personal attributes. We also examine acoustic/prosodic and lexical features of this speech and correlate these with charisma ratings
Story Segmentation of Broadcast News in English, Mandarin and Arabic
In this paper, we present results from a Broadcast News story segmentation system developed for the SRI NIGHTINGALE system operating on English, Arabic and Mandarin news shows to provide input to subsequent question-answering processes. Using a rule-induction algorithm with automatically extracted acoustic and lexical features, we report success rates that are competitive with state-of-the-art systems on each input language. We further demonstrate that features useful for English and Mandarin are not discriminative for Arabic
V-Measure: A conditional entropy-based external cluster evaluation
We present V-measure, an external entropy-based cluster evaluation measure. V-measure provides an elegant solution to many problems that affect previously defined cluster evaluation measures including 1) dependence on clustering algorithm or data set, 2) the "problem of matching", where the clustering of only a portion of data points is evaluated and 3) accurate evaluation and combination of two desirable aspects of clustering, homogeneity and completeness. We compare V-measure to a number of popular cluster evaluation measures and demonstrate that it satisfies several desirable properties of clustering solutions, using simulated clustering results. Finally, we use V-measure to evaluate two clustering tasks: document clustering and pitch accent type clustering
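The abstract above names homogeneity and completeness but does not spell out how they combine. As a rough illustration only, the sketch below follows the commonly cited definition: homogeneity is 1 - H(C|K)/H(C), completeness is 1 - H(K|C)/H(K), and V is their weighted harmonic mean. The function names, the `beta` parameter name, and the toy labels are assumptions made for this example, not taken from the listing.

```python
from collections import Counter
from math import log


def _entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """H(target | given) computed from two parallel label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(classes, clusters, beta=1.0):
    """Return (homogeneity, completeness, V) for gold classes vs. predicted clusters.

    Assumed convention: when an entropy term is zero, the matching score is 1.0.
    """
    h_classes = _entropy(classes)
    h_clusters = _entropy(clusters)
    homogeneity = (
        1.0 if h_classes == 0
        else 1.0 - _conditional_entropy(classes, clusters) / h_classes
    )
    completeness = (
        1.0 if h_clusters == 0
        else 1.0 - _conditional_entropy(clusters, classes) / h_clusters
    )
    denominator = beta * homogeneity + completeness
    if denominator == 0:
        return homogeneity, completeness, 0.0
    v = (1 + beta) * homogeneity * completeness / denominator
    return homogeneity, completeness, v


if __name__ == "__main__":
    gold = [0, 0, 1, 1]           # hypothetical gold classes
    singletons = [0, 1, 2, 3]     # every point in its own cluster
    print(v_measure(gold, singletons))
```

Putting every point in its own cluster keeps each cluster pure (high homogeneity) but splits each class across clusters (lower completeness), which is the trade-off the combined score is meant to expose.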
Production of English Prominence by Native Mandarin Chinese Speakers
Native-like production of intonational prominence is important for spoken language competency. Non-native speakers may have trouble producing prosodic variation in a second language (L2) and thus, problems in being understood. By identifying common sources of production error, we will be able to aid in the instruction of L2 speakers. In this paper we present results of a production study designed to test the ability of Mandarin L1 speakers to produce prominence in English. Our results show that there are some consistent differences between the L1 and L2 speakers in the use of pitch to indicate prominence, as well as in the accenting of phrase-initial tokens. We also find that we can automatically detect prominence on Mandarin L1 English with an accuracy of 87.23% and an f-measure of 0.866 if we train a classifier with annotated Mandarin L1 English data. Models trained on native English speech can detect prominence in Mandarin L1 English with an accuracy of 74.77% and f-measure of 0.824
Prosodic Predictors of Upcoming Positive or Negative Content in Spoken Messages
This article examines potential prosodic predictors of emotional speech in utterances perceived as conveying that good or bad news is about to be delivered. Speakers were asked to call an experimental confederate to inform her about whether or not she had been given a job she had applied for. A perception study was then performed in which initial fragments of the recorded utterances, not containing any explicit lexical cues to emotional content, were presented to listeners who had to rate whether good or bad news would follow the utterance. The utterances were then examined to discover acoustic and prosodic features that distinguished between good and bad news. It was found that speakers in the production study were not simply reflecting their own positive or negative mood during the experiment, but rather appeared to be influenced by the valence of the positive or negative message they were preparing to deliver. Positive and negative utterances appeared to be judged differently with respect to a number of perceived attributes of the speakers' voices (like sounding hesitant or nervous). These attributes correlated with a number of automatically obtained acoustic features