151 research outputs found

    Text-based and Signal-based Prediction of Break Indices and Pause Durations

    The relation between symbolic and signal features of prosodic boundaries is studied experimentally using prediction methods. Text-based break index prediction turns out to be fairly good, but signal-based prediction and pause duration prediction perform worse. A possible reason is that random variations in signal features, as usually produced by humans, are hard to predict.

    Automatisation of intonation modelling and its linguistic anchoring

    This paper presents a fully machine-driven approach for intonation description and its linguistic interpretation. For this purpose, a new intonation model for bottom-up F0 contour analysis and synthesis is introduced: the CoPaSul model, which is designed in the tradition of parametric, contour-based, and superpositional approaches. Intonation is represented by a superposition of global and local contour classes that are derived from F0 parameterisation. These classes were linguistically anchored with respect to information status by aligning them with a text that had been coarsely analysed for this purpose by means of NLP techniques. To test the adequacy of this data-driven interpretation, a perception experiment was carried out, which confirmed 80% of the findings.

    Prosody-Based Automatic Segmentation of Speech into Sentences and Topics

    A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models -- for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task and corpus dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration and word-based cues dominate for natural conversation. Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2), Special Issue on Accessing Information in Spoken Audio, September 200
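
    The abstract does not say how the prosodic and lexical models are combined, so the following is only a minimal sketch of one common option, linear interpolation of boundary posteriors. The function name, the weight `prosody_weight`, and the example probabilities are all hypothetical and are not taken from the paper.

```python
def combine_boundary_posteriors(p_prosodic, p_lexical, prosody_weight=0.5):
    """Linearly interpolate two boundary posteriors (illustrative assumption).

    p_prosodic: P(boundary | prosodic features), e.g. from a decision tree.
    p_lexical:  P(boundary | words), e.g. from a language model.
    prosody_weight: hypothetical mixing weight in [0, 1], tuned on held-out data.
    """
    if not 0.0 <= prosody_weight <= 1.0:
        raise ValueError("prosody_weight must lie in [0, 1]")
    return prosody_weight * p_prosodic + (1.0 - prosody_weight) * p_lexical


def is_boundary(p_prosodic, p_lexical, prosody_weight=0.5, threshold=0.5):
    """Decide whether a word boundary is also a sentence or topic boundary."""
    combined = combine_boundary_posteriors(p_prosodic, p_lexical, prosody_weight)
    return combined >= threshold
```

    With equal weights, a strong prosodic cue (0.8) can outvote a weak lexical one (0.4), since the combined score then exceeds the 0.5 threshold.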

    Analyzing Prosody with Legendre Polynomial Coefficients

    This investigation demonstrates the effectiveness of Legendre polynomial coefficients for representing prosodic contours in two different tasks: nativeness classification and sarcasm detection. By using accurate representations of prosodic contours to answer fundamental linguistic questions, we contribute significantly to the body of research focused on analyzing prosody in linguistics as well as modeling prosody for machine learning tasks. Using Legendre polynomial coefficient representations of prosodic contours, we answer questions about differences in prosody between native English speakers and non-native English speakers whose first language is Mandarin. We also learn more about the prosodic qualities of sarcastic speech. We additionally perform machine learning classification for both tasks, achieving an accuracy of 72.3% for nativeness classification and 81.57% for sarcasm detection. We recommend that linguists looking to analyze prosodic contours make use of Legendre polynomial coefficient modeling; the accuracy and quality of the resulting prosodic contour representations make them highly interpretable for linguistic analysis.
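
    To make the representation concrete, here is a minimal sketch of how a uniformly sampled contour (for example, F0 in Hz) could be reduced to a few Legendre coefficients. The abstract does not describe the fitting procedure, so the projection method, the function names, and the default degree are assumptions made for illustration only.

```python
def legendre_basis(x, degree):
    """Return [P_0(x), ..., P_degree(x)] using Bonnet's recurrence."""
    values = [1.0, x]
    for n in range(1, degree):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: degree + 1]


def legendre_coefficients(contour, degree=3):
    """Approximate Legendre coefficients of a uniformly sampled contour.

    The time axis is rescaled to [-1, 1]. Each coefficient is the projection
    c_n = (2n + 1) / 2 * integral(f(x) * P_n(x) dx), estimated with the
    midpoint rule over the samples.
    """
    count = len(contour)
    step = 2.0 / count
    coeffs = [0.0] * (degree + 1)
    for i, value in enumerate(contour):
        x = -1.0 + (i + 0.5) * step  # midpoint of the i-th sample interval
        basis = legendre_basis(x, degree)
        for n in range(degree + 1):
            coeffs[n] += (2 * n + 1) / 2.0 * value * basis[n] * step
    return coeffs
```

    Under this sketch the low-order coefficients have a direct reading: the first is the mean level of the contour, the second its overall slope, and the third its curvature.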

    Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation

    We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topically coherent units. We propose two methods for combining lexical and prosodic information using hidden Markov models and decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach on the Broadcast News corpus, using the DARPA-TDT evaluation metrics. Results show that the prosodic model alone is competitive with word-based segmentation methods. Furthermore, we achieve a significant reduction in error by combining the prosodic and word-based knowledge sources. Comment: 27 pages, 8 figure

    Acoustic profiles for prosodic headedness and constituency

    Form and Function of Connectives in Chinese Conversational Speech

    Connectives convey discourse functions that provide textual and pragmatic information in speech communication on top of their canonical, sentential use. This paper proposes an applicable scheme, with illustrative examples, for distinguishing Sentential, Conclusion, Disfluency, Elaboration, and Resumption uses of Mandarin connectives, including conjunctions and adverbs. Quantitative results of our annotation work are presented to give an overview of connectives in a Mandarin conversational speech corpus. A fine-grained taxonomy is also discussed, but more empirical data are required to confirm its applicability. Using a multinomial logistic regression model, we show that connectives exhibit consistent patterns in positional, phonetic, and contextual features that depend on the associated discourse function. Our results confirm that Conclusion and Resumption connectives orient more to positions in semantically, rather than prosodically, determined units. We also found that connectives used for all four discourse functions tend to have a higher initial F0 value than those of sentential use. Resumption and Disfluency uses are expected to have the largest increase in initial F0 value, followed by Conclusion and Elaboration uses. Durational cues of the preceding context make it possible to distinguish Sentential use from the Conclusion, Elaboration, and Resumption discourse uses of connectives.