2,070 research outputs found

    Exploiting Contextual Information for Prosodic Event Detection Using Auto-Context

    Get PDF
    Prosody and prosodic boundaries carry significant information regarding linguistics and paralinguistics and are important aspects of speech. In the field of prosodic event detection, many local acoustic features have been investigated; however, contextual information has not yet been thoroughly exploited. The most difficult aspect of this lies in learning the long-distance contextual dependencies effectively and efficiently. To address this problem, we introduce the use of an algorithm called auto-context. In this algorithm, a classifier is first trained based on a set of local acoustic features, after which the generated probabilities are used along with the local features as contextual information to train new classifiers. By iteratively using updated probabilities as the contextual information, the algorithm can accurately model contextual dependencies and improve classification ability. The advantages of this method include its flexible structure and the ability of capturing contextual relationships. When using the auto-context algorithm based on support vector machine, we can improve the detection accuracy by about 3% and F-score by more than 7% on both two-way and four-way pitch accent detections in combination with the acoustic context. For boundary detection, the accuracy improvement is about 1% and the F-score improvement reaches 12%. The new algorithm outperforms conditional random fields, especially on boundary detection in terms of F-score. It also outperforms an n-gram language model on the task of pitch accent detection

    Differential contribution of prosodic cues in the native and non-native segmentation of French speech

    Get PDF
    This is the published version, also available here: http://dx.doi.org/10.1515/lp-2012-0018.This study investigates the use of prosodic information in the segmentation of French speech by mid-level and high-level English second/foreign language (L2) learners of French and native French listeners. The results of two word-monitoring tasks, one with natural stimuli and one with resynthesized stimuli, show that as L2 learners become more proficient in French, they go from parsing accented syllables as word-initial to parsing them as word-final, but unlike native listeners, they use duration increase but not fundamental frequencyx (F0) rise as a cue to word-final boundaries. These results are attributed to: (1) the L2 learners' native language, in which F0 rise is a reliable cue to word-initial boundaries but not word-final boundaries; (2) the co-occurrence of F0 and duration cues in word-final syllables in French, rendering L2 learners' use of F0 rise unnecessary for locating word-final boundaries; and (3) the optional marking of word-initial boundaries by F0 cues in French, thus making it difficult for non-native listeners to tease the two types of F0 rise apart. We argue that these factors prevent English listeners from attending to F0 rise as a cue to word-final boundaries in French, irrespective of their proficiency in French

    Exploring complex vowels as phrase break correlates in a corpus of English speech with ProPOSEL, a prosody and POS English lexicon

    Get PDF
    Real-world knowledge of syntax is seen as integral to the machine learning task of phrase break prediction but there is a deficiency of a priori knowledge of prosody in both rule-based and data-driven classifiers. Speech recognition has established that pauses affect vowel duration in preceding words. Based on the observation that complex vowels occur at rhythmic junctures in poetry, we run significance tests on a sample of transcribed, contemporary British English speech and find a statistically significant correlation between complex vowels and phrase breaks. The experiment depends on automatic text annotation via ProPOSEL, a prosody and part-of-speech English lexicon. Copyright © 2009 ISCA

    Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

    Get PDF
    We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error.Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed

    Robust Estimation of Tone Break Indices from Speech Signal using Multi-Scale Analysis and their Applications

    Get PDF
    The aim of this study is to develop robust algorithm to automatically detect the Tone and Break Indices(ToBI) from the speech signal and explore their applications. iLAST was introduced to analyze the acoustic and prosodic features to detect the ToBI indices. Both expert and data driven rules were used to improve the robustness. The integration of multi-scale signal analysis with rule-based classification has helped in robustly identifying tones that can be used in applications, such as identifying Vowel triangle, emotions from speech etc. Empirical analyses using labeled dataset were performed to illustrate the utility of the proposed approach. Further analyses were conducted to identify the inefficiencies with the proposed approach and address those issues through co-analyses of prosodic features in identifying the major contributors to robust detection of ToBI. It was demonstrated that the proposed approach performs robustly and can be used for developing a wide variety of applications

    Extending AuToBI to prominence detection in European Portuguese

    Get PDF
    This paper describes our exploratory work in applying the Automatic ToBI annotation system (AuToBI), originally developed for Standard American English, to European Portuguese. This work is motivated by the current availability of large amounts of (highly spontaneous) transcribed data and the need to further enrich those transcripts with prosodic information. Manual prosodic annotation, however, is almost impractical for extensive data sets. For that reason, automatic systems such as AuToBi stand as an alternate solution. We have started by applying the AuToBI prosodic event detection system using the existing English models to the prediction of prominent prosodic events (accents) in European Portuguese. This approach achieved an overall accuracy of 74% for prominence detection, similar to state-of-the-art results for other languages. Later, we have trained new models using prepared and spontaneous Portuguese data, achieving a considerable improvement of about 6% accuracy (absolute) over the existing English models. The achieved results are quite encouraging and provide a starting point for automatically predicting prominent events in European Portuguese.info:eu-repo/semantics/publishedVersio
    • …
    corecore