Search CORE

2,070 research outputs found

Exploiting Contextual Information for Prosodic Event Detection Using Auto-Context

Authors: Johnson Michael T
Liu Jia
Xia Shanhong
Yang Hua
Zhang Wei-Qiang
Zhao Junhong
Publication venue: e-Publications@Marquette
Publication date: 01/12/2013

Prosody and prosodic boundaries carry significant linguistic and paralinguistic information and are important aspects of speech. In the field of prosodic event detection, many local acoustic features have been investigated; however, contextual information has not yet been thoroughly exploited. The most difficult aspect of this lies in learning long-distance contextual dependencies effectively and efficiently. To address this problem, we introduce the use of an algorithm called auto-context. In this algorithm, a classifier is first trained on a set of local acoustic features, after which the generated probabilities are used, along with the local features, as contextual information to train new classifiers. By iteratively using the updated probabilities as contextual information, the algorithm can accurately model contextual dependencies and improve classification ability. The advantages of this method include its flexible structure and its ability to capture contextual relationships. Using the auto-context algorithm based on support vector machines in combination with the acoustic context, we improve detection accuracy by about 3% and F-score by more than 7% on both two-way and four-way pitch accent detection. For boundary detection, the accuracy improvement is about 1% and the F-score improvement reaches 12%. The new algorithm outperforms conditional random fields, especially on boundary detection in terms of F-score. It also outperforms an n-gram language model on the task of pitch accent detection.

epublications@Marquette

Springer - Publisher Connector
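The auto-context procedure in the first abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a hand-rolled logistic regression stands in for the support vector machine used in the paper (so each stage outputs probabilities directly), the context window of ±2 positions and the 0.5 padding at utterance edges are arbitrary choices, and each stage's context probabilities are computed on the same data it was trained on, which is a simplification. All function names are hypothetical.

```python
import math
from typing import List, Optional

Vector = List[float]
Sequences = List[List[Vector]]         # utterances -> positions (e.g. syllables) -> local features
ProbMap = Optional[List[List[float]]]  # per-position P(event) from the previous stage, or None


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Vector, x: Vector) -> float:
    """P(event) for one feature vector; the last weight is the bias."""
    return sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, x)))


def train_logreg(X: List[Vector], y: List[int], epochs: int = 300, lr: float = 0.5) -> Vector:
    """Batch gradient descent on the mean logistic loss (assumed stand-in for the paper's SVM)."""
    dim = len(X[0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[dim] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def with_context(seqs: Sequences, probs: ProbMap, radius: int) -> List[Vector]:
    """Local features plus the previous stage's probabilities at positions i-radius..i+radius.

    Positions outside the utterance are padded with 0.5 (an uninformative value; an assumption).
    """
    rows = []
    for s, seq in enumerate(seqs):
        for i, x in enumerate(seq):
            ctx: Vector = []
            if probs is not None:
                p = probs[s]
                ctx = [p[j] if 0 <= j < len(p) else 0.5
                       for j in range(i - radius, i + radius + 1)]
            rows.append(list(x) + ctx)
    return rows


def apply_stage(w: Vector, seqs: Sequences, probs: ProbMap, radius: int) -> List[List[float]]:
    """Run one stage's classifier at every position and regroup the outputs per utterance."""
    flat = [score(w, row) for row in with_context(seqs, probs, radius)]
    out, k = [], 0
    for seq in seqs:
        out.append(flat[k:k + len(seq)])
        k += len(seq)
    return out


def auto_context_train(seqs: Sequences, labels: List[List[int]],
                       iterations: int = 2, radius: int = 2) -> List[Vector]:
    """Stage 0 sees local features only; each later stage also sees the previous stage's probabilities."""
    y = [lab for seq_labels in labels for lab in seq_labels]
    models: List[Vector] = []
    probs: ProbMap = None
    for _ in range(iterations + 1):
        w = train_logreg(with_context(seqs, probs, radius), y)
        models.append(w)
        probs = apply_stage(w, seqs, probs, radius)  # contextual input for the next stage
    return models


def auto_context_predict(models: List[Vector], seqs: Sequences, radius: int = 2) -> List[List[float]]:
    """Replay the trained stages in order on new utterances."""
    probs: ProbMap = None
    for w in models:
        probs = apply_stage(w, seqs, probs, radius)
    return probs
```

Each stage keeps its own weight vector because the feature dimension grows by 2·radius + 1 once context is added; at prediction time the stages must be replayed in the same order so that stage k sees the probabilities produced by stage k-1, exactly as during training.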

Differential contribution of prosodic cues in the native and non-native segmentation of French speech

Authors: Bahler Carly
Coughlin Caitlin E.
Gaillard Stephanie
Tremblay Annie
Publication venue: Walter de Gruyter GmbH
Publication date: 14/04/2015

This is the published version, also available here: http://dx.doi.org/10.1515/lp-2012-0018. This study investigates the use of prosodic information in the segmentation of French speech by mid-level and high-level English second/foreign language (L2) learners of French and native French listeners. The results of two word-monitoring tasks, one with natural stimuli and one with resynthesized stimuli, show that as L2 learners become more proficient in French, they go from parsing accented syllables as word-initial to parsing them as word-final, but unlike native listeners, they use duration increase but not fundamental frequency (F0) rise as a cue to word-final boundaries. These results are attributed to: (1) the L2 learners' native language, in which F0 rise is a reliable cue to word-initial boundaries but not word-final boundaries; (2) the co-occurrence of F0 and duration cues in word-final syllables in French, rendering L2 learners' use of F0 rise unnecessary for locating word-final boundaries; and (3) the optional marking of word-initial boundaries by F0 cues in French, thus making it difficult for non-native listeners to tease the two types of F0 rise apart. We argue that these factors prevent English listeners from attending to F0 rise as a cue to word-final boundaries in French, irrespective of their proficiency in French.

KU ScholarWorks

Exploring complex vowels as phrase break correlates in a corpus of English speech with ProPOSEL, a prosody and POS English lexicon

Authors: Atwell E
Brierley C
Publication date: 01/01/2009

Real-world knowledge of syntax is seen as integral to the machine learning task of phrase break prediction but there is a deficiency of a priori knowledge of prosody in both rule-based and data-driven classifiers. Speech recognition has established that pauses affect vowel duration in preceding words. Based on the observation that complex vowels occur at rhythmic junctures in poetry, we run significance tests on a sample of transcribed, contemporary British English speech and find a statistically significant correlation between complex vowels and phrase breaks. The experiment depends on automatic text annotation via ProPOSEL, a prosody and part-of-speech English lexicon. Copyright © 2009 ISCA

White Rose Research Online

Leeds Beckett Repository
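The abstract above does not name the significance test used. One common choice for relating a binary word property (contains a complex vowel or not) to a binary outcome (followed by a phrase break or not) is a Pearson chi-square test of independence on a 2×2 contingency table, sketched below in plain Python. The function name and table layout are assumptions for illustration, and no Yates continuity correction is applied.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test of independence for the (hypothetical) 2x2 table

                          break   no break
        complex vowel       a        b
        other vowel         c        d

    Returns (statistic, p-value) with 1 degree of freedom, no continuity correction.
    """
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column needs at least one observation")
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence: row total * column total / grand total.
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 df the statistic is a squared standard normal, so P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

The four counts would come from the automatically annotated corpus; any numbers used to try the function out are placeholders, not figures from the study.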

Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

Authors: Andreas Stolcke
Berger Adam L
Carletta Jean
Carol Van Ess-Dykema
Daniel Jurafsky
Dermatas Evangelos
Elizabeth Shriberg
Grosz Barbara J
Hirschberg Julia B
Klaus Ries
Marie Meteer
Noah Coccaro
Paul Taylor
Rachel Martin
Rebecca Bates
Publication date: 01/01/2000

We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error. Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed

arXiv.org e-Print Archive

CiteSeerX

Crossref

Edinburgh Research Archive

Institutional Repository for Minnesota State University, Mankato
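The HMM formulation in the dialogue-act abstract treats the acts as hidden states, the dialogue-act n-gram as the transition model, and the lexical and prosodic models as emission likelihoods. The sketch below assumes a bigram and decodes the single most likely act sequence with the Viterbi algorithm; the abstract does not say which decoding criterion the authors used, and the act labels, probabilities, and function name here are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Probs = Dict[str, float]


def _log(p: float) -> float:
    """Log-probability with log(0) mapped to -inf so impossible transitions are never chosen."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(acts: Sequence[str], start: Probs, trans: Dict[str, Probs],
            likelihoods: List[Probs]) -> List[str]:
    """Most likely dialogue-act sequence under a bigram HMM.

    start[a]        P(first act is a)
    trans[a][b]     dialogue-act bigram P(b | a)
    likelihoods[t]  P(evidence of utterance t | act), e.g. from lexical/prosodic models
    """
    best = [{a: _log(start[a]) + _log(likelihoods[0][a]) for a in acts}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(likelihoods)):
        scores, ptr = {}, {}
        for b in acts:
            # Best predecessor act for act b at utterance t.
            prev = max(acts, key=lambda a: best[-1][a] + _log(trans[a][b]))
            scores[b] = best[-1][prev] + _log(trans[prev][b]) + _log(likelihoods[t][b])
            ptr[b] = prev
        best.append(scores)
        back.append(ptr)
    # Trace back from the best final act.
    path = [max(acts, key=lambda a: best[-1][a])]
    for t in range(len(likelihoods) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    acts = ["Statement", "Question", "Backchannel"]
    start = {"Statement": 0.5, "Question": 0.3, "Backchannel": 0.2}
    trans = {
        "Statement": {"Statement": 0.5, "Question": 0.2, "Backchannel": 0.3},
        "Question": {"Statement": 0.7, "Question": 0.1, "Backchannel": 0.2},
        "Backchannel": {"Statement": 0.6, "Question": 0.3, "Backchannel": 0.1},
    }
    likelihoods = [
        {"Statement": 0.1, "Question": 0.8, "Backchannel": 0.1},
        {"Statement": 0.4, "Question": 0.1, "Backchannel": 0.5},
    ]
    print(viterbi(acts, start, trans, likelihoods))  # ['Question', 'Statement']
```

In the toy example the second utterance's own evidence favors Backchannel (0.5 vs. 0.4), but the high bigram probability P(Statement | Question) makes Question followed by Statement the best overall sequence, which is the kind of discourse constraint the abstract describes.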

Robust Estimation of Tone Break Indices from Speech Signal using Multi-Scale Analysis and their Applications

Author: Kolli Chandra Sekhar Rao
Publication venue: University of Memphis Digital Commons
Publication date: 19/07/2012

The aim of this study is to develop a robust algorithm to automatically detect Tones and Break Indices (ToBI) from the speech signal and to explore their applications. iLAST was introduced to analyze acoustic and prosodic features in order to detect the ToBI indices. Both expert-defined and data-driven rules were used to improve robustness. The integration of multi-scale signal analysis with rule-based classification helped to robustly identify tones that can be used in applications such as identifying the vowel triangle and recognizing emotions from speech. Empirical analyses on a labeled dataset were performed to illustrate the utility of the proposed approach. Further analyses identified inefficiencies in the proposed approach and addressed them through co-analysis of prosodic features, identifying the major contributors to robust detection of ToBI. It was demonstrated that the proposed approach performs robustly and can be used to develop a wide variety of applications.

University of Memphis Digital Commons

Extending AuToBI to prominence detection in European Portuguese

Authors: Batista Fernando
Hirschberg Julia
Mata Ana Isabel
Moniz Helena
Rosenberg Andrew
Trancoso Isabel
Publication venue: Urbana, IL
Publication date: 01/01/2014

This paper describes our exploratory work in applying the Automatic ToBI annotation system (AuToBI), originally developed for Standard American English, to European Portuguese. This work is motivated by the current availability of large amounts of (highly spontaneous) transcribed data and the need to further enrich those transcripts with prosodic information. Manual prosodic annotation, however, is almost impractical for extensive data sets. For that reason, automatic systems such as AuToBI stand as an alternative solution. We started by applying the AuToBI prosodic event detection system, using the existing English models, to the prediction of prominent prosodic events (accents) in European Portuguese. This approach achieved an overall accuracy of 74% for prominence detection, similar to state-of-the-art results for other languages. We then trained new models using prepared and spontaneous Portuguese data, achieving a considerable improvement of about 6% accuracy (absolute) over the existing English models. The results are quite encouraging and provide a starting point for automatically predicting prominent events in European Portuguese.

Universidade de Lisboa: Repositório.UL