1,228 research outputs found

Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

Author: Andreas Stolcke
Berger Adam L
Carletta Jean
Carol Van Ess-Dykema
Daniel Jurafsky
Dermatas Evangelos
Elizabeth Shriberg
Grosz Barbara J
Hirschberg Julia B
Klaus Ries
Marie Meteer
Noah Coccaro
Paul Taylor
Rachel Martin
Rebecca Bates
Publication date: 01/01/2000

We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error.
Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed
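The combination of a dialogue act n-gram with per-utterance evidence can be illustrated with a small Viterbi decoder. This is only a minimal sketch under stated assumptions, not the authors' system: it assumes the hidden states are dialogue acts, the grammar is a bigram, and each utterance already comes with a likelihood per act (standing in for the lexical and prosodic models). The function name `viterbi` and all probabilities are invented for illustration, and every probability is assumed to be nonzero.

```python
import math

def viterbi(likelihoods, transitions, initial):
    """Return the most probable dialogue-act sequence for one conversation.

    likelihoods: one dict per utterance, act -> P(evidence | act)
    transitions: dict (previous_act, act) -> P(act | previous_act), a bigram grammar
    initial:     dict act -> P(act) for the first utterance
    All values are hypothetical and must be nonzero (logs are taken).
    """
    acts = list(initial)
    # Log space avoids numeric underflow on long conversations.
    score = {a: math.log(initial[a]) + math.log(likelihoods[0][a]) for a in acts}
    backpointers = []
    for evidence in likelihoods[1:]:
        new_score, pointers = {}, {}
        for a in acts:
            # Best predecessor for act `a` under the bigram grammar.
            best = max(acts, key=lambda p: score[p] + math.log(transitions[(p, a)]))
            new_score[a] = (score[best] + math.log(transitions[(best, a)])
                            + math.log(evidence[a]))
            pointers[a] = best
        score = new_score
        backpointers.append(pointers)
    # Trace the best path back from the highest-scoring final act.
    path = [max(acts, key=lambda a: score[a])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Toy two-act example with made-up numbers.
initial = {"Statement": 0.6, "Question": 0.4}
transitions = {
    ("Statement", "Statement"): 0.7, ("Statement", "Question"): 0.3,
    ("Question", "Statement"): 0.8, ("Question", "Question"): 0.2,
}
likelihoods = [
    {"Statement": 0.2, "Question": 0.8},  # evidence favours Question
    {"Statement": 0.5, "Question": 0.5},  # uninformative evidence
    {"Statement": 0.9, "Question": 0.1},  # evidence favours Statement
]
path = viterbi(likelihoods, transitions, initial)
print(path)
```

In this toy run the second utterance has uninformative evidence, so its label is decided by the bigram grammar alone, which is the role the abstract assigns to discourse coherence.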

arXiv.org e-Print Archive

CiteSeerX

Crossref

Edinburgh Research Archive

Institutional Repository for Minnesota State University, Mankato

A Study on Acoustic Processing That Is Natural for Humans (ヒトニトッテシゼンナオンキョウショリニカンスルケンキュウ)

Author: Tachibana Ryuki
タチバナリュウキ
立花隆輝

Osaka University Knowledge Archive

Automatic sentence stress feedback for non-native English learners

Author: Byeongchang Kim
Ho-Young Lee
Hyosung Hwang
Jieun Song
Jinsik Lee
Lee GG
Sechun Kang
Publication venue: Elsevier BV

Elsevier - Publisher Connector

포항공과대학교

A development of Thai prosodically enriched speech corpus

Author: Hansakunbuntheung Chatchawarn
Sagisaka Yoshinori
Publication venue: 早稲田大学大学院国際情報通信研究科国際情報通信研究センター
Publication date: 31/07/2009

Waseda University Repository

Hierarchical Representation and Estimation of Prosody using Continuous Wavelet Transform

Author: Aalto Daniel
Simko Juraj
Suni Antti
Vainio Martti
Publication date: 01/09/2017

Prominences and boundaries are the essential constituents of prosodic structure in speech. They provide a means to chunk the speech stream into linguistically relevant units by giving those units relative saliences and demarcating them within utterance structures. Both have been widely used in basic research on prosody as well as in text-to-speech synthesis. However, there are no representation schemes that allow both to be estimated and modelled in a unified fashion. Here we present an unsupervised, unified account for estimating and representing prosodic prominences and boundaries using a scale-space analysis based on the continuous wavelet transform. The methods are evaluated and compared to earlier work using the Boston University Radio News corpus. The results show that the proposed method is comparable with the best published supervised annotation methods. Peer reviewed
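The scale-space idea can be illustrated on a toy contour. The following is a minimal sketch, not the authors' method: it assumes a Mexican-hat (Ricker) wavelet, a synthetic signal with two bumps standing in for a prosodic contour, and a simple peak pick as a stand-in for prominence estimation. The names `ricker`, `cwt`, and `local_maxima`, the scales, and all signal values are hypothetical.

```python
import math

def ricker(t, scale):
    """Mexican-hat (Ricker) wavelet at time offset t for a given scale."""
    x = t / scale
    return (1.0 - x * x) * math.exp(-0.5 * x * x) / math.sqrt(scale)

def cwt(signal, scales):
    """Continuous wavelet transform by direct summation, one row per scale."""
    n = len(signal)
    return [
        [sum(signal[j] * ricker(j - i, s) for j in range(n)) for i in range(n)]
        for s in scales
    ]

def local_maxima(row):
    """Indices of strict interior local maxima of one coefficient row."""
    return [i for i in range(1, len(row) - 1) if row[i - 1] < row[i] > row[i + 1]]

# Synthetic contour (assumption): a weaker bump at frame 20, a stronger one at 60.
signal = [
    1.0 * math.exp(-0.5 * ((i - 20) / 3.0) ** 2)
    + 2.0 * math.exp(-0.5 * ((i - 60) / 3.0) ** 2)
    for i in range(80)
]
scales = [2.0, 4.0, 8.0]  # hypothetical fine-to-coarse scales, in frames
rows = cwt(signal, scales)

# The strongest coefficient at each scale marks the most salient bump.
strongest = [max(range(len(row)), key=row.__getitem__) for row in rows]
print(strongest)
print(local_maxima(rows[1]))
```

Peaks that persist across several scales are candidates for prominences in this simplified picture; a real analysis would run on measured prosodic signals rather than a synthetic contour.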

Helsingin yliopiston digitaalinen arkisto