Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in
conversational speech, i.e., speech-act-like units such as Statement, Question,
Backchannel, Agreement, Disagreement, and Apology. Our model detects and
predicts dialogue acts based on lexical, collocational, and prosodic cues, as
well as on the discourse coherence of the dialogue act sequence. The dialogue
model is based on treating the discourse structure of a conversation as a
hidden Markov model and the individual dialogue acts as observations emanating
from the model states. Constraints on the likely sequence of dialogue acts are
modeled via a dialogue act n-gram. The statistical dialogue grammar is combined
with word n-grams, decision trees, and neural networks modeling the
idiosyncratic lexical and prosodic manifestations of each dialogue act. We
develop a probabilistic integration of speech recognition with dialogue
modeling, to improve both speech recognition and dialogue act classification
accuracy. Models are trained and evaluated using a large hand-labeled database
of 1,155 conversations from the Switchboard corpus of spontaneous
human-to-human telephone speech. We achieved good dialogue act labeling
accuracy (65% based on errorful, automatically recognized words and prosody,
and 71% based on word transcripts, compared to a chance baseline accuracy of
35% and human accuracy of 84%) and a small reduction in word recognition error.
Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed)
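The decoding idea in the abstract above, a dialogue act n-gram constraining the act sequence combined with per-utterance evidence, can be illustrated with a small Viterbi search. This is a minimal sketch and not the authors' implementation: it assumes a bigram over acts, treats the acts as hidden states with one evidence likelihood per utterance, and every name and probability in it (`viterbi`, `log_start`, `log_trans`, `log_obs`, the toy numbers) is hypothetical.

```python
import math


def viterbi(acts, log_start, log_trans, log_obs):
    """Return the most likely dialogue act sequence for a conversation.

    acts      : list of act labels (hypothetical state inventory)
    log_start : dict act -> log P(act) for the first utterance
    log_trans : dict (prev_act, act) -> log P(act | prev_act), the act bigram
    log_obs   : one dict per utterance, act -> log P(evidence | act)
    """
    # best[t][a] is the best log score of any path ending in act a at utterance t.
    best = [{a: log_start[a] + log_obs[0][a] for a in acts}]
    back = []  # back[t - 1][a] is the predecessor of act a at utterance t
    for t in range(1, len(log_obs)):
        scores, pointers = {}, {}
        for a in acts:
            prev = max(acts, key=lambda p: best[-1][p] + log_trans[(p, a)])
            scores[a] = best[-1][prev] + log_trans[(prev, a)] + log_obs[t][a]
            pointers[a] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final act.
    path = [max(acts, key=lambda a: best[-1][a])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy example with made-up probabilities (not taken from the paper).
acts = ["Statement", "Question"]
log_start = {"Statement": math.log(0.6), "Question": math.log(0.4)}
log_trans = {
    ("Statement", "Statement"): math.log(0.7),
    ("Statement", "Question"): math.log(0.3),
    ("Question", "Statement"): math.log(0.8),
    ("Question", "Question"): math.log(0.2),
}
log_obs = [
    {"Statement": math.log(0.2), "Question": math.log(0.8)},
    {"Statement": math.log(0.9), "Question": math.log(0.1)},
]
decoded = viterbi(acts, log_start, log_trans, log_obs)
print(decoded)
```

In a system like the one described, the per-utterance likelihoods would come from the word n-gram, decision tree, or neural network models; here they are simply given as numbers.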
Improving the automatic segmentation of subtitles through conditional random field
Automatic segmentation of subtitles is a novel research field that has not been studied extensively
to date. However, high-quality automatic subtitling is a real need for broadcasters, who seek automatic
solutions given the demanding European audiovisual legislation. In this article, a method based on Conditional
Random Fields is presented for automatic subtitle segmentation. This is a continuation
of previous work in the field, which proposed a method based on a Support Vector Machine classifier
to generate possible candidates for breaks. For this study, two corpora in Basque and Spanish were
used for experiments, and the performance of the current method was tested and compared with the
previous solution and two rule-based systems using several evaluation metrics. Finally, an experiment
with human evaluators was carried out to measure the productivity gain in post-editing
automatic subtitles generated with the new method.
This work was partially supported by the project CoMUN-HaT - TIN2015-70924-C2-1-R (MINECO/FEDER).
Alvarez, A.; Martínez-Hinarejos, C.; Arzelus, H.; Balenciaga, M.; Del Pozo, A. (2017). Improving the automatic segmentation of subtitles through conditional random field. Speech Communication. 88:83-95. https://doi.org/10.1016/j.specom.2017.01.010
Integrated dialog act segmentation and classification using prosodic features and language models
This paper presents an integrated approach for the segmentation and classification of dialog acts (DAs) in the Verbmobil project. In Verbmobil it is often sufficient to recognize the sequence of DAs occurring during a dialog between the two partners. In our previous work we segmented and classified a dialog in two steps: first, we calculated hypotheses for the segment boundaries and placed a boundary wherever the probability exceeded a predefined threshold. Second, we classified the segments into DAs using semantic classification trees or stochastic language models. In our new approach we integrate segmentation and classification in the A* algorithm to search for the optimal segmentation and classification of DAs on the basis of word hypotheses graphs (WHGs). The hypotheses for the segment boundaries are calculated with the help of a stochastic language model operating on the word chain and a multi-layer perceptron (MLP) classifying prosodic features. The DA classification is done using a category-based language model for each DA. For our experiments we used data from the Verbmobil corpus.
Appeared in proceedings EUROSPEECH '97, Rhodes (Greece), vol. 1, p. 207-210.
Available from TIB Hannover: RR 5221(218)+a / FIZ - Fachinformationszentrum Karlsruhe / TIB - Technische Informationsbibliothek.
Bundesministerium fuer Bildung, Wissenschaft, Forschung und Technologie, Bonn (Germany)
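The first step of the earlier two-step approach mentioned in the abstract above, placing a segment boundary wherever the boundary probability exceeds a predefined threshold, can be sketched as follows. This is an illustrative assumption about the data layout rather than the Verbmobil implementation: the function name `segment_by_threshold`, the one-probability-per-word convention, and the threshold value 0.5 are all hypothetical.

```python
def segment_by_threshold(words, boundary_probs, threshold=0.5):
    """Split a word sequence into segments.

    boundary_probs[i] is the (hypothetical) probability that a segment
    boundary follows words[i]; a boundary is placed when it exceeds
    the threshold.
    """
    if len(words) != len(boundary_probs):
        raise ValueError("need exactly one boundary probability per word")
    segments, current = [], []
    for word, prob in zip(words, boundary_probs):
        current.append(word)
        if prob > threshold:
            segments.append(current)
            current = []
    if current:  # keep trailing words that were never closed by a boundary
        segments.append(current)
    return segments


# Made-up example input, not data from the corpus.
words = ["yes", "that", "suits", "me", "how", "about", "monday"]
probs = [0.9, 0.1, 0.2, 0.8, 0.1, 0.1, 0.3]
segments = segment_by_threshold(words, probs)
print(segments)
```

Each resulting segment would then be passed to a separate dialog act classifier. The integrated approach in the abstract avoids this hard early decision by searching over segmentation and classification jointly.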
Integrated Dialog Act Segmentation And Classification Using Prosodic Features And Language Models
This paper presents an integrated approach for the segmentation and classification of dialog acts (DA) in the Verbmobil project. In Verbmobil it is often sufficient to recognize the sequence of DAs occurring during a dialog between the two partners. In our previous work [5] we segmented and classified a dialog in two steps: first we calculated hypotheses for the segment boundaries and decided for a boundary if the probabilities exceeded a predefined threshold level. Second we classified the segments into DAs using semantic classification trees or stochastic language models. In our new approach we integrate the segmentation and classification in the A*-algorithm to search for the optimal segmentation and classification of DAs on the basis of word hypotheses graphs (WHGs). The hypotheses for the segment boundaries are calculated with the help of a stochastic language model operating on the word chain and a multi-layer perceptron (MLP) classifying prosodic features. The DA classificat..