Search CORE

16 research outputs found

Dialog act classification with the help of prosody

Author: Harbeck Stefan
Kießling Andreas
Kompe Ralf
Mast Marion
Niemann Heinrich
Nöth Elmar
Publication venue: Sonstige Einrichtungen. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Publication date: 01/01/1996
Field of study

This paper presents automatic methods for the segmentation and classication of dialog acts (DA). In Verbmobil it is often sufficient to recognize the sequence of DAs occurring during a dialog between the two partners. Since a turn can consist of one or more successive DAs we conduct the classification of DAs in a two step procedure: First each turn has to be segmented into units which correspond to a DA and second the DA categories have to be identified. For the segmentation we use polygrams and multi -layer perceptrons, using prosodic features. The classification of DAs is done with semantic classication trees and polygrams

CiteSeerX

Universaar

Acronym

Prosodic processing and its use in Verbmobil

Author: Batliner Anton
Kießling Andreas
Kompe Ralf
Niemann Heinrich
Nöth Elmar
Publication venue: Sonstige Einrichtungen. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Publication date: 01/01/1997
Field of study

We present the prosody module of the VERBMOBlL speech-to-speech translation system, the world wide first complete system, which successfully uses prosodic information in the linguistic analysis. This is achieved by computing probabilities for clause boundaries, accentuation, and different types of sentence mood for each of the word hypotheses computed by the word recognizer. These probabilities guide the search of the linguistic analysis. Disambiguation is already achieved during the analysis and not by a prosodic verification of different linguistic hypotheses. So far, the most useful prosodic information is provided by clause boundaries. These are detected with a recognition rate of 94%. For the parsing of word hypotheses graphs, the use of clause boundary probabilities yields a speed-up of 92% and a 96% reduction of alternative readings

CiteSeerX

Universaar

Acronym

Automatic detection of discourse structure for speech recognition and understanding.

Author: Bates Rebecca
Coccaro Noah
Jurafsky Daniel
Martin Rachel
Meteer Marie
Ries Klaus
Shriberg Elizabeth
Stolcke Andreas
Taylor Paul A
Van Ess-Dykema Carol
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1997
Field of study

We describe a new approach for statistical modeling and detection of discourse structure for natural conversational speech. Our model is based on 42 ‘Dialog Acts’ (DAs), (question, answer, backchannel, agreement, disagreement, apology, etc). We labeled 1155 conversations from the Switchboard (SWBD) database (Godfrey et al. 1992) of human-to-human telephone conversations with these 42 types and trained a Dialog Act detector based on three distinct knowledge sources: sequences of words which characterize a dialog act, prosodic features which characterize a dialog act, and a statistical Discourse Grammar. Our combined detector, although still in preliminary stages, already achieves a 65% Dialog Act detection rate based on acoustic waveforms, and 72% accuracy based on word transcripts. Using this detector to switch among the 42 Dialog- Act-Specific trigram LMs also gave us an encouraging but not statistically significant reduction in SWBD word error

CiteSeerX

Edinburgh Research Archive

Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

Author: Andreas Stolcke
Berger Adam L
Carletta Jean
Carol Van Ess-Dykema
Daniel Jurafsky
Dermatas Evangelos
Elizabeth Shriberg
Grosz Barbara J
Hirschberg Julia B
Klaus Ries
Marie Meteer
Noah Coccaro
Paul Taylor
Rachel Martin
Rebecca Bates
Publication venue
Publication date: 01/01/2000
Field of study

We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error.Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed

arXiv.org e-Print Archive

CiteSeerX

Crossref

Edinburgh Research Archive

Institutional Repository for Minnesota State University, Mankato

A survey of machine learning approaches to analysis of large corpora

Author: Atwell ES
Hu X
Publication venue: UCREL, Lancaster University
Publication date: 01/01/2003
Field of study

Corpus-based Machine Learning of linguistic annotations has been a key topic for all areas of Natural Language Processing. This paper presents a survey, along three dimensions of classification. First we outline different linguistic level of analysis: Tokenisation, Part-of-Speech tagging, Parsing, Semantic analysis and Discourse annotation. Secondly, we introduce alternative approaches to Machine Learning applicable to linguistic annotation of corpora: N-gram and Markov models, Neural Networks, Transformation-Based Learning, Decision Tree learning, and Vector-based classification. Thirdly, weexamine a range of Machine Learning systems for the most challenging level of linguistic annotation, discourse analysis; these illustrate the various Machine Learning approaches. Our overall aim is to provide an ontology or framework for further development of our research

White Rose Research Online

Hybride konnektionistische, statistische und regelbasierte Ansätze zur Verarbeitung natürlicher Sprache : Workshop auf der 21. Deutschen Jahrestagung für Künstliche Intelligenz, Freiburg, 9.-10. September 1997

Author
Publication venue
Publication date: 01/01/1998
Field of study

Acronym

Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech?

Author: Bates Rebecca
Coccaro Noah
Jurafsky Daniel
Martin Rachel
Meteer Marie
Ries Klaus
Shriberg Elizabeth
Stolcke Andreas
Taylor Paul
Van Ess-Dykema Carol
Publication venue: Kingston Press Services Ltd.
Publication date: 01/01/1998
Field of study

Identifying whether an utterance is a statement, question, greeting, and so forth is integral to effective automatic understanding of natural dialog. Little is known, however, about how such dialog acts (DAs) can be automatically classified in truly natural conversation. This study asks whether current approaches, which use mainly word information, could be improved by adding prosodic information. The study examines over 1000 conversations from the Switchboard corpus. DAs were handannotated, and prosodic features (duration, pause, F0, energy and speakingrate features) were automatically extracted for each DA. In training, decision trees based on these features were inferred; trees were then applied to unseen test data to evaluate performance. For an allway classification as well as three subtasks, prosody allowed highly significant classification over chance. Featurespecific analyses further revealed that although canonical features (such as F0 for questions) were important, less obvious features could compensate if canonical features were removed. Finally, in each task, integrating the prosodic model with a DAspecific statistical language model improved performance over that of the language model alone. Results suggest that DAs are redundantly marked in natural conversation, and that a variety of automatically extractable prosodic features could aid dialog processing in speech applications

CiteSeerX

Crossref

Edinburgh Research Archive

Institutional Repository for Minnesota State University, Mankato