Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in
conversational speech, i.e., speech-act-like units such as Statement, Question,
Backchannel, Agreement, Disagreement, and Apology. Our model detects and
predicts dialogue acts based on lexical, collocational, and prosodic cues, as
well as on the discourse coherence of the dialogue act sequence. The dialogue
model is based on treating the discourse structure of a conversation as a
hidden Markov model and the individual dialogue acts as observations emanating
from the model states. Constraints on the likely sequence of dialogue acts are
modeled via a dialogue act n-gram. The statistical dialogue grammar is combined
with word n-grams, decision trees, and neural networks modeling the
idiosyncratic lexical and prosodic manifestations of each dialogue act. We
develop a probabilistic integration of speech recognition with dialogue
modeling, to improve both speech recognition and dialogue act classification
accuracy. Models are trained and evaluated using a large hand-labeled database
of 1,155 conversations from the Switchboard corpus of spontaneous
human-to-human telephone speech. We achieved good dialogue act labeling
accuracy (65% based on errorful, automatically recognized words and prosody,
and 71% based on word transcripts, compared to a chance baseline accuracy of
35% and human accuracy of 84%) and a small reduction in word recognition error.
Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling
changed).
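The decoding idea in the abstract above, which combines per-utterance likelihoods from word and prosody models with a dialogue act n-gram over a hidden Markov model, can be illustrated with Viterbi search. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a bigram as the simplest n-gram, treats the per-act evidence likelihoods as already computed by some lexical/prosodic model, and the function name `viterbi_dialogue_acts` and all probabilities are hypothetical toy values.

```python
import math
from typing import Dict, List, Sequence


def viterbi_dialogue_acts(
    likelihoods: Sequence[Dict[str, float]],
    bigram: Dict[str, Dict[str, float]],
    start: Dict[str, float],
) -> List[str]:
    """Return the most likely dialogue act sequence (hypothetical sketch).

    likelihoods[i][da] ~ P(evidence_i | da) for utterance i (word/prosody models).
    bigram[prev][da]   ~ P(da | prev), a dialogue act bigram "grammar".
    start[da]          ~ P(da) for the first utterance.
    All probabilities must be > 0 because scores are kept in log space.
    """
    acts = list(start)
    # best[da] = log-prob of the best path ending in `da` at the current utterance
    best = {da: math.log(start[da]) + math.log(likelihoods[0][da]) for da in acts}
    backptrs: List[Dict[str, str]] = []
    for obs in likelihoods[1:]:
        new_best: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for da in acts:
            # Predecessor that maximizes path score plus transition log-prob.
            prev = max(acts, key=lambda p: best[p] + math.log(bigram[p][da]))
            new_best[da] = best[prev] + math.log(bigram[prev][da]) + math.log(obs[da])
            ptr[da] = prev
        best = new_best
        backptrs.append(ptr)
    # Trace back from the highest-scoring final act.
    path = [max(acts, key=lambda da: best[da])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical toy model with three of the dialogue acts named in the abstract.
start = {"Statement": 0.5, "Question": 0.3, "Backchannel": 0.2}
bigram = {
    "Statement": {"Statement": 0.5, "Question": 0.2, "Backchannel": 0.3},
    "Question": {"Statement": 0.7, "Question": 0.1, "Backchannel": 0.2},
    "Backchannel": {"Statement": 0.6, "Question": 0.3, "Backchannel": 0.1},
}
likelihoods = [
    {"Statement": 0.2, "Question": 0.7, "Backchannel": 0.1},  # looks like a question
    {"Statement": 0.5, "Question": 0.1, "Backchannel": 0.4},  # ambiguous evidence
    {"Statement": 0.3, "Question": 0.1, "Backchannel": 0.6},
]

print(viterbi_dialogue_acts(likelihoods, bigram, start))
# ['Question', 'Statement', 'Backchannel']
```

In this toy run the second utterance's evidence alone barely separates Statement from Backchannel; the bigram, which strongly favors Statement after Question, resolves it. That is the role the abstract assigns to the dialogue grammar.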
Towards Understanding Egyptian Arabic Dialogues
Labeling a user's utterances to understand their intents, known as Dialogue Act
(DA) classification, is a key component of the language understanding layer in
automatic dialogue systems. In this paper, we propose a novel Machine Learning
(ML) approach to labeling user utterances in Egyptian spontaneous dialogues and
Instant Messages, without relying on any special lexicons, cues, or rules. Due
to the lack of an Egyptian dialect dialogue corpus, the system is evaluated on
a multi-genre corpus of 4,725 utterances from three domains, collected and
manually annotated from Egyptian call centers. The system achieves an F1 score
of 70.36% over all domains.
Comment: arXiv admin note: substantial text overlap with arXiv:1505.0308
Recognizing Uncertainty in Speech
We address the problem of inferring a speaker's level of certainty based on
prosodic information in the speech signal, which has application in
speech-based dialogue systems. We show that using phrase-level prosodic
features centered around the phrases causing uncertainty, in addition to
utterance-level prosodic features, improves our model's classification of the
speaker's level of certainty. In addition, our models can be used to predict which phrase a
person is uncertain about. These results rely on a novel method for eliciting
utterances of varying levels of certainty that allows us to compare the utility
of contextually-based feature sets. We elicit level of certainty ratings from
both the speakers themselves and a panel of listeners, finding that there is
often a mismatch between speakers' internal states and their perceived states,
and highlighting the importance of this distinction.
Comment: 11 pages
User-Adaptive A Posteriori Restoration for Incorrectly Segmented Utterances in Spoken Dialogue Systems
Ideally, the users of spoken dialogue systems should be able to speak at their own tempo. Thus, these systems need to interpret utterances from various users correctly, even when the utterances contain pauses. In response to this issue, we propose an approach based on a posteriori restoration for incorrectly segmented utterances. A crucial part of this approach is determining whether restoration is required. We use a classification-based approach, adapted to each user. We focus on each user's dialogue tempo, which can be obtained during the dialogue, and determine the correlation between each user's tempo and the appropriate thresholds for classification. We also derive a linear regression function that converts tempos into thresholds. Experimental results show that the proposed user adaptation approach, applied to two restoration classification methods (thresholding and decision trees), improves classification accuracy by 3.0% and 7.4%, respectively, in cross-validation.
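The user adaptation in the abstract above, which maps each user's dialogue tempo to a classification threshold through linear regression, can be sketched as follows. This is a hypothetical illustration: the abstract does not define how tempo is measured or which feature is thresholded, so the sketch assumes the thresholded feature is the pause length between two segments, fits an ordinary least-squares line to made-up (tempo, threshold) pairs, and flags a restoration when the pause is shorter than the predicted threshold. The function names `fit_line` and `needs_restoration` and all numbers are illustrative only.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def needs_restoration(pause_sec: float, user_tempo: float, a: float, b: float) -> bool:
    """Hypothetical rule: merge the two segments if the pause is shorter than
    the user-specific threshold predicted from the user's tempo."""
    threshold = a * user_tempo + b
    return pause_sec < threshold


# Hypothetical (tempo, best threshold) pairs, e.g. tuned per user on held-out data.
tempos = [0.5, 1.0, 1.5, 2.0]
thresholds = [0.6, 0.9, 1.2, 1.5]
a, b = fit_line(tempos, thresholds)  # approximately a = 0.6, b = 0.3

# The same 1.0 s pause is treated as mid-utterance for a slow user only.
print(needs_restoration(1.0, user_tempo=2.0, a=a, b=b))  # True
print(needs_restoration(1.0, user_tempo=1.0, a=a, b=b))  # False
```

A single global threshold would treat both users identically; deriving the threshold from tempo is what lets the same pause be interpreted differently per user.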
- …