2,986 research outputs found
Pauses and the temporal structure of speech
Natural-sounding speech synthesis requires close control over the temporal structure of the speech flow. This includes a full predictive scheme for the durational structure and in particuliar the prolongation of final syllables of lexemes as well as for the pausal structure in the utterance. In this chapter, a description of the temporal structure and the summary of the numerous factors that modify it are presented. In the second part, predictive schemes for the temporal structure of speech ("performance structures") are introduced, and their potential for characterising the overall prosodic structure of speech is demonstrated
Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in
conversational speech, i.e., speech-act-like units such as Statement, Question,
Backchannel, Agreement, Disagreement, and Apology. Our model detects and
predicts dialogue acts based on lexical, collocational, and prosodic cues, as
well as on the discourse coherence of the dialogue act sequence. The dialogue
model is based on treating the discourse structure of a conversation as a
hidden Markov model and the individual dialogue acts as observations emanating
from the model states. Constraints on the likely sequence of dialogue acts are
modeled via a dialogue act n-gram. The statistical dialogue grammar is combined
with word n-grams, decision trees, and neural networks modeling the
idiosyncratic lexical and prosodic manifestations of each dialogue act. We
develop a probabilistic integration of speech recognition with dialogue
modeling, to improve both speech recognition and dialogue act classification
accuracy. Models are trained and evaluated using a large hand-labeled database
of 1,155 conversations from the Switchboard corpus of spontaneous
human-to-human telephone speech. We achieved good dialogue act labeling
accuracy (65% based on errorful, automatically recognized words and prosody,
and 71% based on word transcripts, compared to a chance baseline accuracy of
35% and human accuracy of 84%) and a small reduction in word recognition error.Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling
changed
SCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis Using Artificial Neural Networks
In this paper, we describe a so-called screening approach for learning robust
processing of spontaneously spoken language. A screening approach is a flat
analysis which uses shallow sequences of category representations for analyzing
an utterance at various syntactic, semantic and dialog levels. Rather than
using a deeply structured symbolic analysis, we use a flat connectionist
analysis. This screening approach aims at supporting speech and language
processing by using (1) data-driven learning and (2) robustness of
connectionist networks. In order to test this approach, we have developed the
SCREEN system which is based on this new robust, learned and flat analysis.
In this paper, we focus on a detailed description of SCREEN's architecture,
the flat syntactic and semantic analysis, the interaction with a speech
recognizer, and a detailed evaluation analysis of the robustness under the
influence of noisy or incomplete input. The main result of this paper is that
flat representations allow more robust processing of spontaneous spoken
language than deeply structured representations. In particular, we show how the
fault-tolerance and learning capability of connectionist networks can support a
flat analysis for providing more robust spoken-language processing within an
overall hybrid symbolic/connectionist framework.Comment: 51 pages, Postscript. To be published in Journal of Artificial
Intelligence Research 6(1), 199
Modelling Users, Intentions, and Structure in Spoken Dialog
We outline how utterances in dialogs can be interpreted using a partial first
order logic. We exploit the capability of this logic to talk about the truth
status of formulae to define a notion of coherence between utterances and
explain how this coherence relation can serve for the construction of AND/OR
trees that represent the segmentation of the dialog. In a BDI model we
formalize basic assumptions about dialog and cooperative behaviour of
participants. These assumptions provide a basis for inferring speech acts from
coherence relations between utterances and attitudes of dialog participants.
Speech acts prove to be useful for determining dialog segments defined on the
notion of completing expectations of dialog participants. Finally, we sketch
how explicit segmentation signalled by cue phrases and performatives is covered
by our dialog model.Comment: 17 page
Phonetic Segments and the Organization of Speech
According to mainstream linguistic phonetics, speech can be modeled as a string of discrete sound segments or âphonesâ drawn from a universal phonetic inventory. Recent work has argued that a mature phonetics should refrain from theorizing about speech and speech processing using sound segments, and that the phone concept should be eliminated from linguistic theory. The paper lays out the tenets of the phone methodology and evaluates its prospects in light of the eliminativist arguments. I claim that the eliminativist arguments fail to show that the phone concept should be eliminated from linguistic theory
Diagnostic evaluation in linguistic word recognition
This report is concerned with a new method of evaluation for the Linguistic Word Recognition component of the Verbmobil-Project: Architektur. A two stage model of diagnostic evaluation is presented consisting of logical and empirical evaluation steps. Logical evaluation is carried out according to a data model which acts as optimal input in order that each component participating in the evaluation process can be tested for soundness and completeness. Inconsistencies can thus be remedied before empirical evaluation of the model is undertaken using real data. The diagnostic evaluation method has been operationalised within the Bielefeld Extended Evaluation Toolkit for Lattices of Events (BEETLE)
- âŠ