Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation
We present a probabilistic model that uses both prosodic and lexical cues for
the automatic segmentation of speech into topically coherent units. We propose
two methods for combining lexical and prosodic information using hidden Markov
models and decision trees. Lexical information is obtained from a speech
recognizer, and prosodic features are extracted automatically from speech
waveforms. We evaluate our approach on the Broadcast News corpus, using the
DARPA-TDT evaluation metrics. Results show that the prosodic model alone is
competitive with word-based segmentation methods. Furthermore, we achieve a
significant reduction in error by combining the prosodic and word-based
knowledge sources.
Comment: 27 pages, 8 figures.
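The abstract says lexical and prosodic information are combined but does not give the formula. A minimal sketch of one simple option, assuming each model outputs a boundary posterior for every candidate location (e.g. each sentence end), is linear interpolation of the two posteriors followed by a threshold. The function names, the `weight` and `threshold` values, and the example posteriors are all hypothetical; this is not necessarily either of the paper's two combination methods.

```python
from typing import List, Sequence


def combine_posteriors(p_lex: Sequence[float], p_pros: Sequence[float],
                       weight: float = 0.5) -> List[float]:
    """Interpolate per-candidate boundary posteriors from two models.

    `weight` (an assumed tuning parameter) is the share given to the lexical
    model; the prosodic model gets 1 - weight.
    """
    if len(p_lex) != len(p_pros):
        raise ValueError("both models must score the same candidate boundaries")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * lex + (1.0 - weight) * pros for lex, pros in zip(p_lex, p_pros)]


def topic_boundaries(p_lex: Sequence[float], p_pros: Sequence[float],
                     weight: float = 0.5, threshold: float = 0.5) -> List[int]:
    """Indices of candidate boundaries whose combined posterior exceeds the threshold."""
    combined = combine_posteriors(p_lex, p_pros, weight)
    return [i for i, p in enumerate(combined) if p > threshold]


# Hypothetical posteriors for four candidate boundaries.
p_lex = [0.1, 0.7, 0.4, 0.9]
p_pros = [0.2, 0.5, 0.8, 0.6]
print(topic_boundaries(p_lex, p_pros))              # → [1, 2, 3]
print(topic_boundaries(p_lex, p_pros, weight=1.0))  # → [1, 3] (lexical model only)
```

In this toy example, the boundary at index 2 is found only when prosody contributes, which mirrors the abstract's point that the two knowledge sources are complementary.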
Meter based omission of function words in MOSAIC
MOSAIC (Model of Syntax Acquisition in Children) is augmented with a new mechanism that allows for the omission of unstressed function words based on the prosodic structure of the utterance in which they occur. The mechanism allows MOSAIC to omit elements from multiple locations in a target utterance, which it was previously unable to do. It is shown that, although the new mechanism results in Optional Infinitive errors when run on children’s input, it is insufficient to simulate the high rate of OI errors in children’s speech unless combined with MOSAIC’s edge-first learning mechanism. It is also shown that the addition of the new mechanism does not adversely affect MOSAIC’s fit to the Optional Infinitive phenomenon. The mechanism does, however, make MOSAIC’s output more child-like, both in terms of the range of utterances it can simulate, and the level and type of determiner omission that the model displays.
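The abstract does not spell out the omission rule. As a rough illustration only, one metrical account of function-word omission drops unstressed syllables that cannot be footed into a strong-weak (trochaic) foot. The sketch below applies that idea under heavy assumptions: one syllable per word, a hypothetical `Token` annotation with hand-supplied stress and function-word flags, and a rule that is not MOSAIC's actual implementation.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical annotation: one syllable per word in this toy version."""
    word: str
    is_function_word: bool
    is_stressed: bool


def metrical_omission(tokens: List[Token]) -> List[str]:
    """Keep an unstressed function word only if it can fill the weak slot of a
    strong-weak foot opened by the immediately preceding stressed word."""
    kept: List[str] = []
    foot_open = False  # True right after a stressed word, before its weak slot is filled
    for tok in tokens:
        if tok.is_stressed:
            kept.append(tok.word)
            foot_open = True       # a strong syllable starts a new foot
        elif foot_open:
            kept.append(tok.word)
            foot_open = False      # the weak syllable closes the strong-weak foot
        elif not tok.is_function_word:
            kept.append(tok.word)  # unfooted content words are kept in this sketch
        # otherwise: an unfooted, unstressed function word is omitted
    return kept


utterance = [Token("he", True, False), Token("can", True, False), Token("go", False, True)]
print(metrical_omission(utterance))  # → ['go']
```

Under this toy rule, "he can go" loses both unstressed, utterance-initial function words and surfaces as the bare verb, the kind of output that resembles an Optional Infinitive error.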
Book Notice: Taylor, Paul - Text-to-Speech Synthesis
Published or submitted for publication; peer reviewed.
Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in
conversational speech, i.e., speech-act-like units such as Statement, Question,
Backchannel, Agreement, Disagreement, and Apology. Our model detects and
predicts dialogue acts based on lexical, collocational, and prosodic cues, as
well as on the discourse coherence of the dialogue act sequence. The dialogue
model is based on treating the discourse structure of a conversation as a
hidden Markov model and the individual dialogue acts as observations emanating
from the model states. Constraints on the likely sequence of dialogue acts are
modeled via a dialogue act n-gram. The statistical dialogue grammar is combined
with word n-grams, decision trees, and neural networks modeling the
idiosyncratic lexical and prosodic manifestations of each dialogue act. We
develop a probabilistic integration of speech recognition with dialogue
modeling, to improve both speech recognition and dialogue act classification
accuracy. Models are trained and evaluated using a large hand-labeled database
of 1,155 conversations from the Switchboard corpus of spontaneous
human-to-human telephone speech. We achieved good dialogue act labeling
accuracy (65% based on errorful, automatically recognized words and prosody,
and 71% based on word transcripts, compared to a chance baseline accuracy of
35% and human accuracy of 84%) and a small reduction in word recognition error.
Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling
changed).
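The HMM view described above, with dialogue acts as hidden states, an n-gram over act sequences, and per-utterance likelihoods from words and prosody, can be decoded with the Viterbi algorithm. A minimal sketch, assuming a bigram dialogue grammar, only two acts, and made-up, nonzero probabilities (the function and parameter names are hypothetical):

```python
import math
from typing import Dict, List, Sequence


def viterbi_dialogue_acts(
    likelihoods: Sequence[Dict[str, float]],
    bigram: Dict[str, Dict[str, float]],
    start: Dict[str, float],
) -> List[str]:
    """Most likely dialogue act sequence under an HMM whose states are acts.

    likelihoods[t][d]: P(evidence of utterance t | act d), e.g. words and prosody
    bigram[p][d]:      P(act d | previous act p), the dialogue act bigram
    start[d]:          P(first act is d)
    All probabilities must be nonzero because scores are kept in log space.
    """
    if not likelihoods:
        return []
    acts = list(start)
    # score[d] = best log-probability of any act sequence ending in act d
    score = {d: math.log(start[d]) + math.log(likelihoods[0][d]) for d in acts}
    backpointers: List[Dict[str, str]] = []
    for obs in likelihoods[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for d in acts:
            best_prev = max(acts, key=lambda p: score[p] + math.log(bigram[p][d]))
            new_score[d] = score[best_prev] + math.log(bigram[best_prev][d]) + math.log(obs[d])
            pointers[d] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final act.
    path = [max(acts, key=lambda d: score[d])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


start = {"Question": 0.5, "Statement": 0.5}
bigram = {"Question": {"Question": 0.1, "Statement": 0.9},
          "Statement": {"Question": 0.5, "Statement": 0.5}}
likelihoods = [{"Question": 0.8, "Statement": 0.2},
               {"Question": 0.55, "Statement": 0.45}]
print(viterbi_dialogue_acts(likelihoods, bigram, start))  # → ['Question', 'Statement']
```

On its own, the second utterance's local evidence slightly favors Question, but the dialogue grammar makes a Statement after a Question much more likely. This is the kind of discourse-coherence constraint that the abstract describes.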
Predicting continuous conflict perception with Bayesian Gaussian processes
Conflict is one of the most important phenomena of social life, but it is still largely neglected by the computing community. This work proposes an approach
that detects common conversational social signals (loudness, overlapping speech,
etc.) and predicts the conflict level perceived by human observers in continuous,
non-categorical terms. The proposed regression approach is fully Bayesian and it
adopts Automatic Relevance Determination to identify the social signals that most influence the outcome of the prediction. The experiments are performed over the SSPNet Conflict Corpus, a publicly available collection of 1430 clips extracted from televised political debates (roughly 12 hours of material for 138 subjects in total). The results show that it is possible to achieve a correlation close to 0.8 between actual and predicted conflict perception.
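Automatic Relevance Determination gives each input feature its own kernel lengthscale. A long lengthscale makes the prediction insensitive to that feature. A minimal sketch of Gaussian process regression with an ARD squared-exponential kernel is shown below. It assumes fixed, hand-picked hyperparameters instead of the fully Bayesian treatment in the abstract, uses hypothetical feature vectors (e.g. loudness and fraction of overlapping speech), and scores predictions with the Pearson correlation mentioned in the results.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def ard_rbf(x: Vector, z: Vector, lengthscales: Vector, variance: float = 1.0) -> float:
    """Squared-exponential kernel with one lengthscale per feature (ARD)."""
    sq = sum(((a - b) / ell) ** 2 for a, b, ell in zip(x, z, lengthscales))
    return variance * math.exp(-0.5 * sq)


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def cho_solve(low: List[List[float]], b: Vector) -> List[float]:
    """Solve (L L^T) x = b by forward and then backward substitution."""
    n = len(low)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior_mean(X_train, y_train, X_test, lengthscales,
                      variance=1.0, noise_variance=0.1) -> List[float]:
    """Predictive mean k_*^T (K + noise_variance * I)^{-1} y with fixed hyperparameters."""
    n = len(X_train)
    K = [[ard_rbf(X_train[i], X_train[j], lengthscales, variance)
          + (noise_variance if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = cho_solve(cholesky(K), list(y_train))
    return [sum(ard_rbf(x, xt, lengthscales, variance) * a for xt, a in zip(X_train, alpha))
            for x in X_test]


def pearson(u: Vector, v: Vector) -> float:
    """Pearson correlation between actual and predicted values."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


# Hypothetical clips: [loudness, overlap fraction] with made-up conflict scores.
X = [[0.1, 0.0], [0.5, 0.2], [0.9, 0.6], [0.3, 0.1]]
y = [-1.0, 0.0, 1.5, -0.5]
print(pearson(y, gp_posterior_mean(X, y, X, lengthscales=[0.5, 0.5])))
```

In a full ARD setup, the lengthscales would be inferred from data. The learned inverse lengthscales then indicate which social signals matter most.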