Prosody-Based Automatic Segmentation of Speech into Sentences and Topics
A crucial step in processing speech audio data for information extraction,
topic detection, or browsing/playback is to segment the input into sentence and
topic units. Speech segmentation is challenging, since the cues typically
present for segmenting text (headers, paragraphs, punctuation) are absent in
spoken language. We investigate the use of prosody (information gleaned from
the timing and melody of speech) for these tasks. Using decision tree and
hidden Markov modeling techniques, we combine prosodic cues with word-based
approaches, and evaluate performance on two speech corpora, Broadcast News and
Switchboard. Results show that the prosodic model alone performs on par with,
or better than, word-based statistical language models -- for both true and
automatically recognized words in news speech. The prosodic model achieves
comparable performance with significantly less training data, and requires no
hand-labeling of prosodic events. Across tasks and corpora, we obtain a
significant improvement over word-only models using a probabilistic combination
of prosodic and lexical information. Inspection reveals that the prosodic
models capture language-independent boundary indicators described in the
literature. Finally, cue usage is task and corpus dependent. For example, pause
and pitch features are highly informative for segmenting news speech, whereas
pause, duration and word-based cues dominate for natural conversation.
Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2),
Special Issue on Accessing Information in Spoken Audio, September 200
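The abstract above does not say how the prosodic and lexical models are combined. As a rough illustration only, one simple way to merge two boundary classifiers is to linearly interpolate their per-gap posterior probabilities and threshold the result. The function names, the interpolation weight and the threshold below are all hypothetical, and this sketch is not the method used in the paper.

```python
def combine_posteriors(p_prosody, p_lexical, weight=0.5):
    """Linearly interpolate two boundary posteriors, one pair per word gap.

    `weight` is the share given to the prosodic model. It is an assumed
    tuning parameter, not a value taken from the paper.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(p_prosody) != len(p_lexical):
        raise ValueError("both models must score the same word gaps")
    return [
        weight * pp + (1.0 - weight) * pl
        for pp, pl in zip(p_prosody, p_lexical)
    ]


def mark_boundaries(posteriors, threshold=0.5):
    """Label a word gap as a boundary when its posterior reaches the threshold."""
    return [p >= threshold for p in posteriors]


if __name__ == "__main__":
    # Made-up posteriors for three word gaps, for illustration only.
    prosody = [0.9, 0.2, 0.6]
    lexical = [0.5, 0.4, 0.1]
    combined = combine_posteriors(prosody, lexical, weight=0.5)
    print(mark_boundaries(combined))
```

In practice the weight would be tuned on held-out data, and a sequence model could replace the independent per-gap decision.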
Problem spotting in human-machine interaction
In human-human communication, dialogue participants are continuously sending and receiving signals on the status of the information being exchanged. We claim that if spoken dialogue systems were able to detect such cues and change their strategy accordingly, the interaction between user and system would improve. Therefore, the goals of the present study are as follows: (i) to find out which positive and negative cues people actually use in human-machine interaction in response to explicit and implicit verification questions and (ii) to see which (combinations of) cues have the best predictive potential for spotting the presence or absence of problems. It was found that subjects systematically use negative/marked cues (more words, marked word order, more repetitions and corrections, less new information, etc.) when there are communication problems. Using precision and recall matrices, it was found that various combinations of cues are accurate problem spotters. This kind of information may turn out to be highly relevant for spoken dialogue systems, e.g., by providing quantitative criteria for changing the dialogue strategy or speech recognition engine.
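To make the evaluation idea concrete, the sketch below computes precision and recall for a single binary cue treated as a problem detector. The data layout (one boolean per user utterance for the cue and one for the problem label) and the function name are assumptions made for illustration; the abstract does not describe the study's actual data format or matrices.

```python
def precision_recall(cue_present, problem_present):
    """Score a binary cue as a predictor of communication problems.

    Both arguments are equal-length sequences of booleans, one entry per
    utterance (a hypothetical layout chosen for this example).
    """
    if len(cue_present) != len(problem_present):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(cue_present, problem_present))
    true_pos = sum(1 for c, p in pairs if c and p)
    false_pos = sum(1 for c, p in pairs if c and not p)
    false_neg = sum(1 for c, p in pairs if not c and p)
    # Guard against division by zero when the cue never fires
    # or when no utterance is labeled as a problem.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up labels for five utterances, for illustration only.
    cue = [True, True, False, True, False]
    problem = [True, False, True, True, False]
    print(precision_recall(cue, problem))
```

A combination of cues can be scored the same way by first reducing it to one boolean per utterance, for example with a logical AND or OR over the individual cues.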
Signaling and detecting uncertainty in audiovisual speech by children and adults
We describe two experiments on signaling and detecting uncertainty in audiovisual speech by adults and children. In the first study, utterances from adult speakers and child speakers (aged 7-8) were elicited and annotated with a set of six audiovisual features. It was found that when adult speakers are uncertain about their answer, they are more likely to produce filled pauses, delays, high intonation, eyebrow movements, smiles and funny faces. The basic picture for the child speakers is similar, in that the presence of an audiovisual cue in an answer correlates with uncertainty, but the differences are relatively small and only significant for the features delay, eyebrow and funny face. In the second study, both adult and child judges watched answers from adult and child speakers selected from the first study to find out whether they were able to correctly estimate a speaker's level of uncertainty. It was found that both child and adult judges give more accurate scores for answers from adult speakers than from child speakers, and that child judges overall provide less accurate scores than adult judges.