Search CORE

5,117 research outputs found

Pauses and the temporal structure of speech

Author: Zellner Brigitte
Publication venue: John Wiley
Publication date: 01/01/1994

Natural-sounding speech synthesis requires close control over the temporal structure of the speech flow. This requires a full predictive scheme for the durational structure, in particular for the prolongation of lexeme-final syllables, as well as for the pausal structure of the utterance. In this chapter, the temporal structure of speech is described and the numerous factors that modify it are summarised. In the second part, predictive schemes for the temporal structure of speech ("performance structures") are introduced, and their potential for characterising the overall prosodic structure of speech is demonstrated.
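As a rough illustration of what a predictive durational scheme involves, the sketch below applies a multiplicative lengthening factor to lexeme-final syllables and adds predicted pauses. The `Syllable` type, the base durations and the `FINAL_LENGTHENING` value of 1.3 are hypothetical placeholders, not values from the chapter; a realistic model would condition on many more of the factors the chapter summarises.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    text: str
    base_ms: float               # intrinsic duration in ms (hypothetical value)
    lexeme_final: bool           # True if this is the last syllable of its lexeme
    pause_after_ms: float = 0.0  # predicted pause following the syllable, in ms


# Hypothetical lengthening factor for lexeme-final syllables (assumption, not from the source).
FINAL_LENGTHENING = 1.3


def predict_durations(sylls: list[Syllable]) -> list[float]:
    """Predicted duration of each syllable, lengthening lexeme-final ones."""
    return [s.base_ms * (FINAL_LENGTHENING if s.lexeme_final else 1.0) for s in sylls]


def utterance_duration(sylls: list[Syllable]) -> float:
    """Total utterance duration: syllable durations plus predicted pauses."""
    return sum(predict_durations(sylls)) + sum(s.pause_after_ms for s in sylls)


if __name__ == "__main__":
    utterance = [
        Syllable("na", 150.0, False),
        Syllable("ture", 200.0, True, pause_after_ms=250.0),
    ]
    print(predict_durations(utterance), utterance_duration(utterance))
```

Keeping lengthening and pausing as separate terms mirrors the abstract's distinction between the durational structure and the pausal structure of an utterance.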

CiteSeerX

CogPrints Cognitive Sciences Eprint Archive

Production and perception of speaker-specific phonetic detail at word boundaries

Author: Rachel Smith
Sarah Hawkins
Publication venue: Elsevier BV
Publication date: 01/03/2012

Experiments show that learning about familiar voices affects speech processing in many tasks. However, most studies focus on isolated phonemes or words and do not explore which phonetic properties are learned about or retained in memory. This work investigated inter-speaker phonetic variation involving word boundaries, and its perceptual consequences. A production experiment found significant variation in the extent to which speakers used a number of acoustic properties to distinguish junctural minimal pairs (e.g. 'So he diced them' vs. 'So he'd iced them'). A perception experiment then tested the intelligibility in noise of the junctural minimal pairs before and after familiarisation with a particular voice. Subjects who heard the same voice during testing as during the familiarisation period showed significantly more improvement in identifying words and syllable constituents around word boundaries than those who heard different voices. These data support the view that perceptual learning about the particular pronunciations associated with individual speakers helps listeners to identify syllabic structure and the location of word boundaries.

Crossref

Enlighten

Improving parsing of spontaneous speech with the help of prosodic boundaries

Author: Batliner Anton
Block H. U.
Kießling Andreas
Kompe Ralf
Niemann Heinrich
Nöth Elmar
Ruland T.
Schacht S.
Publication venue: Other institutions. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz (German Research Center for Artificial Intelligence)
Publication date: 01/01/1997

Parsing in automatic speech understanding can be improved if prosodic boundary marking is taken into account, because syntactic boundaries are often marked by prosodic means. Because large databases are needed to train statistical models for prosodic boundaries, we developed a labeling scheme for syntactic-prosodic boundaries within the German VERBMOBIL project (automatic speech-to-speech translation). We compare the results of classifiers (multi-layer perceptrons and language models) trained on these syntactic-prosodic boundary labels with those of classifiers trained on perceptual-prosodic and purely syntactic labels. Recognition rates of up to 96% were achieved. The turns that we need to parse consist of 20 words on average and frequently contain sequences of partial sentence equivalents due to restarts, ellipsis, etc. For this material, the boundary scores computed by our classifiers can be successfully integrated into the syntactic parsing of word graphs; currently, they improve parse time by 92% and reduce the number of parse trees by 96%. This is achieved by introducing a special Prosodic Syntactic Clause Boundary symbol (PSCB) into our grammar and guiding the search for the best word chain with the prosodic boundary scores.
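The key step in this abstract, marking likely clause boundaries with a PSCB symbol so that grammar rules can refer to them, can be pictured on a single word chain. This is a simplified sketch under stated assumptions: the boundary scores are hypothetical classifier outputs, hard thresholding at 0.5 is an illustrative choice, and the system described above works on word graphs and uses the scores to guide the parser's search rather than committing to fixed decisions.

```python
PSCB = "PSCB"  # Prosodic Syntactic Clause Boundary symbol named in the abstract


def insert_boundaries(words: list[str], boundary_probs: list[float],
                      threshold: float = 0.5) -> list[str]:
    """Insert PSCB after each word whose boundary score exceeds the threshold.

    boundary_probs[i] is a (hypothetical) classifier score for a clause
    boundary directly after words[i].
    """
    if len(words) != len(boundary_probs):
        raise ValueError("expected one boundary score per word")
    tagged: list[str] = []
    for word, prob in zip(words, boundary_probs):
        tagged.append(word)
        if prob > threshold:
            tagged.append(PSCB)
    return tagged


if __name__ == "__main__":
    # Hypothetical spontaneous-speech turn with made-up boundary scores.
    turn = ["ja", "gut", "dann", "am", "Montag"]
    scores = [0.9, 0.8, 0.1, 0.05, 0.95]
    print(" ".join(insert_boundaries(turn, scores)))
```

With these made-up scores the demo prints `ja PSCB gut PSCB dann am Montag PSCB`, i.e. the turn is split into partial sentence equivalents that a grammar containing a PSCB terminal could parse separately.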

CiteSeerX

Universaar

Acronym

From Monologue to Dialogue: Natural Language Generation in OVIS

Author: Theune Mariët
Publication date: 01/01/2003

This paper describes how a language generation system originally designed for monologue generation has been adapted for use in the OVIS spoken dialogue system. To meet the requirement that, in a dialogue, the system's utterances should make up a single, coherent dialogue turn, several modifications had to be made to the system. The paper also discusses the influence of dialogue context on information status, and its consequences for the generation of referring expressions and accentuation.

CiteSeerX

University of Twente Research Information

Prosody-Based Automatic Segmentation of Speech into Sentences and Topics

Author: Andreas Stolcke
Dilek Hakkani-Tür
Elizabeth Shriberg
Gökhan Tür
Publication date: 01/01/2000

A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models, for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task- and corpus-dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration and word-based cues dominate for natural conversation.
Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2), Special Issue on Accessing Information in Spoken Audio, September 200
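One simple way to picture a probabilistic combination of prosodic and lexical information is to interpolate per-boundary posteriors from the two models and cut the word stream wherever the combined posterior passes a threshold. The sketch below assumes both models already output P(boundary) after each word; the weight `lam`, the threshold and the example posteriors are hypothetical, and the paper's HMM-based combination is not reproduced here.

```python
def combine_posteriors(p_prosodic: list[float], p_lexical: list[float],
                       lam: float = 0.5) -> list[float]:
    """Linearly interpolate per-word boundary posteriors from two models."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(p_prosodic) != len(p_lexical):
        raise ValueError("posterior lists must have equal length")
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_prosodic, p_lexical)]


def segment(words: list[str], posteriors: list[float],
            threshold: float = 0.5) -> list[list[str]]:
    """Split the word stream after each word whose boundary posterior exceeds threshold."""
    units: list[list[str]] = []
    current: list[str] = []
    for word, prob in zip(words, posteriors):
        current.append(word)
        if prob > threshold:
            units.append(current)
            current = []
    if current:  # trailing words without a detected boundary form a final unit
        units.append(current)
    return units


if __name__ == "__main__":
    words = ["good", "evening", "here", "is", "the", "news"]
    prosodic = [0.2, 0.9, 0.1, 0.1, 0.0, 0.8]  # hypothetical prosodic-model posteriors
    lexical = [0.1, 0.7, 0.2, 0.0, 0.1, 0.9]   # hypothetical language-model posteriors
    print(segment(words, combine_posteriors(prosodic, lexical, lam=0.6)))
```

Here the demo yields two units, `["good", "evening"]` and `["here", "is", "the", "news"]`. Tuning `lam` on held-out data is one way to reflect the observation that cue usage is task- and corpus-dependent.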

arXiv.org e-Print Archive

CiteSeerX

Crossref

Bilkent University Institutional Repository

Improving parsing by incorporating "prosodic clause boundaries" into a grammar

Author: Bakenecker G.
Batliner Anton
Block U.
Kompe Ralf
Nöth Elmar
Regel-Brietzmann P.
Publication venue: Other institutions. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz (German Research Center for Artificial Intelligence)
Publication date: 01/01/1994

In written language, punctuation is used to separate main and subordinate clauses. In spoken language, ambiguities arise because punctuation is missing, but clause boundaries are often marked prosodically and these markers can be used instead. We detect PCBs (Prosodically marked Clause Boundaries) using prosodic features (duration, intonation, energy, and pause information) with a neural network, achieving a recognition rate of 82%. PCBs are integrated into our grammar using a special syntactic category "break" that can be used in the phrase-structure rules of the grammar in a similar way to how punctuation is used in grammars for written language. Whereas punctuation is obligatory in most cases, PCBs are sometimes optional. Moreover, they can in principle occur anywhere in the sentence, e.g. due to hesitations or misrecognition. To cope with these problems, we tested two different approaches: a slightly modified parser for word chains containing PCBs, and a word graph parser that takes the probabilities of PCBs into account. Tests were conducted on a subset of infinitival subordinate clauses from a large speech database containing sentences from the domain of train timetable inquiries. The average number of syntactic derivations could be reduced by about 70%, even when working on recognized word graphs.
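This abstract describes a neural network that maps prosodic features (duration, intonation, energy, pause) to a PCB decision. As a deliberately minimal stand-in, the sketch below trains a single-layer perceptron with the classic perceptron learning rule on a toy, linearly separable dataset. The feature scaling, the training examples and the learning rate are all hypothetical, and the original system's network architecture and feature extraction are not reproduced.

```python
# Feature order (hypothetical, each scaled to roughly [0, 1]):
# [duration / final lengthening, intonation (F0 movement), energy drop, pause length]
Vector = list[float]


def _activation(w: Vector, b: float, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w: Vector, b: float, x: Vector) -> int:
    """Return 1 for a prosodically marked clause boundary (PCB), else 0."""
    return 1 if _activation(w, b, x) > 0.0 else 0


def train_perceptron(xs: list[Vector], ys: list[int],
                     epochs: int = 20, lr: float = 0.1) -> tuple[Vector, float]:
    """Classic perceptron learning rule; stops early once an epoch has no errors."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(xs, ys):
            step = lr * (y - predict(w, b, x))  # 0 when correct, +/- lr when wrong
            if step != 0.0:
                w = [wi + step * xi for wi, xi in zip(w, x)]
                b += step
                errors += 1
        if errors == 0:
            break
    return w, b


if __name__ == "__main__":
    # Toy training data (made up): two boundary and two non-boundary examples.
    xs = [[0.8, 0.7, 0.6, 0.9], [0.7, 0.9, 0.5, 0.6],
          [0.1, 0.2, 0.1, 0.0], [0.2, 0.1, 0.3, 0.1]]
    ys = [1, 1, 0, 0]
    w, b = train_perceptron(xs, ys)
    print([predict(w, b, x) for x in xs])
```

On this toy data the rule converges in two epochs and the demo prints `[1, 1, 0, 0]`. A multi-layer network, as in the abstract, would additionally be able to learn non-linear combinations of the cues.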

Universaar

Acronym

Prosodic modules for speech recognition and understanding in VERBMOBIL

Author: Batliner Anton
Hess Wolfgang
Kießling Andreas
Kompe Ralf
Nöth Elmar
Petzol Anja
Reyelt Matthias
Strom Volker
Publication venue: Other institutions. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz (German Research Center for Artificial Intelligence)
Publication date: 01/01/1996

Within VERBMOBIL, a large project on spoken language research in Germany, two modules for detecting and recognizing prosodic events have been developed. One module operates on speech signal parameters and the word hypothesis graph, whereas the other module, designed for a novel, highly interactive architecture, uses only speech signal parameters as its input. Phrase boundaries, sentence modality, and accents are detected. In spontaneous dialogues, recognition rates reach up to 82.5% for accents and up to 91.7% for phrase boundaries.
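The abstract reports recognition rates without spelling out the measure. For skewed classes such as boundary vs. non-boundary, the overall recognition rate and the average of class-wise rates can differ considerably, so the sketch below computes both from a confusion table. Which measure the reported figures use is not stated here, and the counts in the example are invented.

```python
def recognition_rates(confusion: dict[str, dict[str, int]]) -> tuple[float, float]:
    """Return (overall recognition rate, mean class-wise recognition rate) in percent.

    confusion[ref][hyp] is the number of reference items of class `ref`
    that were recognised as class `hyp`.
    """
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(row.get(ref, 0) for ref, row in confusion.items())
    overall = 100.0 * correct / total
    per_class = [100.0 * row.get(ref, 0) / sum(row.values())
                 for ref, row in confusion.items()]
    return overall, sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Invented counts: B = phrase boundary, NB = no boundary.
    counts = {"B": {"B": 80, "NB": 20}, "NB": {"B": 10, "NB": 890}}
    overall, class_wise = recognition_rates(counts)
    print(f"overall {overall:.1f}%, class-wise {class_wise:.1f}%")
```

With these invented counts the demo prints `overall 97.0%, class-wise 89.4%`: the frequent non-boundary class inflates the overall rate, which is why the two figures should not be compared directly.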

CiteSeerX

Universaar

Acronym