88 research outputs found
Integrating Syntactic and Prosodic Information for the Efficient Detection of Empty Categories
We describe a number of experiments that demonstrate the usefulness of
prosodic information for a processing module which parses spoken utterances
with a feature-based grammar employing empty categories. We show that by
requiring certain prosodic properties from those positions in the input where
the presence of an empty category has to be hypothesized, a derivation can be
accomplished more efficiently. The approach has been implemented in the machine
translation project VERBMOBIL and results in a significant reduction of the
work-load for the parser.Comment: To appear in the Proceedings of Coling 1996, Copenhagen. 6 page
Syntactic-prosodic labeling of large spontaneous speech data-bases
In automatic speech understanding, the division of continuously running speech into syntactic chunks is a great problem. Syntactic boundaries are often marked by prosodic means. For the training of statistic models for prosodic boundaries large databases are necessary. For the German Verbmobil project (automatic speech-to-speech translation), we developed a syntactic-prosodic labeling scheme where two main types of boundaries (major syntactic boundaries and syntactically ambiguous boundaries) and some other special boundaries are labeled for a large Verbmobil spontaneous speech corpus. We compare the results of classifiers (multilayer perceptrons and language models) trained on these syntactic-prosodic boundary labels with classifiers trained on perceptual-prosodic and pure syntactic labels. The main advantage of the rough syntactic-prosodic labels presented in this paper is that large amounts of data could be labeled within a short time. Therefore, the classifiers trained with these labels turned out to be superior (recognition rates of up to 96%)
Detection of phrase boundaries and accents
On a large speech database read by untrained speakers experiments for the recognition of phrase boundaries and phrase accents were performed. We used durational features as well as features derived from pitch and energy contours and pause information. Different sets of features were compared. For distinguishing three different boundary classes a recognition rate of 75.7% and for distinguishing accentuated from unaccentuated syllables a recognition rate of 88.7% could be achieved
Prosodic processing and its use in Verbmobil
We present the prosody module of the VERBMOBlL speech-to-speech translation system, the world wide first complete system, which successfully uses prosodic information in the linguistic analysis. This is achieved by computing probabilities for clause boundaries, accentuation, and different types of sentence mood for each of the word hypotheses computed by the word recognizer. These probabilities guide the search of the linguistic analysis. Disambiguation is already achieved during the analysis and not by a prosodic verification of different linguistic hypotheses. So far, the most useful prosodic information is provided by clause boundaries. These are detected with a recognition rate of 94%. For the parsing of word hypotheses graphs, the use of clause boundary probabilities yields a speed-up of 92% and a 96% reduction of alternative readings
Dialog act classification with the help of prosody
This paper presents automatic methods for the segmentation and classication of dialog acts (DA). In Verbmobil it is often sufficient to recognize the sequence of DAs occurring during a dialog between the two partners. Since a turn can consist of one or more successive DAs we conduct the classification of DAs in a two step procedure: First each turn has to be segmented into units which correspond to a DA and second the DA categories have to be identified. For the segmentation we use polygrams and multi -layer perceptrons, using prosodic features. The classification of DAs is done with semantic classication trees and polygrams
Improving parsing by incorporating "prosodic clause boundaries" into a grammar
In written language, punctuation is used to separate main and subordinate clause. In spoken language, ambiguities arise due to missing punctuation, but clause boundaries are often marked prosodically and can be used instead. We detect PCBs (Prosodically markedClauseBoundaries) by using prosodic features (duration, intonation, energy, and pause information) with a neural network, achieving a recognition rate of 82%. PCBs are integrated into our grammar using a special syntactic category "break" that can be used in the phrase-structure rules of the grammar in a similar way as punctuation is used in grammars for written language. Whereas punctuation in most cases is obligatory, PCBs are sometimes optional. Moreover, they can in principle occur everywhere in the sentence due e.g. to hesitations or misrecognition. To cope with these problems we tested two different approaches: A slightly modified parser for word chains containing PCBs and a word graph parser that takes the probabilities of PCBs into account. Tests were conducted on a subset of infinitive subordinate clauses from a large speech database containing sentences from the domain of train table inquiries. The average number of syntactic derivations could be reduced by about 70 % even when working on recognized word graphs
Automatic classification of prosodically marked phrase boundaries in German
A large corpus has been created automatically and read by speakers. Phrase boundaries were labeled in the sentences automatically during sentence generation. Perception experiments on a subset of 500 utterances showed a high agreement between the automatically generated boundary markers and the ones perceived by listeners. Gaussian distribution and polynomial classifiers were trained on a set of prosodic features computed from the speech signal using the automatically generated boundary markers. Comparing the classification results with the judgments of the listeners yielded in a recognition rate of 87%. A combination with stochastic language models improved the recognition rate to 90%. We found that the pause and the durational features are most important for the classification, but that the influence of F0 is not neglectable
Prosodic scoring of word hypotheses graphs
Prosodic boundary detection is important to disambiguate parsing, especially in spontaneous speech, where elliptic sentences occur frequently. Word graphs are an efficient interface between word recognition and parser. Prosodic classification of word chains has been published earlier. The adjustments necessary for applying these classification techniques to word graphs are discussed in this paper. When classifying a word hypothesis a set of context words has to be determined appropriately. A method has been developed to use stochastic language models for prosodic classification. This as well has been adopted for the use on word graphs. We also improved the set of acoustic-prosodic features with which the recognition errors were reduced by about 60% on the read speech we were working on previously, now achieving 10% error rate for 3 boundary classes and 3% for 2 accent classes. Moving to spontaneous speech the recognition error increases significantly (e.g. 16% for a 2-class boundary task). We show that even on word graphs the combination of language models which model a larger context with acoustic-prosodic classifiers reduces the recognition error by up to 50 %
Prosodic modules for speech recognition and understanding in VERBMOBIL
Within VERBMOBIL, a large project on spoken language research in Germany, two modules for detecting and recognizing prosodic events have been developed. One module operates on speech signal parameters and the word hypothesis graph, whereas the other module, designed for a novel, highly interactive architecture, only uses speech signal parameters as its input. Phrase boundaries, sentence modality, and accents are detected. The recognition rates in spontaneous dialogs are for accents up to 82,5%, for phrase boundaries up to 91,7%
- …