63 research outputs found

    Prosodic processing and its use in Verbmobil

    We present the prosody module of the VERBMOBIL speech-to-speech translation system, the world's first complete system that successfully uses prosodic information in linguistic analysis. This is achieved by computing probabilities for clause boundaries, accentuation, and different types of sentence mood for each of the word hypotheses computed by the word recognizer. These probabilities guide the search of the linguistic analysis. Disambiguation is achieved during the analysis itself, not by a subsequent prosodic verification of different linguistic hypotheses. So far, the most useful prosodic information is provided by clause boundaries, which are detected with a recognition rate of 94%. For the parsing of word hypotheses graphs, the use of clause boundary probabilities yields a speed-up of 92% and a 96% reduction of alternative readings.

    Prosody takes over : a prosodically guided dialog system

    This paper discusses first experiments with naive persons using the speech understanding and dialog system EVAR. The domain of EVAR is train timetable inquiry. We observed that in real human-human dialogs, the customer very often interrupts while the officer is transmitting the information. Many of these interruptions are just repetitions of the time of day given by the officer. The functional role of these interruptions is determined by prosodic cues only. An important result of the experiments with EVAR is that it is hard to follow the system when it gives the train connection via speech synthesis. In this case it is even more important than in human-human dialogs that the user has the opportunity to interact during the answer phase. Therefore we extended the dialog module to allow the user to repeat the time of day, and we added a prosody module guiding the continuation of the dialog.

    Assessing the Prosody of Non-Native Speakers of English: Measures and Feature Sets

    In this paper, we describe a new database with audio recordings of non-native (L2) speakers of English, and the perceptual evaluation experiment conducted with native English speakers for assessing the prosody of each recording. These annotations are then used to compute the gold standard using different methods, and a series of regression experiments is conducted to evaluate their impact on the performance of a regression model predicting the degree of naturalness of L2 speech. Further, we compare the relevance of different feature groups modelling prosody in general (without speech tempo), speech rate and pauses modelling speech tempo (fluency), voice quality, and a variety of spectral features. We also discuss the impact of various fusion strategies on performance. Overall, our results demonstrate that the prosody of non-native speakers of English as L2 can be reliably assessed using suprasegmental audio features; prosodic features seem to be the most important ones.

    Prosody takes over : towards a prosodically guided dialog system

    The domain of the speech recognition and dialog system EVAR is train timetable inquiry. We observed that in real human-human dialogs, the customer very often interrupts while the officer is transmitting the information. Many of these interruptions are just repetitions of the time of day given by the officer. The functional role of these interruptions is often determined by prosodic cues only. An important result of experiments in which naive persons used the EVAR system is that it is hard to follow the train connection given via speech synthesis. In this case it is even more important than in human-human dialogs that the user has the opportunity to interact during the answer phase. Therefore we extended the dialog module to allow the user to repeat the time of day, and we added a prosody module guiding the continuation of the dialog by analyzing the intonation contour of this utterance.

    Research on Architectures for Integrated Speech/Language Systems in Verbmobil

    The German joint research project Verbmobil (VM) aims at the development of a speech-to-speech translation system. This paper reports on research done in our group, which belongs to Verbmobil's subproject on system architectures (TP15). Our specific research areas are the construction of parsers for spontaneous speech, investigations into the parallelization of parsing, and contributions to the development of a flexible communication architecture with distributed control. Comment: 6 pages, 2 Postscript figures

    Prosodic modules for speech recognition and understanding in VERBMOBIL

    Within VERBMOBIL, a large project on spoken language research in Germany, two modules for detecting and recognizing prosodic events have been developed. One module operates on speech signal parameters and the word hypothesis graph, whereas the other module, designed for a novel, highly interactive architecture, uses only speech signal parameters as its input. Phrase boundaries, sentence modality, and accents are detected. The recognition rates in spontaneous dialogs are up to 82.5% for accents and up to 91.7% for phrase boundaries.

    Data-driven Extraction of Intonation Contour Classes

    In this paper we introduce the first steps towards a new data-driven method for the extraction of intonation events that does not require any prior prosodic labelling. Provided with data segmented on the syllable constituent level, it derives local and global contour classes by stylisation and subsequent clustering of the stylisation parameter vectors. Local contour classes correspond to pitch movements connected to one or several syllables and determine the local f0 shape. Global classes are connected to intonation phrases and determine the f0 register. Local classes are initially derived for syllabic segments, which are then concatenated incrementally by means of statistical language modelling of co-occurrence patterns. Due to its generality, the method is in principle language independent and potentially capable of dealing with aspects of prosody other than intonation.

    Integrating Syntactic and Prosodic Information for the Efficient Detection of Empty Categories

    We describe a number of experiments that demonstrate the usefulness of prosodic information for a processing module which parses spoken utterances with a feature-based grammar employing empty categories. We show that by requiring certain prosodic properties from those positions in the input where the presence of an empty category has to be hypothesized, a derivation can be accomplished more efficiently. The approach has been implemented in the machine translation project VERBMOBIL and results in a significant reduction of the workload for the parser. Comment: To appear in the Proceedings of Coling 1996, Copenhagen. 6 pages