128 research outputs found

Utilizing Statistical Dialogue Act Processing in Verbmobil

Author: Maier Elisabeth; Reithinger Norbert
Publication date: 01/01/1995

In this paper, we present a statistical approach to dialogue act processing in the dialogue component of the speech-to-speech translation system VERBMOBIL. Statistics are used in dialogue processing to predict follow-up dialogue acts. As an application example, we show how this prediction supports repair when unexpected dialogue states occur. Comment: 6 pages; compressed and uuencoded postscript file; to appear in ACL-9
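The abstract does not say which statistical model is used, so the following is only a minimal sketch of the general idea of predicting follow-up dialogue acts from frequencies. It assumes a simple bigram model over dialogue act labels; the function names, the act labels, and the toy corpus are all hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


def train_bigrams(dialogues):
    """Count how often each dialogue act is followed by each other act."""
    counts = defaultdict(Counter)
    for acts in dialogues:
        for prev_act, next_act in zip(acts, acts[1:]):
            counts[prev_act][next_act] += 1
    return counts


def predict_next(counts, prev_act, k=2):
    """Return the k most frequent follow-up acts with their relative frequencies."""
    followers = counts.get(prev_act)
    if not followers:
        return []  # unseen act: no prediction available
    total = sum(followers.values())
    return [(act, n / total) for act, n in followers.most_common(k)]


# Hypothetical toy corpus of annotated dialogues (labels are illustrative only).
dialogues = [
    ["greet", "suggest", "accept", "bye"],
    ["greet", "suggest", "reject", "suggest", "accept", "bye"],
    ["greet", "suggest", "accept", "bye"],
]
counts = train_bigrams(dialogues)
print(predict_next(counts, "suggest"))
```

In a sketch like this, an observed act that is missing from the predicted list, or ranked very low, could serve as the signal that an unexpected dialogue state has occurred.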

arXiv.org e-Print Archive

A Robust and Efficient Three-Layered Dialogue Component for a Speech-to-Speech Translation System

Author: Alexandersson Jan; Maier Elisabeth; Reithinger Norbert
Publication date: 01/01/1994

We present the dialogue component of the speech-to-speech translation system VERBMOBIL. In contrast to conventional dialogue systems, it mediates the dialogue while processing at most 50% of the dialogue in depth. Special requirements such as robustness and efficiency lead to a three-layered hybrid architecture for the dialogue module, using statistics, an automaton and a planner. A dialogue memory is constructed incrementally. Comment: Postscript file, compressed and uuencoded, 15 pages, to appear in Proceedings of EACL-95, Dublin

arXiv.org e-Print Archive

A preliminary demonstration of exemplar-based voice conversion for articulation disorders using an individuality-preserving dictionary

Publication venue: Springer

Springer - Publisher Connector

Small-parallel exemplar-based voice conversion in noisy environments using affine non-negative matrix factorization

Publication venue: Springer
Publication date: 25/11/2015

Springer - Publisher Connector

On the use of voice descriptors for glottal source shape parameter estimation

Author: Huber Stefan; Röbel Axel
Publication venue: Elsevier BV
Publication date: 08/10/2013

This paper summarizes the results of our investigations into estimating the shape of the glottal excitation source from speech signals. We employ the Liljencrants-Fant (LF) model describing the glottal flow and its derivative. The one-dimensional glottal source shape parameter Rd describes the transition in voice quality from a tense to a breathy voice. The parameter Rd has been derived from a statistical regression of the R waveshape parameters which parameterize the LF model. First, we introduce a variant of our recently proposed adaptation and range extension of the Rd parameter regression. Secondly, we discuss in detail the aspects of estimating the glottal source shape parameter Rd using the phase minimization paradigm. Based on the analysis of a large number of speech signals, we describe the major conditions that are likely to result in erroneous Rd estimates. Based on these findings, we investigate ways to increase the robustness of the Rd parameter estimation. We use Viterbi smoothing to suppress unnatural jumps of the estimated Rd parameter contours within short time segments. Additionally, we propose to steer the Viterbi algorithm by exploiting the covariation of other voice descriptors to improve Viterbi smoothing. The novel Viterbi steering is based on a Gaussian Mixture Model (GMM) that represents the joint density of the voice descriptors and the Open Quotient (OQ) estimated from corresponding electroglottographic (EGG) signals. A conversion function derived from the mixture model predicts OQ from the voice descriptors. Converted to Rd, this prediction defines an additional prior probability that adapts the partial probabilities of the Viterbi algorithm accordingly.
Finally, we evaluate the performance of the phase minimization based methods, using both variants to adapt and extend the Rd regression on one synthetic test set, as well as in combination with Viterbi smoothing and each variant of the novel Viterbi steering on one test set of natural speech. The experimental findings exhibit improvements for both Viterbi approaches.
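The idea of Viterbi smoothing a parameter contour can be illustrated with a small sketch. The abstract gives no cost functions, so everything here is an assumption: the candidate grid `values`, the per-frame cost (absolute distance to a raw estimate), and the `jump_weight` penalty are hypothetical, and the GMM-based steering described above is not modeled.

```python
def viterbi_smooth(frame_costs, values, jump_weight=1.0):
    """Choose one candidate value per frame, minimizing the sum of
    per-frame costs plus jump_weight * |change| between consecutive frames."""
    n_states = len(values)
    best = list(frame_costs[0])  # best accumulated cost ending in each state
    back = []                    # back-pointers, one list per later frame
    for costs in frame_costs[1:]:
        new_best, pointers = [], []
        for j in range(n_states):
            # Cheapest predecessor state for candidate j in this frame.
            i_min = min(
                range(n_states),
                key=lambda i: best[i] + jump_weight * abs(values[i] - values[j]),
            )
            pointers.append(i_min)
            jump = jump_weight * abs(values[i_min] - values[j])
            new_best.append(best[i_min] + jump + costs[j])
        best = new_best
        back.append(pointers)
    # Backtrack from the cheapest final state.
    state = min(range(n_states), key=lambda i: best[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [values[s] for s in path]


# Hypothetical example: raw per-frame estimates with one isolated jump.
values = [0.5, 1.0, 2.5]
raw = [1.0, 1.0, 2.5, 1.0, 1.0]
frame_costs = [[abs(v - r) for v in values] for r in raw]
smoothed = viterbi_smooth(frame_costs, values, jump_weight=1.0)
unsmoothed = viterbi_smooth(frame_costs, values, jump_weight=0.0)
print(smoothed, unsmoothed)
```

With a positive `jump_weight`, following the isolated outlier costs more than staying on the neighbouring value, so the jump is suppressed; with `jump_weight=0.0` the raw estimates are returned unchanged. An additional prior such as the one described in the abstract could, under the same assumptions, be added to the per-frame costs.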

Crossref

HAL Descartes

Hal-Diderot

Modeling of Speech Parameter Sequence Considering Global Variance for HMM-Based Speech Synthesis

Author: Tomoki Toda
Publication venue: IntechOpen
Publication date: 19/04/2011

Speech technologies such as speech recognition and speech synthesis have many potential applications since speech is the main way in which most people communicate. Various linguistic sounds are produced by controlling the configuration of oral cavities to convey a message in speech communication. The produced speech sounds temporally vary and ar

IntechOpen

CiteSeerX

Imitation/self-imitation in computer-assisted prosody training for Chinese learners of L2 Italian.

Author: Cutugno F; DE MEO Anna; Origlia A.; Pettorino Massimo; Vitale Marilisa
Publication venue: Iowa State University
Publication date: 01/01/2013

Recent studies on L2 acquisition, speech synthesis and automatic identification of foreign accents argue for a major role of prosody in the perception of non-native speech. Research on the relationship between pronunciation improvement and student/teachers’ voice similarities has also shown that the better the match between the learners' and native speakers' voices in terms of f0 and articulation rate, the more positive the impact on pronunciation training. This study investigates the effects of imitation and self-imitation on the acquisition of L2 suprasegmental patterns. Degree of foreign accent, improvements in intelligibility, and effectiveness of communication were measured to determine the success of each technique. For this purpose, a prosodic transplantation technique and a computer-assisted learning methodology were used.

ARCHIVIO ISTITUZIONALE DELLA RICERCA-UNIVERSITA' DEGLI STUDI DI NAPOLI "L'ORIENTALE"

Università degli Studi di Napoli L'Orientale: CINECA IRIS

Interactive speech understanding

Author: Hiroaki Saito
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2007

Crossref

An Efficient Unit-Selection Method for Concatenative Text-to-Speech Synthesis Systems

Author: Jerneja Zganec Gros; Zganec Mario
Publication venue: University of Zagreb - University Computing Centre
Publication date: 01/01/2008

This paper presents a method for selecting speech units for polyphone concatenative speech synthesis, in which simplifying the search for paths in a graph speeds up the unit-selection procedure with minimal effect on speech quality. The speech units selected are still optimal; only the costs of merging the units, on which the selection is based, are determined less accurately. Because of its low processing-power and memory-footprint requirements, the method is suitable for use in embedded speech synthesizers.

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia