3 research outputs found

Measuring the gap between HMM-based ASR and TTS

Authors: Dines, John; King, Simon; Yamagishi, Junichi
Publication date: 01/01/2009

The EMIME European project is conducting research into technologies for mobile, personalised speech-to-speech translation systems. The hidden Markov model (HMM) is being used as the underlying technology in both the automatic speech recognition (ASR) and text-to-speech synthesis (TTS) components; thus, the investigation of unified statistical modelling approaches has become an implicit goal of our research. As one of the first steps towards this goal, we have been investigating commonalities and differences between HMM-based ASR and TTS. In this paper we present results and analysis of a series of experiments conducted on English ASR and TTS systems, measuring their performance with respect to phone set and lexicon, acoustic feature type and dimensionality, and HMM topology. Our results show that, although the fundamental statistical model may be essentially the same, optimal ASR and TTS performance often demands diametrically opposed system designs. This represents a major challenge to be addressed in the investigation of such unified modelling approaches.

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Edinburgh Research Archive

Edinburgh Research Explorer

Speech recognition with speech synthesis models by marginalising over decision tree leaves

Authors: Dines, John; Liang, Hui; Saheer, Lakshmi
Publication venue: Idiap
Publication date: 11/02/2010

There has been increasing interest in the use of unsupervised adaptation for the personalisation of text-to-speech (TTS) voices, particularly in the context of speech-to-speech translation. This requires that we are able to generate adaptation transforms from the output of an automatic speech recognition (ASR) system. An approach that utilises unified ASR and TTS models would seem to offer an ideal mechanism for applying unsupervised adaptation to TTS, since transforms could be shared between ASR and TTS. Such unified models should use a common set of parameters. A major barrier to such parameter sharing is the use of differing contexts in ASR and TTS. In this paper we propose a simple approach that generates ASR models from a trained set of TTS models by marginalising over the TTS contexts that are not used by ASR. We present preliminary results of our proposed method on a large vocabulary speech recognition task and provide insights into future directions of this work.

Infoscience - École polytechnique fédérale de Lausanne

DARPA February 1992 pilot corpus CSR "dry run" benchmark test results

Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/1992

Crossref