3 research outputs found

    Measuring the gap between HMM-based ASR and TTS

    The EMIME European project is conducting research into the development of technologies for mobile, personalised speech-to-speech translation systems. The hidden Markov model (HMM) is being used as the underlying technology in both the automatic speech recognition (ASR) and text-to-speech synthesis (TTS) components, so the investigation of unified statistical modelling approaches has become an implicit goal of our research. As one of the first steps towards this goal, we have been investigating commonalities and differences between HMM-based ASR and TTS. In this paper we present results and analysis of a series of experiments on English ASR and TTS systems, measuring their performance with respect to phone set and lexicon, acoustic feature type and dimensionality, and HMM topology. Our results show that, although the fundamental statistical model may be essentially the same, optimal ASR and TTS performance often demands diametrically opposed system designs. This represents a major challenge to be addressed in the investigation of such unified modelling approaches.

    Speech recognition with speech synthesis models by marginalising over decision tree leaves

    There has been increasing interest in the use of unsupervised adaptation for the personalisation of text-to-speech (TTS) voices, particularly in the context of speech-to-speech translation. This requires that we are able to generate adaptation transforms from the output of an automatic speech recognition (ASR) system. An approach based on unified ASR and TTS models appears to offer an ideal mechanism for applying unsupervised adaptation to TTS, since transforms could be shared between ASR and TTS. Such unified models should use a common set of parameters. A major barrier to such parameter sharing is the use of differing contexts in ASR and TTS. In this paper we propose a simple approach that generates ASR models from a trained set of TTS models by marginalising over the TTS contexts that are not used by ASR. We present preliminary results of our proposed method on a large vocabulary speech recognition task and provide insights into future directions of this work.
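
    The abstract does not give the algorithm in detail, so the following is only a minimal sketch of what "marginalising over decision tree leaves" could look like, under our own assumptions: a single binary context-clustering tree, scalar Gaussian leaves, ASR contexts (here the centre phone) that are known at recognition time, and TTS-only contexts (here a hypothetical `stress` feature) that are treated as independent with hypothetical prior probabilities. The names `Leaf`, `Node`, `marginalise` and `mixture_mean` are illustrative, not taken from the paper. When the tree asks about a TTS-only context, both branches are followed, each weighted by the prior mass of that answer given what earlier questions on the path have already established, so the ASR model becomes a mixture of the TTS leaves it can reach. A real system might instead take these weights from training-state occupancies.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple, Union


@dataclass
class Leaf:
    """A TTS leaf state, reduced to a 1-D Gaussian for brevity."""
    mean: float
    var: float


@dataclass
class Node:
    """Binary question: is context `feature` one of `values`?"""
    feature: str
    values: frozenset
    yes: Tree
    no: Tree


Tree = Union[Node, Leaf]


def marginalise(tree: Tree, known: Dict[str, object],
                priors: Dict[str, Dict[object, float]],
                allowed: Optional[Dict[str, Set[object]]] = None,
                weight: float = 1.0) -> List[Tuple[float, Leaf]]:
    """Collect (weight, leaf) pairs reachable for one ASR context.

    Questions on features in `known` (e.g. phone identity) are answered
    directly. Questions on TTS-only features follow both branches, weighted
    by the prior mass of each answer, conditioned on the values still
    allowed by earlier questions on the same path. Assumes (hypothetically)
    that TTS-only features are independent of each other and of `known`.
    """
    if allowed is None:
        allowed = {f: set(p) for f, p in priors.items()}
    if isinstance(tree, Leaf):
        return [(weight, tree)]
    if tree.feature in known:
        branch = tree.yes if known[tree.feature] in tree.values else tree.no
        return marginalise(branch, known, priors, allowed, weight)
    prior = priors[tree.feature]
    current = allowed[tree.feature]
    total = sum(prior[v] for v in current)
    components: List[Tuple[float, Leaf]] = []
    for branch, vals in ((tree.yes, current & tree.values),
                         (tree.no, current - tree.values)):
        mass = sum(prior[v] for v in vals)
        if mass > 0:
            narrowed = dict(allowed)
            narrowed[tree.feature] = vals  # remember what this path implies
            components += marginalise(branch, known, priors, narrowed,
                                      weight * mass / total)
    return components


def mixture_mean(components: List[Tuple[float, Leaf]]) -> float:
    """Mean of the Gaussian mixture formed by the weighted leaves."""
    return sum(w * leaf.mean for w, leaf in components)


# Toy tree: split on the centre phone, then on a TTS-only stress feature.
tree = Node("C_phone", frozenset({"aa"}),
            yes=Node("stress", frozenset({1}),
                     yes=Leaf(1.0, 0.5), no=Leaf(3.0, 0.5)),
            no=Leaf(-2.0, 1.0))
priors = {"stress": {0: 0.75, 1: 0.25}}  # hypothetical prior probabilities
mix = marginalise(tree, {"C_phone": "aa"}, priors)
print(mix)
print(mixture_mean(mix))  # → 2.5
```

    Tracking the allowed value set per feature avoids double-counting when the same TTS-only context is queried more than once along a path; contexts that ASR does know simply select one branch, exactly as in the TTS tree.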

    DARPA February 1992 pilot corpus CSR "dry run" benchmark test results
