Joint Training Methods for Tandem and Hybrid Speech Recognition Systems using Deep Neural Networks
Hidden Markov models (HMMs) have been the mainstream acoustic modelling approach for state-of-the-art automatic speech recognition (ASR) systems over the
past few decades. Recently, due to the rapid development of deep learning technologies, deep neural networks (DNNs) have become an essential part of nearly all kinds of ASR approaches. Among HMM-based ASR approaches, DNNs are most commonly used to extract features (tandem system configuration) or to directly produce HMM output probabilities (hybrid system configuration).
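As a rough illustration of the distinction (a minimal sketch with assumed shapes and function names, not code from the thesis or from any toolkit), the hybrid configuration turns DNN state posteriors into scaled likelihoods for HMM decoding, while the tandem configuration reads features off a narrow intermediate (bottleneck) layer and passes them to a separate GMM-HMM:

```python
import numpy as np

def hybrid_scaled_likelihoods(state_posteriors, state_priors, eps=1e-10):
    # Hybrid configuration: the DNN outputs P(state | frame), but HMM decoding
    # needs p(frame | state); dividing by the state prior (Bayes' rule with the
    # constant p(frame) dropped) gives a "scaled likelihood".  The log domain
    # is used for numerical stability.
    return np.log(state_posteriors + eps) - np.log(state_priors + eps)

def tandem_features(frame, layers, bottleneck_index):
    # Tandem configuration: propagate the frame through the DNN and return the
    # activations of the bottleneck layer as input features for a conventional
    # GMM-HMM acoustic model.
    h = frame
    for i, (W, b) in enumerate(layers):
        h = np.tanh(W @ h + b)      # hidden activation; the choice is illustrative
        if i == bottleneck_index:
            return h                # low-dimensional tandem feature vector
    return h
```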
Although DNN tandem and hybrid systems have been shown to have superior
performance to traditional ASR systems without any DNN models, there are still
issues with such systems. First, some of the DNN settings, such as the choice of
the context-dependent (CD) output target set and the hidden activation functions, are
usually determined independently from the DNN training process. Second, different
ASR modules are separately optimised based on different criteria following a greedy
build strategy. For instance, for tandem systems, the features are often extracted by a
DNN trained to classify individual speech frames, while acoustic models are built upon such features according to a sequence-level criterion. These issues mean that the best performance is not theoretically guaranteed.
This thesis focuses on alleviating both issues using joint training methods. In DNN
acoustic model joint training, the decision tree HMM state tying approach is extended
to cluster DNN-HMM states. Based on this method, an alternative CD-DNN training
procedure without relying on any additional system is proposed, which can produce
DNN acoustic models comparable in word error rate (WER) with those trained by the
conventional procedure. Meanwhile, the most common hidden activation functions,
the sigmoid and rectified linear unit (ReLU), are parameterised to enable automatic
learning of their function forms. Experiments on conversational telephone speech (CTS) Mandarin data give average relative character error rate (CER) reductions of 3.4% and 2.2% with the sigmoid and ReLU parameterisations, respectively. Such parameterised functions can also be applied to speaker adaptation tasks.
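The abstract does not give the exact parametric forms, so the following is only a plausible sketch of how such learnable activation functions can be set up (per-unit scale and slope parameters trained by backpropagation alongside the weights, written as assumed PyTorch-style modules):

```python
import torch
import torch.nn as nn

class PSigmoid(nn.Module):
    # Parameterised sigmoid f(x) = eta * sigmoid(gamma * x); with
    # eta = gamma = 1 it reduces to the standard sigmoid, and both
    # per-unit parameters are learned by backpropagation.
    def __init__(self, num_units):
        super().__init__()
        self.eta = nn.Parameter(torch.ones(num_units))
        self.gamma = nn.Parameter(torch.ones(num_units))

    def forward(self, x):
        return self.eta * torch.sigmoid(self.gamma * x)

class PReLU2(nn.Module):
    # Parameterised ReLU f(x) = alpha * max(x, 0) + beta * min(x, 0); with
    # alpha = 1 and beta = 0 it reduces to the standard ReLU.
    def __init__(self, num_units):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(num_units))
        self.beta = nn.Parameter(torch.zeros(num_units))

    def forward(self, x):
        return self.alpha * torch.clamp(x, min=0) + self.beta * torch.clamp(x, max=0)
```

Because the extra parameters are per hidden unit, they form a compact set that could be re-estimated on a small amount of speaker data while the weight matrices are kept fixed, which is one way such functions lend themselves to speaker adaptation.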
At the ASR system level, the DNN acoustic model and the corresponding speaker-dependent (SD) input feature transforms are jointly learned through minimum phone error
(MPE) training as an example of hybrid system joint training, which outperforms the
conventional hybrid system speaker adaptive training (SAT) method. MPE-based speaker-independent (SI) tandem system joint training is also studied. Experiments on
multi-genre broadcast (MGB) English data show that this method gives a reduction
in tandem system WER of 11.8% (relative), and the resulting tandem systems are
comparable to MPE hybrid systems in both WER and the number of parameters. In
addition, all approaches in this thesis have been implemented using the hidden Markov model toolkit (HTK), and the related source code has been or will be made publicly available with recent or future HTK releases, to increase the reproducibility of the work presented in this thesis.
Cambridge International Scholarship, Cambridge Overseas Trust
Research funding, EPSRC Natural Speech Technology Project
Research funding, DARPA BOLT Program
Research funding, IARPA Babel Program
Development of the CUHTK 2004 Mandarin conversational telephone speech transcription system
This paper describes the development of the CUHTK 2004 Mandarin conversational telephone speech transcription system. The paper details all aspects of the system, but concentrates on the development of the acoustic models. As there are significant differences between the available training corpora, both in terms of topics of conversation and accents, forms of data normalisation and adaptive training techniques are investigated. The baseline discriminatively trained acoustic models are compared to a system built with a Gaussianisation front-end, a speaker adaptively trained system and an adaptively trained structured precision matrix system. The models are finally evaluated within a multi-pass, multi-branch system combination framework. © 2005 IEEE