142 research outputs found
Online adaptive learning of continuous-density hidden Markov models based on multiple-stream prior evolution and posterior pooling
We introduce a new adaptive Bayesian learning framework, called multiple-stream prior evolution and posterior pooling, for online adaptation of the continuous density hidden Markov model (CDHMM) parameters. Among three architectures we proposed for this framework, we study in detail a specific two stream system where linear transformations are applied to the mean vectors of the CDHMMs to control the evolution of their prior distribution. This new stream of prior distribution can be combined with another stream of prior distribution evolved without any constraints applied. In a series of speaker adaptation experiments on the task of continuous Mandarin speech recognition, we show that the new adaptation algorithm achieves a similar fast-adaptation performance as that of the incremental maximum likelihood linear regression (MLLR) in the case of small amount of adaptation data, while maintains the good asymptotic convergence property as that of our previously proposed quasi-Bayes adaptation algorithms.published_or_final_versio
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes
Text to Audio Alignment
Äelem t©to prce je przkum exitujcch algoritm pro synchronizaci textu a audia. Vybrali jsme exitujc implementaci jednoho z tÄchto algoritm, kter je zaloen na skrytch markovovch modelech sdruench sekvenc a prozkoumaly jsme jeho vhody, nevhody a podivnosti. Dle jsme ovÄili, zda je mon© pedvdat spÄnost zarovnn z hodnot generovanch Viterbi algoritmem a hodnotou paprsku. Nae testovac data pochz od BBC a byla souÄst MGB Challenge 2015. Dky svoj rznorodosti poskytuj tato data ideln testovac set k ovÄen flexibility naeho algoritmu a jakoto i jeho schopnosti tolerovat chyby.The purpose of this work is to research existing text-to-speech aligning algorithms. We chose an implementation of one these algorithms, based on Hidden-Markov Joint-Sequence Models, and we explored its strengths, quirks and weaknesses. We explored whether it is possible to predict the alignment accuracy using probability values generated from Viterbi algorithm and the beam search value. Our testing data comes from the BBC as part of MGB Challenge 2015. This data creates, with its high content diversity, near perfect testing set to prove our algorithm is flexible and error independent.
Getting Past the Language Gap: Innovations in Machine Translation
In this chapter, we will be reviewing state of the art machine translation systems, and will discuss innovative methods for machine translation, highlighting the most promising techniques and applications. Machine translation (MT) has benefited from a revitalization in the last 10 years or so, after a period of relatively slow activity. In 2005 the field received a jumpstart when a powerful complete experimental package for building MT systems from scratch became freely available as a result of the unified efforts of the MOSES international consortium. Around the same time, hierarchical methods had been introduced by Chinese researchers, which allowed the introduction and use of syntactic information in translation modeling. Furthermore, the advances in the related field of computational linguistics, making off-the-shelf taggers and parsers readily available, helped give MT an additional boost. Yet there is still more progress to be made. For example, MT will be enhanced greatly when both syntax and semantics are on board: this still presents a major challenge though many advanced research groups are currently pursuing ways to meet this challenge head-on. The next generation of MT will consist of a collection of hybrid systems. It also augurs well for the mobile environment, as we look forward to more advanced and improved technologies that enable the working of Speech-To-Speech machine translation on hand-held devices, i.e. speech recognition and speech synthesis. We review all of these developments and point out in the final section some of the most promising research avenues for the future of MT
Malay articulation system for early screening diagnostic using hidden markov model and genetic algorithm
Speech recognition is an important technology and can be used as a great aid for individuals with sight or hearing disabilities today. There are extensive research interest and development in this area for over the past decades. However, the prospect in Malaysia regarding the usage and exposure is still immature even though there is demand from the medical and healthcare sector. The aim of this research is to assess the quality and the impact of using computerized method for early screening of speech articulation disorder among Malaysian such as the omission, substitution, addition and distortion in their speech. In this study, the statistical probabilistic approach using Hidden Markov Model (HMM) has been adopted with newly designed Malay corpus for articulation disorder case following the SAMPA and IPA guidelines. Improvement is made at the front-end processing for feature vector selection by applying the silence region calibration algorithm for start and end point detection. The classifier had also been modified significantly by incorporating Viterbi search with Genetic Algorithm (GA) to obtain high accuracy in recognition result and for lexical unit classification. The results were evaluated by following National Institute of Standards and Technology (NIST) benchmarking. Based on the test, it shows that the recognition accuracy has been improved by 30% to 40% using Genetic Algorithm technique compared with conventional technique. A new corpus had been built with verification and justification from the medical expert in this study. In conclusion, computerized method for early screening can ease human effort in tackling speech disorders and the proposed Genetic Algorithm technique has been proven to improve the recognition performance in terms of search and classification task
Data augmentation for low resource languages
Recently there has been interest in the approaches for train-ing speech recognition systems for languages with limited re-sources. Under the IARPA Babel program such resources have been provided for a range of languages to support this research area. This paper examines a particular form of approach, data augmentation, that can be applied to these situations. Data aug-mentation schemes aim to increase the quantity of data available to train the system, for example semi-supervised training, multi-lingual processing, acoustic data perturbation and speech syn-thesis. To date the majority of work has considered individual data augmentation schemes, with few consistent performance contrasts or examination of whether the schemes are comple-mentary. In this work two data augmentation schemes, semi-supervised training and vocal tract length perturbation, are ex-amined and combined on the Babel limited language pack con-figuration. Here only about 10 hours of transcribed acoustic data are available. Two languages are examined, Assamese and Zulu, which were found to be the most challenging of the Ba-bel languages released for the 2014 Evaluation. For both lan-guages consistent speech recognition performance gains can be obtained using these augmentation schemes. Furthermore the impact of these performance gains on a down-stream keyword spotting task are also described. Index Terms: data augmentation, speech recognition, babel 1
Recommended from our members
Joint Training Methods for Tandem and Hybrid Speech Recognition Systems using Deep Neural Networks
Hidden Markov models (HMMs) have been the mainstream acoustic modelling approach for state-of-the-art automatic speech recognition (ASR) systems over the
past few decades. Recently, due to the rapid development of deep learning technologies, deep neural networks (DNNs) have become an essential part of nearly all kinds of ASR approaches. Among HMM-based ASR approaches, DNNs are most commonly used to extract features (tandem system configuration) or to directly produce HMM output probabilities (hybrid system configuration).
Although DNN tandem and hybrid systems have been shown to have superior
performance to traditional ASR systems without any DNN models, there are still
issues with such systems. First, some of the DNN settings, such as the choice of
the context-dependent (CD) output targets set and hidden activation functions, are
usually determined independently from the DNN training process. Second, different
ASR modules are separately optimised based on different criteria following a greedy
build strategy. For instance, for tandem systems, the features are often extracted by a
DNN trained to classify individual speech frames while acoustic models are built upon
such features according to a sequence level criterion. These issues mean that the best performance is not theoretically guaranteed.
This thesis focuses on alleviating both issues using joint training methods. In DNN
acoustic model joint training, the decision tree HMM state tying approach is extended
to cluster DNN-HMM states. Based on this method, an alternative CD-DNN training
procedure without relying on any additional system is proposed, which can produce
DNN acoustic models comparable in word error rate (WER) with those trained by the
conventional procedure. Meanwhile, the most common hidden activation functions,
the sigmoid and rectified linear unit (ReLU), are parameterised to enable automatic
learning of function forms. Experiments using conversational telephone speech (CTS)
Mandarin data result in an average of 3.4% and 2.2% relative character error rate (CER) reduction with sigmoid and ReLU parameterisations. Such parameterised functions can also be applied to speaker adaptation tasks.
At the ASR system level, DNN acoustic model and corresponding speaker dependent (SD) input feature transforms are jointly learned through minimum phone error
(MPE) training as an example of hybrid system joint training, which outperforms the
conventional hybrid system speaker adaptive training (SAT) method. MPE based speaker independent (SI) tandem system joint training is also studied. Experiments on
multi-genre broadcast (MGB) English data show that this method gives a reduction
in tandem system WER of 11.8% (relative), and the resulting tandem systems are
comparable to MPE hybrid systems in both WER and the number of parameters. In
addition, all approaches in this thesis have been implemented using the hidden Markov model toolkit (HTK) and the related source code has been or will be made publicly available with either recent or future HTK releases, to increase the reproducibility of the work presented in this thesis.Cambridge International Scholarship, Cambridge Overseas Trust
Research funding, EPSRC Natural Speech Technology Project
Research funding, DARPA BOLT Program
Research funding, iARPA Babel Progra
Modularity and Neural Integration in Large-Vocabulary Continuous Speech Recognition
This Thesis tackles the problems of modularity in Large-Vocabulary Continuous Speech Recognition with use of Neural Network
- …