    On-line adaptive learning of the correlated continuous density hidden Markov models for speech recognition

    We extend our previously proposed quasi-Bayes adaptive learning framework to correlated continuous density hidden Markov models (HMMs) with Gaussian mixture state observation densities, in which all mean vectors are assumed to be correlated and to have a joint prior distribution. A successive approximation algorithm is proposed to update the correlated mean vectors. When the method is applied to on-line speaker adaptation as an example, the algorithm is shown experimentally to be asymptotically convergent. It also improves both the efficiency and the effectiveness of Bayes learning by taking into account the correlation between different model parameters. The technique can be used to cope with time-varying acoustic and environmental variability, including mismatches caused by changing speakers, channels, transducers, and environments.
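The sketch below illustrates why a joint prior over correlated means helps. Adaptation data observed for one mean also moves every other mean that is correlated with it. It is a plain conjugate Gaussian (Kalman-style) update under simplifying assumptions: scalar means, a known observation variance, and frames already assigned to one component. It is not the paper's successive approximation algorithm. The function `update_correlated_means` and its parameters are hypothetical. Reusing each posterior as the prior for the next block mimics the on-line, quasi-Bayes style of sequential learning.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def update_correlated_means(
    mu: Sequence[float],
    cov: Sequence[Sequence[float]],
    j: int,
    frames: Sequence[float],
    obs_var: float,
) -> Tuple[Vector, Matrix]:
    """Posterior of jointly Gaussian means after observing frames of component j.

    Hypothetical helper: each mean is a scalar, the frames are assumed to be
    drawn from N(mu_j, obs_var) with known variance, and the prior over all
    means is N(mu, cov). Returns the posterior mean vector and covariance.
    """
    n = len(frames)
    dim = len(mu)
    if n == 0:
        # No adaptation data: the prior is returned unchanged.
        return list(mu), [list(row) for row in cov]
    xbar = sum(frames) / n
    r = obs_var / n                 # variance of the sample mean
    s = cov[j][j] + r               # predictive variance of xbar
    gain = [cov[i][j] / s for i in range(dim)]
    innovation = xbar - mu[j]
    # Every mean moves in proportion to its prior covariance with mean j.
    new_mu = [mu[i] + gain[i] * innovation for i in range(dim)]
    new_cov = [[cov[i][k] - gain[i] * cov[j][k] for k in range(dim)] for i in range(dim)]
    return new_mu, new_cov


# On-line use: the posterior after one block becomes the prior for the next.
mu, cov = [0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]]
for component, block in [(0, [2.0, 2.0]), (0, [1.5]), (1, [0.5, 0.7])]:
    mu, cov = update_correlated_means(mu, cov, component, block, obs_var=1.0)
    print(component, [round(m, 3) for m in mu])
```

In the first block, only component 0 is observed. Because of the 0.8 prior correlation, the mean of component 1 still shifts toward the data. With a diagonal prior it would stay where it was.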

    Recognition of voice commands against intense acoustic noise using cross-correlation portraits

    Building speech-based information systems for noisy industrial and transport environments requires methods for recognizing voice commands (VCs) under strong interference. This paper considers a speaker-dependent method for recognizing VCs from a limited vocabulary against intense acoustic noise. The commands are transformed into cross-correlation portraits (CCPs), i.e., special images derived from the VCs. A command to be recognized is assigned to the class with the minimum distance (metric) between the CCP of that command and the reference CCPs of the class. We developed algorithms for transforming commands into CCPs, a method for refining command boundaries, and techniques for optimizing the library of reference commands and for choosing the metric. As a result, fairly reliable recognition of VCs under strong interference was achieved.
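The minimum-distance decision rule described above can be sketched as follows. The abstract does not say how portraits are computed, so here they are simply equal-shape 2-D arrays, and the toy values are invented. Two further choices are assumptions: the Euclidean metric and taking a class's distance as the minimum over its reference portraits. The names `classify`, `euclidean`, and `library` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Portrait = Sequence[Sequence[float]]  # a portrait treated as a 2-D array of numbers


def euclidean(a: Portrait, b: Portrait) -> float:
    """Euclidean distance between two portraits of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("portraits must have the same shape")
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def classify(
    portrait: Portrait,
    references: Dict[str, List[Portrait]],
    metric: Callable[[Portrait, Portrait], float] = euclidean,
) -> str:
    """Return the class whose nearest reference portrait is closest to `portrait`."""
    best_label, best_dist = None, math.inf
    for label, refs in references.items():
        # Assumed class distance: minimum over that class's reference portraits.
        dist = min(metric(portrait, ref) for ref in refs)
        if dist < best_dist:
            best_label, best_dist = label, dist
    if best_label is None:
        raise ValueError("no reference portraits supplied")
    return best_label


# Toy reference library with invented values.
library = {
    "start": [[[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.2], [0.1, 1.0]]],
    "stop": [[[0.0, 1.0], [1.0, 0.0]]],
}
print(classify([[0.9, 0.1], [0.1, 0.8]], library))  # prints start
```

The `metric` parameter is pluggable. Metric selection is one of the aspects the abstract says was optimized, so a different distance can be swapped in without changing the decision rule.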

    Robust speech recognition based on a Bayesian prediction approach

    We study a category of robust speech recognition problems in which mismatches exist between training and testing conditions and no accurate knowledge of the mismatch mechanism is available. The only available information is the test data along with a set of pretrained Gaussian mixture continuous density hidden Markov models (CDHMMs). We investigate the problem from the viewpoint of Bayesian prediction. A simple prior distribution, namely a constrained uniform distribution, is adopted to characterize the uncertainty of the mean vectors of the CDHMMs. Two methods are studied: a model compensation technique based on the Bayesian predictive density, and a robust decision strategy called Viterbi Bayesian predictive classification. The proposed methods are compared with the conventional Viterbi decoding algorithm in speaker-independent recognition experiments on isolated digits and TI connected digit strings (TIDIGITS). In these experiments, the mismatches between training and testing conditions are caused by (1) additive Gaussian white noise, (2) each of 25 types of actual additive ambient noise, and (3) gender difference. The experimental results show that the adopted prior distribution and the proposed techniques improve performance robustness under the examined mismatch conditions.
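To illustrate what a Bayesian predictive density looks like under a uniform prior on the mean, the sketch below integrates a Gaussian over a uniform prior on its mean. For a scalar Gaussian with known standard deviation sigma and mean uniform on [mu0 - h, mu0 + h], the predictive density is (Phi((x - mu0 + h)/sigma) - Phi((x - mu0 - h)/sigma)) / (2h). For diagonal covariances it factorizes into a product over dimensions. Several details are assumptions, not taken from the abstract: how the constraint region is set (the hypothetical `half_width` parameter), the diagonal-covariance simplification, and all function names. In a compensation scheme this density would replace the plug-in Gaussian when scoring frames.

```python
import math
from typing import Sequence


def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def predictive_density_1d(x: float, mu0: float, sigma: float, half_width: float) -> float:
    """Integral of N(x; mu, sigma^2) over a uniform prior mu ~ U[mu0 - h, mu0 + h].

    The integral reduces to a difference of two normal CDFs divided by the
    width 2h of the prior's support.
    """
    if half_width <= 0.0:
        return gaussian_pdf(x, mu0, sigma)  # degenerate prior: plug-in Gaussian
    a, b = mu0 - half_width, mu0 + half_width
    return (std_normal_cdf((x - a) / sigma) - std_normal_cdf((x - b) / sigma)) / (b - a)


def predictive_density_diag(
    x: Sequence[float],
    mu0: Sequence[float],
    sigma: Sequence[float],
    half_width: Sequence[float],
) -> float:
    """Product of 1-D predictive densities for a diagonal-covariance Gaussian."""
    p = 1.0
    for xi, mi, si, hi in zip(x, mu0, sigma, half_width):
        p *= predictive_density_1d(xi, mi, si, hi)
    return p


print(predictive_density_1d(0.0, 0.0, 1.0, 2.0), gaussian_pdf(0.0, 0.0, 1.0))
print(predictive_density_1d(3.0, 0.0, 1.0, 2.0), gaussian_pdf(3.0, 0.0, 1.0))
```

The predictive density is flatter than the plug-in Gaussian. It is lower near the nominal mean and heavier in the tails, which is how mean uncertainty makes scoring less sensitive to shifted test data.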

    On adaptive decision rules and decision parameter adaptation for automatic speech recognition

    Recent advances in automatic speech recognition have been achieved by designing a plug-in maximum a posteriori decision rule. The forms of the acoustic and language model distributions are specified, and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevalent training method. However, a dynamic training strategy is needed because of unknown speech distributions, sparse training data, high spectral and temporal variability in speech, and possible mismatch between training and testing conditions. To cope with changing speakers and speaking conditions in real operating environments, such a strategy incorporates a small amount of speaker- and environment-specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine prior knowledge from an existing collection of general models with a new set of condition-specific adaptation data. In this paper, the mathematical framework for Bayesian adaptation of acoustic and language model parameters is first described. Maximum a posteriori point estimation is then developed for hidden Markov models and for a number of useful parameter densities commonly used in automatic speech recognition and natural language processing.
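For a concrete picture of MAP point estimation, the sketch below uses two standard conjugate-prior cases. The first is the mean of a Gaussian under a normal prior, which gives the weighted-average form (tau * mu0 + sum_t gamma_t x_t) / (tau + sum_t gamma_t). The second, included as an illustrative assumption, is multinomial parameters (for example mixture weights or n-gram probabilities) under a Dirichlet prior. The function names, the scalar simplification, and the way the prior weight `tau` and hyperparameters `nu` are supplied are hypothetical. The paper's exact parameterization is not given in the abstract.

```python
from typing import List, Sequence


def map_gaussian_mean(
    prior_mean: float, tau: float, frames: Sequence[float], gammas: Sequence[float]
) -> float:
    """MAP estimate of a scalar Gaussian mean under a conjugate normal prior.

    tau acts as a prior pseudo-count; gammas are per-frame occupancy
    probabilities (e.g. from a forward-backward pass).
    """
    occ = sum(gammas)
    weighted = sum(g * x for g, x in zip(gammas, frames))
    return (tau * prior_mean + weighted) / (tau + occ)


def map_multinomial(counts: Sequence[float], nu: Sequence[float]) -> List[float]:
    """Mode of the Dirichlet posterior for multinomial probabilities (all nu_k >= 1)."""
    if any(v < 1.0 for v in nu):
        raise ValueError("this mode formula assumes every nu_k >= 1")
    numer = [c + v - 1.0 for c, v in zip(counts, nu)]
    total = sum(numer)
    if total == 0.0:
        return [1.0 / len(numer)] * len(numer)  # flat prior and no data: uniform
    return [n / total for n in numer]


print(map_gaussian_mean(0.0, 10.0, [1.0] * 10, [1.0] * 10))
print(map_multinomial([3, 1, 0], [2.0, 2.0, 2.0]))
```

With no adaptation data the MAP estimate stays at the prior mean. As the occupancy grows relative to `tau`, it approaches the maximum-likelihood estimate. This is the interpolation between general models and condition-specific data that the abstract describes.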