
    Combined Acoustic and Pronunciation Modelling for Non-Native Speech Recognition

    In this paper, we present several adaptation methods for non-native speech recognition. We have tested pronunciation modelling, MLLR and MAP non-native pronunciation adaptation, and HMM model retraining on the HIWIRE foreign-accented English speech database. The "phonetic confusion" scheme we have developed associates with each spoken phone several sequences of confused phones. In our experiments, we have used different combinations of acoustic models representing the canonical and the foreign pronunciations: spoken and native models, and models adapted to the non-native accent with MAP and MLLR. The joint use of pronunciation modelling and acoustic adaptation led to further improvements in recognition accuracy, and the best combination of the above-mentioned techniques resulted in a relative word error rate reduction ranging from 46% to 71%.
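
    As an illustration of the acoustic adaptation step mentioned above, the following is a minimal sketch of MAP mean adaptation for a GMM-based model, assuming a pre-trained scikit-learn GaussianMixture stands in for one state's native-accent model and a small set of accented feature frames is available; the relevance factor and all names are illustrative and not taken from the paper.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        def map_adapt_means(gmm, X, tau=16.0):
            """MAP-adapted component means from accented adaptation frames X of shape (T, D)."""
            gamma = gmm.predict_proba(X)          # responsibilities, shape (T, K)
            n_k = gamma.sum(axis=0)               # soft counts per component, shape (K,)
            s_k = gamma.T @ X                     # weighted frame sums, shape (K, D)
            # Interpolate adaptation statistics with the prior (native) means;
            # components seeing little adaptation data stay close to their priors.
            return (s_k + tau * gmm.means_) / (n_k[:, None] + tau)

        # Usage (illustrative): adapt a native-accent model toward foreign-accented frames
        # native_gmm = GaussianMixture(n_components=32).fit(native_frames)
        # adapted_means = map_adapt_means(native_gmm, accented_frames)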

    A Bayesian Network View on Acoustic Model-Based Techniques for Robust Speech Recognition

    This article provides a unifying Bayesian network view on various approaches to acoustic model adaptation, missing-feature techniques, and uncertainty decoding that are well known in the literature on robust automatic speech recognition. The representatives of these classes can often be deduced from a Bayesian network that extends the conventional hidden Markov models used in speech recognition. These extensions, in turn, can in many cases be motivated from an underlying observation model that relates clean and distorted feature vectors. By converting the observation models into a Bayesian network representation, we formulate the corresponding compensation rules, leading to a unified view on known derivations as well as to new formulations for certain approaches. The generic Bayesian perspective provided in this contribution thus highlights structural differences and similarities between the analyzed approaches.
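
    As a concrete instance of the kind of compensation rule the article unifies, the following is a minimal sketch of diagonal-covariance uncertainty decoding, where the clean-feature likelihood of a GMM state is marginalised over a Gaussian observation model by inflating each component variance with the per-frame feature uncertainty; the identity-plus-additive-noise observation model and all names are assumptions for illustration.

        import numpy as np

        def uncertainty_decoding_loglik(x, x_var, means, variances, log_weights):
            """log p(x | state) with per-dimension feature uncertainty x_var folded in.

            x, x_var: (D,) observed feature and its uncertainty variance
            means, variances: (K, D) diagonal GMM parameters; log_weights: (K,)
            """
            var_eff = variances + x_var                        # Sigma_k + Sigma_x
            log_norm = -0.5 * np.log(2.0 * np.pi * var_eff).sum(axis=1)
            log_expo = -0.5 * (((x - means) ** 2) / var_eff).sum(axis=1)
            comp = log_weights + log_norm + log_expo
            m = comp.max()                                     # log-sum-exp over components
            return m + np.log(np.exp(comp - m).sum())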

    A Framework for Bioacoustic Vocalization Analysis Using Hidden Markov Models

    Using Hidden Markov Models (HMMs) as a recognition framework for automatic classification of animal vocalizations has a number of benefits, including the ability to handle duration variability through nonlinear time alignment, the ability to incorporate complex language or recognition constraints, and easy extendibility to continuous recognition and detection domains. In this work, we apply HMMs to several different species and bioacoustic tasks using generalized spectral features that can be easily adjusted across species and HMM network topologies suited to each task. This experimental work includes a simple call type classification task using one HMM per vocalization for repertoire analysis of Asian elephants, a language-constrained song recognition task using syllable models as base units for ortolan bunting vocalizations, and a stress stimulus differentiation task in poultry vocalizations using a non-sequential model via a one-state HMM with Gaussian mixtures. Results show strong performance across all tasks and illustrate the flexibility of the HMM framework for a variety of species, vocalization types, and analysis tasks.
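
    A minimal sketch of the one-HMM-per-call-type classification scheme described above, assuming per-vocalization feature matrices (e.g. MFCC-like frames) have already been extracted; hmmlearn is used here as a stand-in toolkit and is not specified by the authors.

        import numpy as np
        from hmmlearn.hmm import GaussianHMM

        def train_call_models(train_data, n_states=5):
            """train_data: dict mapping call type -> list of (T_i, D) feature arrays."""
            models = {}
            for call_type, examples in train_data.items():
                X = np.vstack(examples)                   # concatenated frames
                lengths = [len(e) for e in examples]      # per-example frame counts
                m = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
                m.fit(X, lengths)
                models[call_type] = m
            return models

        def classify(models, features):
            """Pick the call type whose HMM assigns the highest log-likelihood."""
            return max(models, key=lambda c: models[c].score(features))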

    Generalization of Extended Baum-Welch Parameter Estimation for Discriminative Training and Decoding

    We demonstrate the generalizability of the Extended Baum-Welch (EBW) algorithm not only for HMM parameter estimation but also for decoding. We show that there can exist a general function associated with the objective function under EBW that reduces to the well-known auxiliary function used in the Baum-Welch algorithm for maximum likelihood estimates. We generalize the representation of the model parameter updates by making use of a differentiable function (such as the arithmetic or geometric mean) of the updated and current model parameters, and describe its effect on the learning rate during HMM parameter estimation. Improvements on speech recognition tasks are also presented.
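
    For reference, the following is a minimal sketch of the classical EBW update for the mean and variance of a single Gaussian, driven by numerator (reference) and denominator (competing-hypothesis) sufficient statistics; the smoothing constant D, which governs the learning rate discussed in the abstract, is set to an illustrative value and all names are assumptions.

        import numpy as np

        def ebw_update(mu, var, num, den, D=200.0):
            """num/den: dicts with soft count 'n', first moment 'sx', second moment 'sxx'."""
            denom = num["n"] - den["n"] + D
            mu_new = (num["sx"] - den["sx"] + D * mu) / denom
            var_new = (num["sxx"] - den["sxx"] + D * (var + mu ** 2)) / denom - mu_new ** 2
            # EBW does not guarantee positive variances for small D; clamp as a safeguard.
            var_new = np.maximum(var_new, 1e-6)
            return mu_new, var_new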

    Sketching for Large-Scale Learning of Mixture Models

    Learning parameters from voluminous data can be prohibitive in terms of memory and computational requirements. We propose a "compressive learning" framework where we estimate model parameters from a sketch of the training data. This sketch is a collection of generalized moments of the underlying probability distribution of the data. It can be computed in a single pass on the training set, and is easily computable on streams or distributed datasets. The proposed framework shares similarities with compressive sensing, which aims at drastically reducing the dimension of high-dimensional signals while preserving the ability to reconstruct them. To perform the estimation task, we derive an iterative algorithm analogous to sparse reconstruction algorithms in the context of linear inverse problems. We exemplify our framework with the compressive estimation of a Gaussian Mixture Model (GMM), providing heuristics on the choice of the sketching procedure and theoretical guarantees of reconstruction. We experimentally show on synthetic data that the proposed algorithm yields results comparable to the classical Expectation-Maximization (EM) technique while requiring significantly less memory and fewer computations when the number of database elements is large. We further demonstrate the potential of the approach on real large-scale data (over 10^8 training samples) for the task of model-based speaker verification. Finally, we draw some connections between the proposed framework and approximate Hilbert space embedding of probability distributions using random features. We show that the proposed sketching operator can be seen as an innovative method to design translation-invariant kernels adapted to the analysis of GMMs. We also use this theoretical framework to derive information preservation guarantees, in the spirit of infinite-dimensional compressive sensing.
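
    A minimal sketch of the sketching step itself: a single-pass empirical average of random Fourier features, i.e. generalized moments of the data distribution. The plain Gaussian frequency draw below is a simplification; the paper proposes heuristics for choosing the sampling distribution, and all names here are illustrative.

        import numpy as np

        def draw_frequencies(dim, m_sketch, scale=1.0, seed=0):
            """Sample the m random frequency vectors that define the generalized moments."""
            rng = np.random.default_rng(seed)
            return rng.normal(scale=scale, size=(m_sketch, dim))      # Omega, shape (m, D)

        def compute_sketch(X, Omega):
            """Complex empirical sketch z = mean_n exp(i * Omega @ x_n), one pass over X."""
            return np.exp(1j * (X @ Omega.T)).mean(axis=0)

        # Usage (illustrative): sketches of mini-batches can be averaged, so a single
        # streaming pass over the training set suffices.
        # z = compute_sketch(frames, draw_frequencies(frames.shape[1], 1000))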