Hidden Markov Models
Hidden Markov Models (HMMs), although known for decades, are now widely used and are still under active development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neurosciences, computational biology, bioinformatics, seismology, environmental protection and engineering. I hope that readers will find this book useful for their own research.
Speaker adaptation of an acoustic-to-articulatory inversion model using cascaded Gaussian mixture regressions
The article presents a method for adapting a GMM-based acoustic-articulatory inversion model trained on a reference speaker to another speaker. The goal is to estimate the articulatory trajectories in the geometrical space of a reference speaker from the speech audio signal of another speaker. This method is developed in the context of a visual biofeedback system aimed at pronunciation training. This system provides a speaker with visual information about his/her own articulation, via a 3D orofacial clone. In previous work, we proposed to use GMM-based voice conversion for speaker adaptation. Acoustic-articulatory mapping was achieved in two consecutive steps: 1) converting the spectral trajectories of the target speaker (i.e. the system user) into spectral trajectories of the reference speaker (voice conversion), and 2) estimating the most likely articulatory trajectories of the reference speaker from the converted spectral features (acoustic-articulatory inversion). In this work, we propose to combine these two steps into the same statistical mapping framework, by fusing multiple regressions based on trajectory GMM and the maximum likelihood criterion (MLE). The proposed technique is compared to two standard speaker adaptation techniques, based respectively on MAP and MLLR.
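The regression step that underlies this kind of GMM-based mapping can be illustrated with a minimal sketch. It is not the cascaded or trajectory-based method of the article: it only shows plain Gaussian mixture regression, i.e. the conditional mean E[y | x] under a joint GMM over [x; y]. The function name `gmm_regression` and its argument layout are assumptions made for illustration, and the GMM parameters are taken as already trained.

```python
import numpy as np

def gmm_regression(x, weights, means, covs, dx):
    """Return E[y | x] under a joint GMM over z = [x; y] (illustrative sketch).

    x:       (dx,) input vector, e.g. one frame of acoustic features
    weights: (K,) mixture weights
    means:   (K, dx + dy) joint means
    covs:    (K, dx + dy, dx + dy) joint covariances
    """
    K = len(weights)
    log_resp = np.empty(K)
    cond_means = []
    for k in range(K):
        mu_x, mu_y = means[k, :dx], means[k, dx:]
        S_xx = covs[k, :dx, :dx]   # input block of the covariance
        S_yx = covs[k, dx:, :dx]   # cross-covariance block
        diff = x - mu_x
        sol = np.linalg.solve(S_xx, diff)          # S_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(S_xx)
        # Log of weight * N(x; mu_x, S_xx), used for the posterior p(k | x)
        log_resp[k] = np.log(weights[k]) - 0.5 * (
            diff @ sol + logdet + dx * np.log(2.0 * np.pi)
        )
        # Conditional mean of y for component k
        cond_means.append(mu_y + S_yx @ sol)
    # Normalise the posteriors in a numerically stable way
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()
    return resp @ np.array(cond_means)
```

In a two-step scheme such as the one described for the previous work, one regression of this form could in principle be applied for voice conversion and a second for inversion; the article's contribution is to fuse them, which this sketch does not attempt.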
Modelling Speech Dynamics with Trajectory-HMMs
Institute for Communicating and Collaborative Systems
The conditional independence assumption imposed by the hidden Markov models
(HMMs) makes it difficult to model temporal correlation patterns in human speech.
Traditionally, this limitation is circumvented by appending the first and second-order
regression coefficients to the observation feature vectors. Although this leads to improved
performance in recognition tasks, we argue that a straightforward use of dynamic
features in HMMs will result in an inferior model, due to the incorrect handling
of dynamic constraints. In this thesis I will show that an HMM can be transformed
into a Trajectory-HMM capable of generating smoothed output mean trajectories, by
performing a per-utterance normalisation. The resulting model can be trained by either
maximising model log-likelihood or minimising mean generation errors on the training
data. To combat the exponential growth of paths in searching, the idea of delayed path
merging is proposed and a new time-synchronous decoding algorithm built on the concept
of token-passing is designed for use in the recognition task. The Trajectory-HMM
brings a new way of sharing knowledge between speech recognition and synthesis
components, by tackling both problems in a coherent statistical framework. I evaluated
the Trajectory-HMM on two different speech tasks using the speaker-dependent
MOCHA-TIMIT database. The first task used it as a generative model to recover articulatory features
from the speech signal, where the Trajectory-HMM was used in a complementary way
to the conventional HMM modelling techniques, within a joint Acoustic-Articulatory
framework. Experiments indicate that the jointly trained acoustic-articulatory models
are more accurate (having a lower Root Mean Square error) than the separately trained
ones, and that Trajectory-HMM training results in greater accuracy compared with
conventional Baum-Welch parameter updating. In addition, the Root Mean Square
(RMS) training objective proves to be consistently better than the Maximum Likelihood
objective. However, an experiment on the phone recognition task shows that the
MLE-trained Trajectory-HMM, while retaining the attractive property of being a proper
generative model, tends to favour over-smoothed trajectories among competing hypotheses,
and does not perform better than a conventional HMM. We use this to
build an argument that models giving a better fit on training data may suffer a reduction
of discrimination by being too faithful to the training data. Finally, experiments
on using triphone models show that increasing modelling detail is an effective way to
improve modelling performance with little added complexity in training.
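The relationship between static and dynamic features mentioned in this abstract can be made concrete with a small sketch. It assumes a one-dimensional static stream, a single first-order window delta_t = 0.5 * (c_{t+1} - c_{t-1}) with out-of-range terms dropped at the utterance edges, and diagonal variances. It shows the standard weighted least-squares solution for a smooth static trajectory given per-frame means over static and delta features; it is not the thesis's Trajectory-HMM training or its per-utterance normalisation. The names `delta_window_matrix` and `generate_trajectory` are hypothetical.

```python
import numpy as np

def delta_window_matrix(T):
    """Build W of shape (2T, T) mapping a static trajectory c to [static; delta] features.

    Row 2t picks c_t; row 2t+1 computes 0.5 * (c_{t+1} - c_{t-1}),
    dropping terms that fall outside the utterance (an assumption of this sketch).
    """
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0
        if t > 0:
            W[2 * t + 1, t - 1] = -0.5
        if t < T - 1:
            W[2 * t + 1, t + 1] = 0.5
    return W

def generate_trajectory(mean, var):
    """Solve (W^T P W) c = W^T P mean for c, with P = diag(1 / var).

    mean, var: (2T,) interleaved static and delta means and variances,
    e.g. read off the state sequence of an HMM.
    """
    T = mean.shape[0] // 2
    W = delta_window_matrix(T)
    P = 1.0 / var                        # diagonal precision
    A = W.T @ (P[:, None] * W)           # W^T P W
    b = W.T @ (P * mean)                 # W^T P mean
    return np.linalg.solve(A, b)
```

Because the delta rows tie neighbouring frames together, the solved trajectory respects the dynamic constraints explicitly, rather than treating static and delta features as independent observations.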
Modeling of Speech Parameter Sequence Considering Global Variance for HMM-Based Speech Synthesis
Speech technologies such as speech recognition and speech synthesis have many potential applications since speech is the main way in which most people communicate. Various linguistic sounds are produced by controlling the configuration of oral cavities to convey a message in speech communication. The produced speech sounds temporally vary and ar
Reactive Statistical Mapping: Towards the Sketching of Performative Control with Data
Part 1: Fundamental Issues
This paper presents the results of our participation in the ninth eNTERFACE workshop on multimodal user interfaces. Our target for this workshop was to bring some technologies currently used in speech recognition and synthesis to a new level, i.e. being the core of a new HMM-based mapping system. The idea of statistical mapping has been investigated, more precisely how to use Gaussian Mixture Models and Hidden Markov Models for realtime and reactive generation of new trajectories from input labels and for realtime regression in a continuous-to-continuous use case. As a result, we have developed several proofs of concept, including an incremental speech synthesiser, software for exploring stylistic spaces for gait and facial motion in realtime, a reactive audiovisual laughter and a prototype demonstrating the realtime reconstruction of lower body gait motion strictly from upper body motion, with conservation of the stylistic properties. This project has been the opportunity to formalise HMM-based mapping, integrate several of these innovations into the Mage library and explore the development of a realtime gesture recognition tool.