6 research outputs found

    Modelling the effects of spontaneous speech in speech recognition

    Intrinsic variability of the speaker in spontaneous speech remains a challenge for state-of-the-art automatic speech recognition (ASR). While planned speech exhibits moderate variability, the pronounced variability of spontaneous speech is driven by situation, context, intention, emotion, and the listeners. This conditioning of speech is observable in the speaking rate and in the feature space. We analysed broadcast news (BN) and broadcast conversational (BC) speech in terms of phoneme rate (PR) and feature space reduction (FSR), and contrasted both with planned speech data; strong, statistically significant differences were revealed. We cluster the speech segments with respect to their degree of PR and FSR, forming a set of variability classes, and incorporate these classes into the hidden Markov model (HMM) based acoustic model (AM). In recognition we follow two approaches: the first treats the variability class as a context variable; the second relies on estimating the variability class after the first pass of a multi-pass recognition system. Besides explicitly modelling the intrinsic speech variability of the speaker, we also segregate general speaker-specific characteristics into feature-space transforms by means of speaker adaptive training (SAT) using constrained maximum likelihood linear regression (CMLLR), and apply this adaptive approach in third-pass recognition. By modelling both within-speaker and between-speaker variation in spontaneous speech, we address two fundamental sources of speech variability that determine the performance of ASR systems.
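
    The abstract does not spell out how segments are grouped into variability classes; below is a minimal sketch under stated assumptions (precomputed per-segment PR and FSR statistics, a fixed small number of classes, and k-means as the clustering method, none of which are confirmed by the source):

    # Hypothetical sketch: group speech segments into variability classes from
    # per-segment phoneme rate (PR) and feature space reduction (FSR) values.
    # The clustering method (k-means) and the number of classes are assumptions.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    def variability_classes(phoneme_rate, fsr, n_classes=4, seed=0):
        """Assign each speech segment a variability-class label from (PR, FSR)."""
        feats = np.column_stack([phoneme_rate, fsr])
        feats = StandardScaler().fit_transform(feats)  # put PR and FSR on a common scale
        return KMeans(n_clusters=n_classes, random_state=seed, n_init=10).fit_predict(feats)

    # Illustrative usage with synthetic per-segment statistics.
    rng = np.random.default_rng(0)
    pr = rng.normal(12.0, 2.0, size=200)   # phonemes per second (made-up values)
    fsr = rng.normal(0.8, 0.1, size=200)   # feature-space reduction factor (made-up values)
    labels = variability_classes(pr, fsr)
    print(np.bincount(labels))             # segments per variability class

    The resulting class labels would then condition the HMM acoustic model, either as a context variable or as a quantity estimated after the first recognition pass, as described in the abstract.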

    Shared-distribution hidden Markov models for speech recognition

    Abstract: "Parameter sharing plays an important role in statistical modeling since training data are usually limited. On the one hand, we would like to use models that are as detailed as possible. On the other hand, with models too detailed, we can no longer reliably estimate the parameters. Triphone generalization may force two models to be merged together when only parts of the model output distributions are similar, while the rest of the output distributions are different. This problem can be avoided if clustering is carried out at the distribution level. In this paper, a shared-distribution model is proposed to replace generalized triphone models for speaker-independent continuous speech recognition.Here, output distributions in the hidden Markov model are shared with each other if they exhibit acoustic similarity. In addition to detailed representation, it also gives us the freedom to use a large number of states for each phonetic model. Although an increase in the number of states will increase the total number of free parameters, with distribution sharing we can essentially eliminate those redundant states and have the luxury to maintain necessary ones. By using the shared-distribution model, the error rate on the DARPA Resource Management task has been reduced by 20% in comparison with the baseline SPHINX system.

    Policy-Gradient Algorithms for Partially Observable Markov Decision Processes

    Partially observable Markov decision processes are interesting because of their ability to model most conceivable real-world learning problems, for example robot navigation, driving a car, speech recognition, stock trading, and playing games. The downside of this generality is that exact algorithms are computationally intractable, which motivates approximate approaches. One such class of algorithms is the policy-gradient methods from reinforcement learning. They seek to adjust the parameters of an agent in the direction that maximises the long-term average of a reward signal. Policy-gradient methods are attractive as a scalable approach for controlling partially observable Markov decision processes (POMDPs). In the most general case POMDP policies require some form of internal state, or memory, in order to act optimally. Policy-gradient methods have shown promise for problems admitting memory-less policies but have been less successful when memory is required. This thesis develops several improved algorithms for learning policies with memory in an infinite-horizon setting: directly, when the dynamics of the world are known, and via Monte-Carlo methods otherwise. The algorithms simultaneously learn how to act and what to remember. …
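
    As a minimal sketch of a policy-gradient update with internal memory (a finite-state controller in the spirit of GPOMDP-style algorithms, not the thesis's exact methods), the agent below samples actions and memory transitions from softmax policies and updates both parameter sets along reward-weighted score-function traces; env_reset and env_step are hypothetical environment callables, and the episodic loop is a simplification of the infinite-horizon setting:

    # Sketch: REINFORCE/GPOMDP-style gradient ascent for a policy with an
    # internal memory state. Environment interface, softmax parameterisation,
    # and the discount on eligibility traces are illustrative assumptions.
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def policy_gradient_with_memory(env_reset, env_step, n_obs, n_act, n_mem,
                                    episodes=500, beta=0.95, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        theta_a = np.zeros((n_obs, n_mem, n_act))   # parameters of P(action | observation, memory)
        theta_m = np.zeros((n_obs, n_mem, n_mem))   # parameters of P(next memory | observation, memory)
        for _ in range(episodes):
            obs, mem, done, steps = env_reset(), 0, False, 0
            z_a, g_a = np.zeros_like(theta_a), np.zeros_like(theta_a)  # traces, gradient estimates
            z_m, g_m = np.zeros_like(theta_m), np.zeros_like(theta_m)
            while not done:
                pa, pm = softmax(theta_a[obs, mem]), softmax(theta_m[obs, mem])
                act = rng.choice(n_act, p=pa)
                new_mem = rng.choice(n_mem, p=pm)
                z_a *= beta                                 # discounted eligibility traces
                z_m *= beta
                z_a[obs, mem] += np.eye(n_act)[act] - pa    # score-function gradient of the sampled action
                z_m[obs, mem] += np.eye(n_mem)[new_mem] - pm
                obs, reward, done = env_step(act)
                g_a += reward * z_a
                g_m += reward * z_m
                mem = new_mem
                steps += 1
            theta_a += lr * g_a / max(steps, 1)             # ascend the estimated gradient
            theta_m += lr * g_m / max(steps, 1)
        return theta_a, theta_m

    Both parameter sets are updated from the same reward signal, so the controller learns how to act and what to remember at the same time, which is the behaviour the abstract describes.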
