2 research outputs found

    Eigentrigraphemes for under-resourced languages

    Get PDF
    Abstract Grapheme-based modeling has an advantage over phone-based modeling in automatic speech recognition for under-resourced languages when a good dictionary is not available. Recently we proposed a new method for parameter estimation of context-dependent hidden Markov model (HMM) called eigentriphone modeling. Eigentriphone modeling outperforms conventional tied-state HMM by eliminating the quantization errors among the tied states. The eigentriphone modeling framework is very flexible and can be applied to any group of modeling unit provided that they may be represented by vectors of the same dimension. In this paper, we would like to port the eigentriphone modeling method from a phone-based system to a grapheme-based system; the new method will be called eigentrigrapheme modeling. Experiments on four official South African under-resourced languages (Afrikaans, South African English, Sesotho, siSwati) show that the new eigentrigrapheme modeling method reduces the word error rates of conventional tied-state trigrapheme modeling by an average of 4.08% relative

    A Fully Automated Derivation of State-based Eigentriphones for Triphone Modeling with No Tied States Using Regularization

    No full text
    Recently we proposed an alternative method called eigentriphone to solve the data insufficiency problem in triphone acoustic modeling without the need of state tying. The idea is to treat the acoustic modeling problem of infrequent triphones ("poor triphones") as an adaptation problem from the more frequent triphones ("rich triphones"): firstly, an eigenbasis is developed over the rich triphones that have sufficient training data and the eigenvectors are called eigentriphones; then the poor triphones are adapted in a fashion similar to eigenvoice adaptation. Since, in general, no states are tied in our method, all triphones (states) are distinct so that they can be more discriminative than tied-state triphones. In our previous work, the number of eigentriphones was determined in advance with a set of development data. In this paper, we investigate simply using all of them with the help of regularization to naturally penalize the less important ones. In addition, the model-based eigenbasis is replaced by three state-based eigenbases. Experimental evaluation on the WSJ 5K task shows that triphone models trained using our new eigentriphone approach without state tying perform at least as well as the common tied-state triphone models. Copyright © 2011 ISCA
    corecore