5 research outputs found

    Eigentrigraphemes for under-resourced languages

    Grapheme-based modeling has an advantage over phone-based modeling in automatic speech recognition for under-resourced languages when a good pronunciation dictionary is not available. Recently we proposed a new method for parameter estimation of context-dependent hidden Markov models (HMMs) called eigentriphone modeling. Eigentriphone modeling outperforms conventional tied-state HMMs by eliminating the quantization errors among the tied states. The eigentriphone modeling framework is very flexible and can be applied to any group of modeling units, provided that they can be represented by vectors of the same dimension. In this paper, we port the eigentriphone modeling method from a phone-based system to a grapheme-based system; we call the new method eigentrigrapheme modeling. Experiments on four official South African under-resourced languages (Afrikaans, South African English, Sesotho, and siSwati) show that eigentrigrapheme modeling reduces the word error rates of conventional tied-state trigrapheme modeling by an average of 4.08% relative.
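    A minimal sketch of how trigrapheme units could be derived directly from a word's spelling (which is what removes the need for a pronunciation dictionary), and how they could be grouped by base grapheme so that each group can be given its own eigenbasis. The HTK-style `left-centre+right` notation, the `sil` word-boundary symbol, the lowercasing, and the function names are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List, Set, Tuple

Trigrapheme = Tuple[str, str, str]  # (left context, base grapheme, right context)


def word_to_trigraphemes(word: str, boundary: str = "sil") -> List[Trigrapheme]:
    """Expand a word's spelling into trigrapheme units (hypothetical helper).

    Each grapheme becomes a unit conditioned on its left and right neighbours;
    the assumed `boundary` symbol stands in for context outside the word.
    """
    graphemes = list(word.lower())
    units: List[Trigrapheme] = []
    for i, g in enumerate(graphemes):
        left = graphemes[i - 1] if i > 0 else boundary
        right = graphemes[i + 1] if i + 1 < len(graphemes) else boundary
        units.append((left, g, right))
    return units


def label(unit: Trigrapheme) -> str:
    """Format a unit in HTK-style left-centre+right notation."""
    left, centre, right = unit
    return f"{left}-{centre}+{right}"


def group_by_base_grapheme(words: List[str]) -> Dict[str, Set[str]]:
    """Collect the distinct trigraphemes of each base grapheme.

    In an eigentrigrapheme system, each such group would share one eigenbasis.
    """
    groups: Dict[str, Set[str]] = {}
    for word in words:
        for unit in word_to_trigraphemes(word):
            groups.setdefault(unit[1], set()).add(label(unit))
    return groups
```

    For example, `[label(u) for u in word_to_trigraphemes("kat")]` gives `['sil-k+a', 'k-a+t', 'a-t+sil']`. Keeping units as tuples and formatting them only for display avoids mis-parsing graphemes such as `-` or `+` that can occur in real orthographies.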

    Finite state models in the recognition and validation of read speech (Äärelliset tilamallit lukupuheen tunnistamisessa ja tarkastamisessa)

    An automatic speech recognition system has to combine acoustic and linguistic information, so its search space spans multiple layers. Finite state models, and weighted finite state transducers in particular, can efficiently represent this search space by modeling each layer as a transducer and combining the layers using generic weighted finite state transducer algorithms. When recognising a text prompt being read aloud, the prompt gives a good estimate of what is going to be said. However, human reading naturally produces some deviations from the text, called miscues; they are a natural part of even skilled reading and are therefore not, strictly speaking, reading errors. The purpose of this thesis is to create a system that accurately recognises recordings of reading. A miscue-tolerant finite state language model is implemented and compared against two traditional approaches: an N-gram model and forced alignment. The recognition result will ultimately be used to validate whether a recording is fit for further automatic processing in a spoken foreign language exam, which Project DigiTala is designing for the Finnish matriculation examination. The computerization of the matriculation examination in Finland makes the use of such automatic tools possible. This thesis first introduces the context for the task of recognising and validating reading. Then it explores three methodologies needed to solve the task: automatic speech recognition, finite state models, and the modeling of reading. Next, it presents the implementation of the miscue-tolerant finite state language models and the two baseline methods. After that, it describes experiments on both simulated English data and real-world Swedish data, which show that the miscue-tolerant finite state language models solve the task of this thesis significantly better than the baseline methods; on the difficult real-world data, a word error rate of 3.77 ± 0.47% is obtained. Finally, the thesis concludes with a discussion of the results and future work.
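    The contrast between forced alignment and a miscue-tolerant model can be sketched as a small weighted acceptor over the prompt: state `i` means the first `i` prompt words have been read, and extra arcs allow skipping, repeating, or inserting words at a cost. The arc types, the costs, the `<any>` wildcard, and scoring a word sequence by its cheapest path are hypothetical choices for illustration, not the thesis's actual design; a real system would typically build such a model with a WFST toolkit and combine it with the other layers of the search space.

```python
import math
from typing import List, Optional, Tuple

EPS: Optional[str] = None  # epsilon label: the arc consumes no word
ANY = "<any>"              # hypothetical wildcard for an out-of-prompt word

Arc = Tuple[int, Optional[str], int, float]  # (source, label, destination, cost)


def build_miscue_acceptor(prompt: List[str], skip_cost: float = 2.0,
                          repeat_cost: float = 1.0,
                          insert_cost: float = 3.0) -> Tuple[int, List[Arc]]:
    """Build a weighted acceptor whose state i means 'i prompt words read'."""
    arcs: List[Arc] = []
    for i, word in enumerate(prompt):
        arcs.append((i, word, i + 1, 0.0))       # read word i as written
        arcs.append((i, EPS, i + 1, skip_cost))  # skip word i
    for i in range(len(prompt) + 1):
        if i > 0:
            arcs.append((i, prompt[i - 1], i, repeat_cost))  # repeat the previous word
        arcs.append((i, ANY, i, insert_cost))                # insert any other word
    return len(prompt), arcs


def best_path_cost(hyp: List[str], final: int, arcs: List[Arc]) -> float:
    """Cheapest cost of accepting `hyp` and ending in the `final` state."""
    eps_arcs = sorted((a for a in arcs if a[1] is EPS), key=lambda a: a[0])
    word_arcs = [a for a in arcs if a[1] is not EPS]

    def eps_closure(cost: List[float]) -> List[float]:
        # Skip arcs only go from i to i + 1, so one left-to-right pass suffices.
        for src, _, dst, w in eps_arcs:
            cost[dst] = min(cost[dst], cost[src] + w)
        return cost

    cost = [math.inf] * (final + 1)
    cost[0] = 0.0
    cost = eps_closure(cost)
    for word in hyp:
        new = [math.inf] * (final + 1)
        for src, label, dst, w in word_arcs:
            if label == word or label == ANY:
                new[dst] = min(new[dst], cost[src] + w)
        cost = eps_closure(new)
    return cost[final]


prompt = "the cat sat on the mat".split()
final, arcs = build_miscue_acceptor(prompt)
```

    Keeping only the cost-0 reading arcs turns the acceptor into the single-path model that forced alignment assumes, under which any miscue has infinite cost; the extra arcs let a skipped, repeated, or inserted word be accepted at a finite, tunable penalty.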

    Context-Dependent Acoustic Modelling for Speech Recognition

    Ph.D. (Doctor of Philosophy)

    Eigentriphones for Context-Dependent Acoustic Modeling

    Most automatic speech recognizers employ tied-state triphone hidden Markov models (HMMs), in which the corresponding triphone states of the same base phone are tied. State tying is commonly performed with a phonetic regression class tree, which makes robust context-dependent modeling possible by carefully balancing the amount of training data against the degree of tying. However, tying inevitably introduces quantization error: triphones tied to the same state are indistinguishable in that state. Recently we proposed a new triphone modeling approach called eigentriphone modeling, in which all triphone models are, in general, distinct. The idea is to create an eigenbasis for each base phone (or phone state) and represent all its triphones (or triphone states) as distinct points in the space spanned by the basis. We have shown that triphone HMMs trained using model-based or state-based eigentriphones perform at least as well as conventional tied-state HMMs. In this paper, we further generalize the definition of eigentriphones over clusters of acoustic units. Our experiments on TIMIT phone recognition and Wall Street Journal 5K-vocabulary continuous speech recognition show that eigentriphones estimated from state clusters defined by the nodes of the same phonetic regression class tree used in state tying yield further performance gains.
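    As a rough illustration of an eigenbasis per base phone, the sketch below stacks hypothetical mean supervectors of one base phone's triphones, derives an orthonormal basis with plain PCA (via `numpy.linalg.svd`), and gives every triphone its own coordinates in that basis. Unweighted PCA on randomly generated supervectors is an assumption for illustration; the paper's exact basis construction may differ.

```python
import numpy as np


def build_eigenbasis(supervectors: np.ndarray, k: int):
    """Fit a k-dimensional eigenbasis to the triphones of one base phone.

    supervectors: (num_triphones, dim) array, one mean supervector per triphone.
    Returns the centroid, a (k, dim) basis with orthonormal rows, and the
    (num_triphones, k) coordinates of each triphone in that basis.
    """
    centroid = supervectors.mean(axis=0)
    centered = supervectors - centroid
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]
    coords = centered @ basis.T
    return centroid, basis, coords


def reconstruct(centroid: np.ndarray, basis: np.ndarray, coords: np.ndarray) -> np.ndarray:
    """Map coordinates back to supervectors: centroid + coords @ basis."""
    return centroid + coords @ basis


rng = np.random.default_rng(0)
supervectors = rng.normal(size=(6, 12))  # 6 hypothetical triphones, 12-dim supervectors
centroid, basis, coords = build_eigenbasis(supervectors, k=5)
```

    With six triphones, the centered data has rank at most five, so `k=5` reconstructs every supervector; a smaller `k` trades detail for robustness. Either way, each triphone keeps its own coordinates instead of being merged with others as in state tying. Applying the same procedure to the states under a regression-tree node, rather than to one base phone, would correspond to the cluster-based generalization described above.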

    Eigentriphones: A basis for context-dependent acoustic modeling

    In context-dependent acoustic modeling, it is important to strike a balance between detailed modeling and data sufficiency for robust estimation of model parameters. In the past, parameter sharing or tying has been one of the most common techniques for solving this problem. In recent years, another technique, which may be loosely and collectively called the subspace approach, tries to express a phonetic or sub-phonetic unit in terms of a small set of canonical vectors or units. In this paper, we investigate the development of an eigenbasis over the triphones and model each triphone as a point in the space spanned by the basis. We call the eigenvectors in the basis eigentriphones. From another perspective, we investigate the use of the eigenvoice adaptation method as a general acoustic modeling method for training triphones, especially the less frequent ones, without tying their states, so that all triphones are truly distinct from each other and thus may be more discriminative. Experimental evaluation on the 5K-vocabulary HUB2 recognition task shows that a triphone HMM system trained using only eigentriphones, without state tying, may achieve slightly better performance than the common tied-state triphone system. © 2011 IEEE
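    To show how an eigenvoice-style estimate could train an infrequent triphone without tying, the sketch below makes a strong simplifying assumption: each triphone is a single Gaussian with identity covariance, so its maximum-likelihood coordinates reduce to projecting the centred sample mean of its frames onto an orthonormal basis. Eigenvoice estimation in practice works on full HMM state distributions with their covariances and occupancies; the function names and toy data here are hypothetical.

```python
import numpy as np


def estimate_coordinates(frames: np.ndarray, centroid: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """ML coordinates of one triphone under an identity-covariance Gaussian.

    Minimising sum_t ||x_t - (centroid + basis.T @ w)||^2 over w gives
    w = basis @ (mean(x_t) - centroid) when the rows of `basis` are orthonormal.
    Only k numbers are estimated, however few frames the triphone has.
    """
    frame_mean = frames.mean(axis=0)
    return basis @ (frame_mean - centroid)


def eigen_mean(centroid: np.ndarray, basis: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Triphone mean implied by its coordinates w in the eigenbasis."""
    return centroid + basis.T @ w


rng = np.random.default_rng(1)
dim, k = 8, 3
q, _ = np.linalg.qr(rng.normal(size=(dim, k)))  # (dim, k) with orthonormal columns
basis = q.T                                     # (k, dim) with orthonormal rows
centroid = rng.normal(size=dim)
true_w = np.array([1.0, -2.0, 0.5])
true_mean = eigen_mean(centroid, basis, true_w)

# A rare triphone: only 4 noisy frames of training data.
frames = true_mean + 0.1 * rng.normal(size=(4, dim))
w_hat = estimate_coordinates(frames, centroid, basis)
```

    Compared with the plain sample mean, which estimates all `dim` values from four frames, this estimate constrains the triphone to the `k`-dimensional subspace. Under this simplified model, that constraint is where the robustness for infrequent triphones comes from, while each triphone still gets its own distinct `w`.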