2 research outputs found

    Statistical Feature Language Model

    Conference with proceedings and peer review. International. International audience. Statistical language models are widely used in automatic speech recognition to constrain the decoding of a sentence. Most of these models derive from the classical n-gram paradigm. However, the production of a word depends on a large set of linguistic features: lexical, syntactic, semantic, etc. Moreover, in some natural languages the gender and number of the left context affect the production of the next word. It therefore seems attractive to design a language model based on a variety of word features. In this paper we present a new statistical language model based on this idea, called the Statistical Feature Language Model (SFLM). In SFLM a word is considered as an array of linguistic features, and the model is defined in a way similar to the n-gram model. Experiments carried out for French show an improvement in terms of perplexity and predicted words.
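As a rough illustration of "a word as an array of linguistic features" used in an n-gram-like model, here is a minimal sketch. It is not the SFLM formulation from the paper, which the abstract does not give. The feature layout `(surface form, gender, number)`, the function names, and the unsmoothed maximum-likelihood bigram estimate are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical feature layout (an assumption, not taken from the paper):
# each word is a tuple (surface_form, gender, number).

def train_feature_bigram(sentences):
    """Count contexts and bigrams over feature tuples instead of bare words."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        for prev, cur in zip(sentence, sentence[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts

def bigram_prob(prev, cur, context_counts, bigram_counts):
    """Unsmoothed maximum-likelihood estimate of P(cur | prev)."""
    if context_counts[prev] == 0:
        return 0.0  # unseen context; a real model would smooth or back off
    return bigram_counts[(prev, cur)] / context_counts[prev]

# Toy French data: the gender and number of the left context are part of the history.
corpus = [
    [("la", "f", "sg"), ("petite", "f", "sg"), ("maison", "f", "sg")],
    [("la", "f", "sg"), ("grande", "f", "sg")],
]
contexts, bigrams = train_feature_bigram(corpus)
p_petite = bigram_prob(("la", "f", "sg"), ("petite", "f", "sg"), contexts, bigrams)
```

Because the history is a feature tuple, a feminine singular context can only have been observed with the continuations that actually followed it. This is one simple way agreement information can enter the prediction.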

    An alternative scheme for perplexity estimation and its assessment for the evaluation of language models

    Article in a peer-reviewed scientific journal. International. International audience. Language models are usually evaluated on test texts using the perplexity derived from the likelihood function computed on these texts (test set perplexity). In order to use this measure in the framework of a comparative evaluation campaign, we have developed an alternative scheme for estimating the test set perplexity. The method is derived from the Shannon game and is based on a gambling approach on the next word to come in a truncated sentence. We also study the entropy bounds proposed by Shannon, which are based on the rank of the correct answer, in order to estimate a perplexity interval for non-probabilistic language models. The relevance of the approach is validated on an example. We then report the results of a preliminary comparative evaluation using the proposed scheme