
Comparison of Language Models by Stochastic Context-Free Grammar, Bigram and Quasi-Simplified-Trigram

Abstract

In this paper, we investigate language models based on a stochastic context-free grammar (SCFG), a bigram, and a quasi-trigram. To calculate the statistics of the bigram and quasi-trigram models, we used a set of sentences generated randomly from a CFG and restricted to those that are legal in terms of semantics. We compared the models on their perplexities and sentence recognition accuracies. The sentence recognition experiments were carried out on the "UNIX-QA" task with a vocabulary size of 521 words. The perplexities of the bigram and quasi-trigram were about 1.6 times and 1.3 times larger, respectively, than the perplexity of the CFG corresponding to the most restricted grammar (perplexity = 10.0), and the perplexity of the SCFG was only about half that of the CFG. We conclude that the quasi-trigram has almost the same modeling ability as the most restricted CFG when the set of plausible sentences in the task is given.
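Since the comparison above rests on perplexity, a minimal sketch of how bigram perplexity is typically computed may help clarify the metric. This is an illustrative reconstruction, not the paper's actual procedure: the add-alpha smoothing, the function name bigram_perplexity, and the toy sentences are all assumptions, since the abstract does not describe the estimation details.

    import math
    from collections import Counter

    def bigram_perplexity(train_sents, test_sents, alpha=1.0):
        """Perplexity of an add-alpha smoothed bigram model.

        Illustrative sketch only; the paper's smoothing and
        estimation details are not given in the abstract.
        """
        unigrams, bigrams = Counter(), Counter()
        vocab = set()
        for sent in train_sents:
            words = ["<s>"] + sent + ["</s>"]
            vocab.update(words)
            unigrams.update(words[:-1])          # context counts
            bigrams.update(zip(words[:-1], words[1:]))
        V = len(vocab)

        log_prob, n_tokens = 0.0, 0
        for sent in test_sents:
            words = ["<s>"] + sent + ["</s>"]
            for prev, cur in zip(words[:-1], words[1:]):
                # P(cur | prev) with add-alpha smoothing
                p = (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * V)
                log_prob += math.log2(p)
                n_tokens += 1
        # perplexity = 2 ** (average negative log2 probability per token)
        return 2 ** (-log_prob / n_tokens)

    # Hypothetical toy data; the paper's experiments instead used
    # sentences generated randomly from the task CFG.
    train = [["list", "the", "files"], ["remove", "the", "files"]]
    test = [["list", "the", "files"]]
    print(bigram_perplexity(train, test))

A lower perplexity means the model assigns higher probability to the test sentences, i.e. the grammar is more restrictive; this is why the SCFG, at about half the CFG's perplexity of 10.0, models the task most tightly in the reported results.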
