
Comparison of Language Models by Stochastic Context-Free Grammar, Bigram and Quasi-Simplified-Trigram

Abstract

In this paper, we investigate language models based on a stochastic context-free grammar (SCFG), a bigram, and a quasi-trigram. To calculate the statistics of the bigram and quasi-trigram models, we used a set of sentences generated randomly from a CFG and restricted to those that are legal in terms of semantics. We compared the models on their perplexities and sentence recognition accuracies. The sentence recognition experiments were carried out on the "UNIX-QA" task with a vocabulary size of 521 words. The perplexities of the bigram and quasi-trigram were about 1.6 times and 1.3 times larger, respectively, than the perplexity of the CFG corresponding to the most restricted grammar (perplexity = 10.0), and the perplexity of the SCFG was only about half that of the CFG. We conclude that the quasi-trigram has almost the same modeling ability as the most restricted CFG when the set of plausible sentences in the task is given.
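Since the comparison above rests on perplexity, a minimal sketch of how bigram perplexity is typically computed may help clarify the metric. This is an illustrative reconstruction, not the paper's actual procedure: the add-alpha smoothing, the function name bigram_perplexity, and the toy sentences are all assumptions, since the abstract does not describe the estimation details.

    import math
    from collections import Counter

    def bigram_perplexity(train_sents, test_sents, alpha=1.0):
        """Perplexity of an add-alpha smoothed bigram model.

        Illustrative sketch only; the paper's smoothing and
        estimation details are not given in the abstract.
        """
        unigrams, bigrams = Counter(), Counter()
        vocab = set()
        for sent in train_sents:
            words = ["<s>"] + sent + ["</s>"]
            vocab.update(words)
            unigrams.update(words[:-1])          # context counts
            bigrams.update(zip(words[:-1], words[1:]))
        V = len(vocab)

        log_prob, n_tokens = 0.0, 0
        for sent in test_sents:
            words = ["<s>"] + sent + ["</s>"]
            for prev, cur in zip(words[:-1], words[1:]):
                # P(cur | prev) with add-alpha smoothing
                p = (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * V)
                log_prob += math.log2(p)
                n_tokens += 1
        # perplexity = 2 ** (average negative log2 probability per token)
        return 2 ** (-log_prob / n_tokens)

    # Hypothetical toy data; the paper's experiments instead used
    # sentences generated randomly from the task CFG.
    train = [["list", "the", "files"], ["remove", "the", "files"]]
    test = [["list", "the", "files"]]
    print(bigram_perplexity(train, test))

A lower perplexity means the model assigns higher probability to the test sentences, i.e. the grammar is more restrictive; this is why the SCFG, at about half the CFG's perplexity of 10.0, models the task most tightly in the reported results.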
