Efficient Disambiguation by means of Stochastic Tree Substitution Grammars

Abstract

In Stochastic Tree Substitution Grammars (STSGs), a single parse tree of an input sentence can be generated by exponentially many derivations; the probability of a parse is defined as the sum of the probabilities of its derivations. As a result, some methods developed for Stochastic Context-Free Grammars (SCFGs), e.g. the Viterbi algorithm for finding the most probable parse (MPP) of an input sentence, are not applicable to STSGs. In this paper we study parsing with STSGs and concentrate on the problem of disambiguation. We present polynomial-time algorithms for computing the probability of a parse, the probability of an input sentence, and the most probable derivation of an input sentence. In addition, we present an optimization technique for algorithms that search for the MPP.

Keywords: Corpus-based NLP, Statistical NLP, Disambiguation.

Motivation

Natural language (NL) grammars often assign many syntactic structures to the same sentence. Most of these structures are perceived as implausible by a human language user. At..
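The definition in the abstract, that the probability of a parse is the sum of the probabilities of its derivations, can be illustrated with a small sketch. This is not the paper's algorithm: the tree encoding (nested tuples, with a one-element tuple such as `("NP",)` marking a substitution site), the toy grammar, its probabilities, and the function names `match`, `parse_probability` and `best_derivation_probability` are all assumptions made for illustration.

```python
import math

# A tree is a tuple (label, child, ...); a terminal is a plain string.
# In an elementary tree, a one-element tuple such as ("NP",) is a
# substitution site (assumed encoding, for illustration only).
GRAMMAR = [
    (("S", ("NP",), ("VP",)), 0.4),
    (("S", ("NP", "john"), ("VP",)), 0.4),
    (("S", ("NP", "john"), ("VP", "sleeps")), 0.2),
    (("NP", "john"), 0.6),
    (("NP", "mary"), 0.4),
    (("VP", "sleeps"), 1.0),
]

PARSE = ("S", ("NP", "john"), ("VP", "sleeps"))


def match(elem, node):
    """Return the parse subtrees at elem's substitution sites, or None
    if the elementary tree does not fit the parse at this node."""
    if isinstance(elem, str):
        return [] if elem == node else None
    if isinstance(node, str) or elem[0] != node[0]:
        return None
    if len(elem) == 1:  # substitution site: the rest is derived separately
        return [node]
    if len(elem) != len(node):
        return None
    sites = []
    for e, n in zip(elem[1:], node[1:]):
        sub = match(e, n)
        if sub is None:
            return None
        sites.extend(sub)
    return sites


def _score(grammar, node, combine):
    # Score every elementary tree that can start a derivation at this node,
    # then combine the alternatives (sum for parses, max for derivations).
    scores = []
    for elem, p in grammar:
        sites = match(elem, node)
        if sites is not None:
            scores.append(p * math.prod(_score(grammar, s, combine) for s in sites))
    return combine(scores) if scores else 0.0


def parse_probability(grammar, parse):
    """Sum of the probabilities of all derivations of the given parse."""
    return _score(grammar, parse, sum)


def best_derivation_probability(grammar, parse):
    """Probability of the single most probable derivation of the parse."""
    return _score(grammar, parse, max)


print(parse_probability(GRAMMAR, PARSE))
print(best_derivation_probability(GRAMMAR, PARSE))
```

In this toy grammar the parse has three derivations, so its probability is larger than that of any one derivation, which is why picking the best derivation does not in general identify the most probable parse. Caching the score of each parse node would avoid recomputing shared subtrees in larger examples.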
