Abstract. We explain how we extracted a PCFG (probabilistic context-free grammar) from the Paris VII treebank. First, we transform the syntactic trees of the corpus into derivation trees. The transformation is performed by a generalized tree transducer, a variation on the usual top-down tree transducers, and yields derivation trees for an AB grammar, a restriction of Lambek grammars that contains only the left and right elimination rules. We then extract a PCFG from the derivation trees, assuming that these trees are representative of the grammar. The extracted grammar is used for sentence analysis through a slightly modified CYK algorithm that takes the probabilities into account. This allows us to determine whether a sentence belongs to the language described by the grammar.
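The last two steps can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes that rule probabilities are estimated by relative frequency over the derivation trees, and that the probabilistic CYK variant keeps the most probable derivation for each span (Viterbi style). The tuple encoding of trees, the names `extract_pcfg` and `cyk_best`, the category spellings and the toy sentences are all hypothetical.

```python
from collections import Counter, defaultdict

# Assumed encoding (hypothetical): a leaf is (category, word),
# an inner node is (category, left_subtree, right_subtree).
# AB derivations are binary, so no other arity is needed.
TOY_TREES = [
    ("s", ("np", "Marie"), ("np\\s", "dort")),
    ("s", ("np", "Marie"),
          ("np\\s", ("(np\\s)/np", "aime"), ("np", "Paul"))),
]

def count_rules(tree, counts):
    """Count every rule occurrence lhs -> rhs found in one derivation tree."""
    if len(tree) == 2:                      # lexical rule: category -> word
        counts[(tree[0], (tree[1],))] += 1
    else:                                   # binary rule: category -> left right
        cat, left, right = tree
        counts[(cat, (left[0], right[0]))] += 1
        count_rules(left, counts)
        count_rules(right, counts)

def extract_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = Counter()
    for tree in trees:
        count_rules(tree, counts)
    totals = Counter()
    for (lhs, _), n in counts.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in counts.items()}

def cyk_best(words, pcfg, start="s"):
    """Probability of the best derivation of `words` from `start` (0.0 if none)."""
    n = len(words)
    chart = defaultdict(dict)   # chart[(i, j)][cat] = best probability for words[i:j]
    for i, word in enumerate(words):
        for (lhs, rhs), p in pcfg.items():
            if rhs == (word,):
                cell = chart[(i, i + 1)]
                cell[lhs] = max(p, cell.get(lhs, 0.0))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):       # split point between the two children
                for (lhs, rhs), p in pcfg.items():
                    if len(rhs) != 2:
                        continue
                    p_left = chart[(i, k)].get(rhs[0])
                    p_right = chart[(k, j)].get(rhs[1])
                    if p_left is None or p_right is None:
                        continue
                    candidate = p * p_left * p_right
                    if candidate > chart[(i, j)].get(lhs, 0.0):
                        chart[(i, j)][lhs] = candidate
    return chart[(0, n)].get(start, 0.0)
```

Under these assumptions, a sentence is accepted when `cyk_best` returns a non-zero value. With the grammar extracted from `TOY_TREES`, `cyk_best(["Paul", "dort"], extract_pcfg(TOY_TREES))` is non-zero even though that sentence does not occur in the toy corpus, whereas `["dort", "Paul"]` is rejected.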