2 research outputs found

    Induction of Treebank-Aligned Lexical Resources

    By ‘treebank-aligned lexical resources’ we mean resources in which there is a systematic correspondence between the lexical resource and treebank syntactic resources. For instance, the lexical resource contains features representing the subcategorization frames of verbs, which correspond to the structural configurations in which the verb occurs in a treebank. Given such an alignment, a treebank can b

    Induction of Treebank-Aligned Lexical Resources

    We describe the induction, from unannotated corpora, of lexical resources that are aligned with treebank grammars, providing a systematic correspondence between features in the lexical resource and a treebank syntactic resource. We first describe a methodology based on parsing technology for augmenting a treebank database with linguistic features. A PCFG containing these features is created from the augmented treebank. We then use a procedure based on the inside-outside algorithm to learn lexical resources aligned with the treebank PCFG from large unannotated corpora. The method has been applied to create a feature-annotated English treebank based on the Penn Treebank. The unsupervised estimation procedure gives a substantial error reduction (up to 31.6%) on the task of learning the subcategorization preferences of novel verbs that are not present in the annotated training sample.
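
The pipeline in this abstract (read a PCFG off a feature-annotated treebank, then score unannotated text with it) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes relative-frequency estimation of rule probabilities, a grammar restricted to binary and lexical rules, and a two-tree toy treebank whose feature labels (`VP[trans]`, `VP[intrans]`, `V[trans]`) are hypothetical. It computes only the inside probabilities; the full inside-outside procedure would also compute outside probabilities and re-estimate the rule probabilities from expected counts.

```python
from collections import defaultdict

# Toy feature-annotated treebank (hypothetical labels, not Penn Treebank data).
# A tree is (label, child, ...); a preterminal is (label, word).
TOY_TREEBANK = [
    ("S", ("NP", "she"),
          ("VP[trans]", ("V[trans]", "reads"), ("NP", "books"))),
    ("S", ("NP", "she"), ("VP[intrans]", "sleeps")),
]


def count_rules(tree, counts):
    """Accumulate counts[lhs][rhs] for every local tree in `tree`."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        rhs = (children[0],)                      # lexical rule: label -> word
    else:
        rhs = tuple(child[0] for child in children)
        for child in children:
            count_rules(child, counts)
    counts[label][rhs] += 1


def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = defaultdict(lambda: defaultdict(int))
    for tree in treebank:
        count_rules(tree, counts)
    return {
        lhs: {rhs: n / sum(rules.values()) for rhs, n in rules.items()}
        for lhs, rules in counts.items()
    }


def inside_probability(pcfg, words, root="S"):
    """Inside pass: total probability of all parses of `words` rooted in `root`.

    Assumes every rule is either binary (A -> B C) or lexical (A -> word).
    """
    n = len(words)
    chart = {}  # (start, end, label) -> inside probability
    for i, word in enumerate(words):
        for lhs, rules in pcfg.items():
            p = rules.get((word,), 0.0)
            if p:
                chart[i, i + 1, lhs] = p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for lhs, rules in pcfg.items():
                total = 0.0
                for rhs, p in rules.items():
                    if len(rhs) != 2:
                        continue
                    left, right = rhs
                    for k in range(i + 1, j):     # sum over split points
                        total += (p * chart.get((i, k, left), 0.0)
                                    * chart.get((k, j, right), 0.0))
                if total:
                    chart[i, j, lhs] = total
    return chart.get((0, n, root), 0.0)


if __name__ == "__main__":
    pcfg = estimate_pcfg(TOY_TREEBANK)
    print(inside_probability(pcfg, ["she", "reads", "books"]))
```

Because the verb's frame is encoded in the category labels, the subcategorization preference of a verb is carried directly by the lexical rule probabilities of the feature-annotated preterminals, which is the kind of quantity an unsupervised re-estimation step could update for verbs missing from the annotated sample.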