research

Acquisition et évaluation sur corpus de propriétés de sous-catégorisation syntaxique

Abstract

We carry out an experiment aimed at using subcategorization information into a syntactic parser for PP attachment disambiguation. The subcategorization lexicon consists of probabilities between a word (verb, noun, adjective) and a preposition. The lexicon is acquired automatically from a 200 million word corpus, that is partially tagged and parsed. In order to assess the lexicon, we use 4 different corpora in terms of genre and domain. We D. Bourigault, C. Frérot assess various methods for PP attachment disambiguation : an exogeous method relies on the sub-categorization lexicon whereas an endogenous method relies on the corpus specific ressource only and an hybrid method makes use of both. The hybrid method proves to be the best and the results vary from 79.4 % to 87.2 %

    Similar works