thesis

Automatic extraction of large-scale multilingual lexical resources

Abstract

In this thesis, I present a methodology for treebank- or parser-based acquisition of lexical resources, in particular subcategorisation frames. The method uses an automatic Lexical Functional Grammar (LFG) f-structure annotation algorithm (Cahill et al., 2002a, 2004a; Burke et al., 2004b) and has been applied to the Penn-II and Penn-III treebanks (Marcus et al., 1994), with a total of about 1.3 million words, as well as to (a subset of) the British National Corpus (Bernard, 2002), with about 90 million words. I extract abstract syntactic function-based subcategorisation frames (LFG semantic forms), traditional CFG category-based subcategorisation frames, and mixed function/category-based frames, with or without preposition information for obliques and particle information for subcategorised particles. The approach distinguishes between active and passive frames, and reflects the effects of long-distance dependencies (LDDs) in the source data structures. Frames are associated with conditional probabilities, which makes it possible to optimise the extracted lexicon for quality or coverage through filtering. In contrast to many other approaches, subcategorisation frame types are not predefined but acquired from the source data. I carried out large-scale evaluations of the complete set of extracted forms against the COMLEX and OALD resources. To my knowledge, this is the largest and most complete evaluation of subcategorisation frames for English. The parser-based system is also evaluated against Korhonen (2002), with a statistically significant improvement over the previous best score. The automatic annotation methodology, as well as the grammar and lexicon extraction techniques for English, have been successfully migrated to Spanish, German and Chinese treebanks despite typological differences and variations in treebank encoding. I believe that this approach provides an attractive and efficient multilingual grammar and lexicon development paradigm.
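
The idea of attaching conditional probabilities to frames and then filtering can be illustrated with a minimal sketch. It assumes that the probability of a frame given a lemma is estimated by relative frequency, which the abstract does not state, and that frames are written as tuples of grammatical function names. The function names `frame_probabilities` and `filter_lexicon`, the toy observations and the threshold value are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def frame_probabilities(observations):
    """Estimate P(frame | lemma) by relative frequency (an assumption of this sketch)."""
    pair_counts = Counter(observations)
    lemma_counts = Counter(lemma for lemma, _ in observations)
    probs = {}
    for (lemma, frame), count in pair_counts.items():
        probs.setdefault(lemma, {})[frame] = count / lemma_counts[lemma]
    return probs

def filter_lexicon(probs, threshold):
    """Keep only frames whose conditional probability reaches the threshold."""
    return {
        lemma: {frame: p for frame, p in frames.items() if p >= threshold}
        for lemma, frames in probs.items()
    }

# Toy (lemma, frame) observations; not taken from any treebank.
observations = [
    ("give", ("subj", "obj", "obj2")),
    ("give", ("subj", "obj", "obj2")),
    ("give", ("subj", "obj", "obj2")),
    ("give", ("subj", "obj")),
]

probs = frame_probabilities(observations)
filtered = filter_lexicon(probs, threshold=0.3)  # hypothetical cut-off
```

Raising the threshold discards low-probability frames and so favours quality, while lowering it retains rarer frames and so favours coverage.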
