Automatic acquisition of Spanish LFG resources from the Cast3LB treebank

Cahill, Aoife; O'Donovan, Ruth; van Genabith, Josef; Way, Andy

Publication date: 1 January 2005
Publisher: CSLI Publications

Abstract

In this paper, we describe the automatic annotation of the Cast3LB Treebank with LFG f-structures for the subsequent extraction of Spanish probabilistic grammar and lexical resources. We adapt the approach and methodology developed for English by Cahill et al. (2004), O’Donovan et al. (2004) and related work to Spanish and the Cast3LB treebank encoding. We report on the quality and coverage of the automatic f-structure annotation. Following the pipeline and integrated models of Cahill et al. (2004), we extract wide-coverage probabilistic LFG approximations and parse unseen Spanish text into f-structures. We also extend Bikel’s (2002) Multilingual Parse Engine to include a Spanish language module. Using the retrained Bikel parser in the pipeline model gives the best results against a manually constructed gold standard (73.20% preds-only f-score). We also extract Spanish lexical resources: 4090 semantic form types with 98 frame types. Subcategorised prepositions and particles are included in the frames.
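As a rough illustration of how an f-score such as the one reported above can be computed, the following Python sketch compares predicted dependency triples against a gold standard. It assumes f-structures have already been flattened into (relation, head, dependent) triples, and that a preds-only evaluation keeps only triples linking predicates. The function name `f_score` and the example triples are hypothetical and are not taken from the paper or its evaluation software.

```python
from collections import Counter


def f_score(gold, predicted):
    """Return (precision, recall, F1) over multisets of dependency triples."""
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a triple is matched at most as often as it
    # occurs in both the gold and the predicted analysis.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / sum(pred_counts.values()) if pred_counts else 0.0
    recall = matched / sum(gold_counts.values()) if gold_counts else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical triples for "Juan come manzanas" (assumed, for illustration only).
gold = [("subj", "comer", "Juan"), ("obj", "comer", "manzana")]
predicted = [("subj", "comer", "Juan"), ("obj", "comer", "Juan")]

precision, recall, f1 = f_score(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F={f1:.2f}")
```

In this toy case one of two predicted triples matches one of two gold triples, so precision, recall and f-score all come out at 0.5.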
