Exploiting multi-word units in statistical parsing and generation

Cafferkey, Conor

thesis

Authors: Conor Cafferkey
Publication date: 1 November 2008
Publisher: Dublin City University. School of Computing

Abstract

Syntactic parsing is an important prerequisite for many natural language processing (NLP) applications. The task consists of generating, for a given sentence, the tree of syntactic nodes with their associated phrase category labels. Our objective is to improve upon statistical models for syntactic parsing by leveraging multi-word units (MWUs) such as named entities and other classes of multi-word expressions. Multi-word units are phrases that are lexically, syntactically and/or semantically idiosyncratic in that they are, to at least some degree, non-compositional. If such units are identified prior to, or as part of, the parsing process, their boundaries can be exploited as islands of certainty within the very large (and often highly ambiguous) search space. Fortunately, certain types of MWUs can be readily identified automatically (using a variety of techniques) to a near-human level of accuracy. We carry out a number of experiments which integrate knowledge about different classes of MWUs into several commonly deployed parsing architectures. In a supplementary set of experiments, we attempt to exploit these units in the converse operation to statistical parsing, namely statistical generation (in our case, surface realisation from Lexical-Functional Grammar f-structures). We show that, by exploiting knowledge about MWUs, certain classes of parsing and generation decisions are more accurately resolved. This translates to improvements in overall parsing and generation results which, although modest, are demonstrably significant.

Available Versions

DCU Online Research Access Service

oai:doras.dcu.ie:615

Last time updated on 10/07/2013
