Data-oriented parsing and the Penn Chinese Treebank

Hearne, Mary; Way, Andy

Authors: Mary Hearne, Andy Way
Publication date: 1 January 2004
Abstract

We present an investigation into parsing the Penn Chinese Treebank using a Data-Oriented Parsing (DOP) approach. DOP comprises an experience-based approach to natural language parsing. Most published research in the DOP framework uses PS-trees as its representation schema. Drawbacks of the DOP approach centre around issues of efficiency. We incorporate recent advances in DOP parsing techniques into a novel DOP parser which generates a compact representation of all subtrees which can be derived from any full parse tree. We compare our work to previous work on parsing the Penn Chinese Treebank, and provide both a quantitative and qualitative evaluation. While our results in terms of Precision and Recall are slightly below those published in related research, our approach requires no manual encoding of head rules, nor is a development phase per se necessary. We also note that certain constructions which were problematic in this previous work can be handled correctly by our DOP parser. Finally, we observe that the ‘DOP Hypothesis’ is confirmed for parsing the Penn Chinese Treebank.

DCU Online Research Access Service

oai:doras.dcu.ie:15823
