
Penn Discourse Treebank: Building a large scale annotated corpus encoding DLTAG-based discourse structure and discourse relations

By Cassandre Creswell, Katherine Forbes, Eleni Miltsakaki, Rashmi Prasad, Aravind Joshi and Bonnie Webber


Large-scale annotated corpora have played a critical role in speech and natural language research. However, while existing annotated corpora such as the Penn Treebank have been highly successful at the sentence level, we also need large-scale annotated resources that reliably encode key aspects of discourse. In this paper, we detail (1) our plans for building the Penn Discourse Treebank (PDTB), (2) our preliminary annotation work, and (3) the results to date of our efforts. Annotation in the PDTB will focus on coherence relations associated with discourse connectives, including their argument structure and anaphoric links, thus exposing a clearly defined level of discourse structure and supporting the extraction of a range of inferences associated with discourse connectives.

Publisher: 2012-05-30
Year: 2003
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX