Large-scale annotated corpora have played a critical role in speech and natural language research. However, while existing annotated corpora such as the Penn Treebank have been highly successful at the sentence level, we also need large-scale annotated resources that reliably encode key aspects of discourse. In this paper, we detail (1) our plans for building the Penn Discourse Treebank (PDTB), (2) our preliminary annotation work, and (3) the results of our efforts to date. Annotation in the PDTB will focus on coherence relations associated with discourse connectives, including their argument structure and anaphoric links, thus exposing a clearly defined level of discourse structure and supporting the extraction of a range of inferences associated with discourse connectives.