A Common Theory of Information Fusion from Multiple Text Sources Step One: Cross-Document Structure

By Dragomir Radev

Abstract

We introduce CST (cross-document structure theory), a paradigm for multidocument analysis. CST takes into account the rhetorical structure of clusters of related textual documents. We present a taxonomy of cross-document relationships. We argue that CST can be the basis for multidocument summarization guided by user preferences for summary length, information provenance, cross-source agreement, and chronological ordering of facts.

1 Introduction

The Topic Detection and Tracking model (TDT) [Allan et al. 98] describes news events as they are reflected in news sources. First, many sources write on the same event and, second, the same source typically produces a number of accounts of the event over a period of time. Sixteen news stories related to the same event from six news sources over a two-hour time period are represented in Figure 1.

Figure 1: Time distribution of related documents from multiple sources (time axis from 06:30 to 08:30 in 15-minute steps)

A careful..

Year: 2000
OAI identifier: oai:CiteSeerX.psu:10.1.1.32.175
Provided by: CiteSeerX
Full text is not hosted here; it may be available at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://www.cs.columbia.edu/~ra... (external link)