We introduce CST (cross-document structure theory), a paradigm for multidocument analysis. CST takes into account the rhetorical structure of clusters of related textual documents. We present a taxonomy of cross-document relationships. We argue that CST can be the basis for multidocument summarization guided by user preferences for summary length, information provenance, cross-source agreement, and chronological ordering of facts.

1 Introduction

The Topic Detection and Tracking model (TDT) [Allan et al. 98] describes news events as they are reflected in news sources. First, many sources write on the same event and, second, the same source typically produces a number of accounts of the event over a period of time. Sixteen news stories related to the same event from six news sources over a two-hour time period are represented in Figure 1.

[Figure 1: Time distribution of related documents from multiple sources. Time axis from 06:30 to 08:30 in 15-minute steps.]

A careful..