Abstract

This paper explores new-information detection, describing a strategy for filtering a stream of documents to present only information that is fresh. We focus on multi-document summarization and seek to efficiently use more linguistic information than is often seen in such systems. We experimented with our linguistic system and with a more traditional sentence-based, vector-space system and found that a combination of the two approaches boosted performance over each one alone.

1 Introduction

The voluminous amount of information now in digital form poses an important challenge: to distinguish new material from material in previously seen documents. The stream of news from around the world on the World Wide Web is but one form of this deluge of data. Data from the world financial markets, government actions, court decisions, and scientific research can all be tapped, but their value will be greatly diminished if readers must sift through the same material over and over again.