Similarity measures for tracking information flow

Abstract

Text similarity spans a spectrum, with broad topical similarity near one extreme and document identity at the other. Intermediate levels of similarity - resulting from summarization, paraphrasing, copying, and stronger forms of topical relevance - are useful for applications such as information flow analysis and question-answering tasks. In this paper, we explore mechanisms for measuring such intermediate kinds of similarity, focusing on the task of identifying where a particular piece of information originated. We consider both sentence-to-sentence and document-to-document comparison, and have incorporated these algorithms into RECAP, a prototype information flow analysis tool. Our experimental results with RECAP indicate that new mechanisms such as those we propose are likely to be more appropriate than existing methods for identifying the intermediate forms of similarity.

Research Repository RMIT University

Last time updated on 04/09/2013

This paper was published in Research Repository RMIT University.
