A Comparison of Similarity Measures for Text Documents

Abstract

Similarity is an important and widely used concept in many applications, such as document summarisation, question answering, information retrieval, document clustering and categorisation. This paper compares various similarity measures applied to the content of text documents. We attempt to identify the measure best suited to assessing document similarity for newspaper reports.

Keywords: stop words, stemming, normalisation, similarity measure, discriminant
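The abstract does not say which measures were compared. As a minimal sketch, assuming plain-text English input, the Python below shows the preprocessing steps named in the keywords (normalisation by lowercasing, stop-word removal, stemming), followed by two commonly used measures, cosine and Jaccard similarity. The tiny `STOP_WORDS` set and the suffix-stripping `naive_stem` are hypothetical simplifications, not the paper's method. A real system would use a full stop list and a proper stemmer such as Porter's.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "was", "were", "by", "for"}


def naive_stem(word: str) -> str:
    """Strip one common English suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip if a stem of at least three letters remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Normalise (lowercase), tokenise on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def cosine_similarity(a: list[str], b: list[str]) -> float:
    """Cosine of the angle between the raw term-frequency vectors of a and b."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a: list[str], b: list[str]) -> float:
    """Size of the intersection of the term sets divided by the size of their union."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Usage: two reports on the same event and one on an unrelated event.
doc1 = "The minister announced new taxes on fuel."
doc2 = "New fuel taxes were announced by the minister."
doc3 = "Heavy rain flooded the city centre."

t1, t2, t3 = preprocess(doc1), preprocess(doc2), preprocess(doc3)
print(cosine_similarity(t1, t2), jaccard_similarity(t1, t2))
print(cosine_similarity(t1, t3), jaccard_similarity(t1, t3))
```

Cosine similarity accounts for how often each term occurs, whereas Jaccard similarity only considers whether a term is present. Comparing measures of this kind on the same preprocessed documents is the sort of evaluation the abstract describes.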

    Last time updated on 14/01/2014