A Comparison of Similarity Measures for Text Documents
Abstract
Similarity is an important and widely used concept in many applications, such as document summarisation, question answering, information retrieval, and document clustering and categorisation. This paper compares a range of similarity measures applied to the content of text documents. We attempt to identify the measure best suited to assessing the similarity of newspaper reports.

Keywords: stop words, stemming, normalisation, similarity measure, discriminant
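The abstract does not name the individual measures being compared. As an illustration only, the sketch below assumes two common choices, cosine similarity over term-frequency vectors and Jaccard similarity over vocabularies, and applies stop-word removal and lowercase normalisation first. The stop-word list, the function names and the sample reports are hypothetical, and stemming is deliberately omitted to keep the sketch short.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was"}


def tokenize(text):
    """Normalise to lowercase, split on non-letters, and drop stop words.

    No stemming is applied in this sketch.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between raw term-frequency vectors."""
    tf_a, tf_b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    # Dot product only needs terms present in both documents.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document shares nothing with anything
    return dot / (norm_a * norm_b)


def jaccard_similarity(doc_a, doc_b):
    """Shared vocabulary size divided by combined vocabulary size."""
    set_a, set_b = set(tokenize(doc_a)), set(tokenize(doc_b))
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical newspaper-style sentences.
report_1 = "The council approved the new budget on Monday."
report_2 = "On Monday the council rejected the budget."

print(round(cosine_similarity(report_1, report_2), 3))  # 0.671
print(jaccard_similarity(report_1, report_2))  # 0.5
```

After stop-word removal the reports share three of six distinct terms, so Jaccard gives 0.5, while cosine (3 / (√5 · 2) ≈ 0.671) also takes the relative length of each term vector into account. Differences of this kind between measures are what a comparison such as this paper's sets out to evaluate.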