Abstract
Encoding a document as a vector is a crucial step in any vector space model based IR (Information Retrieval) system: the better these vectors are constructed, the better the performance of any application built on top of them. Traditional document representation methods treat a document as a bag of words, building the feature vector from the frequency counts of document terms and ignoring the fact that the words may be semantically related, which is crucial information for document representation. In this paper we describe a new method for generating feature vectors that uses the semantic relations between the words in a sentence. These relations are captured by the Universal Networking Language (UNL), a recently proposed semantic representation for sentences. To show that the document vectors generated by this new method are better than those produced by traditional methods, we use the concept of mutual information. Our experiments show that the vectors generated by the UNL method indeed provide more information about the documents, and that this improves precision and recall in an IR system built on them.
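As a point of reference for the traditional baseline the abstract contrasts against, the sketch below illustrates frequency-count (bag-of-words) document vectors and a term-document mutual information computation over them. It is a minimal illustration on hypothetical toy data, not the paper's UNL-based construction or its exact evaluation procedure.

```python
import math
from collections import Counter


def term_frequency_vector(document, vocabulary):
    """Traditional bag-of-words vector: raw frequency of each vocabulary term."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]


def mutual_information(doc_vectors):
    """I(T; D) for the joint distribution p(t, d) proportional to the count
    of term t in document d; higher values mean the term variable carries
    more information about which document is being observed."""
    total = sum(sum(v) for v in doc_vectors)
    n_terms = len(doc_vectors[0])
    p_d = [sum(v) / total for v in doc_vectors]                              # document marginal
    p_t = [sum(v[j] for v in doc_vectors) / total for j in range(n_terms)]   # term marginal
    mi = 0.0
    for i, v in enumerate(doc_vectors):
        for j, count in enumerate(v):
            if count == 0:
                continue
            p_td = count / total
            mi += p_td * math.log2(p_td / (p_t[j] * p_d[i]))
    return mi


# Hypothetical toy corpus for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks fell on the market today",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
vectors = [term_frequency_vector(d, vocab) for d in docs]
print("mutual information:", mutual_information(vectors))
```

Under this kind of measure, a representation whose vectors yield higher mutual information with the document identity can be regarded as more informative, which is the sense in which the paper compares UNL-based vectors with frequency-count vectors.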