The Internet presents a vast resource of information that continues to grow exponentially. Most present-day search engines locate relevant documents based on keyword matches. However, to provide the user with more relevant information, we need a system that also incorporates the conceptual framework of the queries. This is the goal of KeyConcept, a search engine that retrieves documents based on a combination of keyword and conceptual matching. An automatic classifier is used to determine the concepts to which new documents belong. Currently, the classifier is trained on documents selected randomly from each concept’s training set, and it ignores the hierarchical structure of the concept tree. In this thesis, we present a novel approach to selecting these training documents by using document clustering within the concepts. We also exploit the hierarchical structure in which the concepts themselves are arranged. Combining these approaches to text classification, we achieve an improvement of 67% in accuracy over the existing system.
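
To make the idea of selecting training documents by their position within a concept concrete, the following is a minimal sketch and not the method used in the thesis. It assumes the simplest possible case, in which each concept's training set is treated as a single cluster and the documents closest to its centroid are kept. The function names (`tf_vector`, `cosine`, `select_training_docs`), the whitespace tokenizer, and the raw term-frequency weighting are all illustrative assumptions.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_training_docs(docs, k):
    """Return the k documents most similar to the concept centroid.

    Hypothetical helper: the whole concept is treated as one cluster.
    The centroid is kept as a sum rather than a mean, which is safe
    because cosine similarity ignores vector length.
    """
    vectors = [tf_vector(d) for d in docs]
    centroid = Counter()
    for v in vectors:
        centroid.update(v)
    ranked = sorted(
        range(len(docs)),
        key=lambda i: cosine(vectors[i], centroid),
        reverse=True,
    )
    return [docs[i] for i in ranked[:k]]


if __name__ == "__main__":
    concept_docs = [
        "search engine index query",
        "search engine ranking query",
        "banana bread recipe",  # off-topic document in the training set
    ]
    print(select_training_docs(concept_docs, k=2))
```

Compared with random selection, a centroid-based choice of this kind tends to leave out outlying documents such as the off-topic one above, which is one plausible reason such a selection could help a classifier.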