Text Classification Combining Clustering and Hierarchical Approaches

By Shankar Ranganathan

Abstract

The Internet presents a vast resource of information that continues to grow exponentially. Most present-day search engines locate relevant documents based on keyword matches. However, to provide the user with more relevant information, we need a system that also incorporates the conceptual framework of the queries. This is the goal of KeyConcept, a search engine that retrieves documents based on a combination of keyword and conceptual matching. An automatic classifier is used to determine the concepts to which new documents belong. Currently, the classifier is trained on documents selected randomly from each concept’s training set, and it ignores the hierarchical structure of the concept tree. In this thesis, we present a novel approach for selecting these training documents by using document clustering within the concepts. We also exploit the hierarchical structure in which the concepts themselves are arranged. Combining these approaches to text classification, we achieve a 67% improvement in accuracy over the existing system.
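
The two ideas in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not name a clustering algorithm or a classifier, so the code below assumes a bag-of-words representation, uses similarity to a concept's centroid as a simple stand-in for within-concept clustering, and classifies top-down by cosine similarity. All function names, the toy concept tree, and the sample documents are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a whitespace-tokenized document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Sum of the vectors; cosine ignores scale, so no division is needed."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def select_training_docs(docs, k):
    """Assumed selection rule: keep the k documents closest to the concept centroid."""
    vectors = [vectorize(d) for d in docs]
    center = centroid(vectors)
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(vectors[i], center), reverse=True)
    return [docs[i] for i in ranked[:k]]


def build_profiles(tree, leaf_docs, node, profiles):
    """Leaf profile = centroid of its documents; parent profile = sum of its children."""
    children = tree.get(node, [])
    if not children:
        profiles[node] = centroid([vectorize(d) for d in leaf_docs[node]])
    else:
        profiles[node] = centroid(
            [build_profiles(tree, leaf_docs, c, profiles) for c in children])
    return profiles[node]


def classify_top_down(doc, tree, profiles, root="root"):
    """Descend the concept tree, choosing the best-matching child at each level."""
    vec = vectorize(doc)
    node, path = root, []
    while tree.get(node):
        node = max(tree[node], key=lambda c: cosine(vec, profiles[c]))
        path.append(node)
    return path


# Hypothetical toy data: the last document of each concept is off-topic noise.
tree = {"root": ["computing", "sports"],
        "computing": ["python", "java"],
        "sports": ["soccer", "tennis"]}
training = {
    "python": ["python code function variable", "python function class code",
               "python snake reptile zoo"],
    "java": ["java class object method", "java method interface object",
             "java island coffee indonesia"],
    "soccer": ["soccer goal match team", "soccer team league goal",
               "soccer mom minivan suburb"],
    "tennis": ["tennis racket serve match", "tennis serve court racket",
               "tennis elbow injury doctor"],
}

selected = {c: select_training_docs(docs, 2) for c, docs in training.items()}
profiles = {}
build_profiles(tree, selected, "root", profiles)
path = classify_top_down("python function variable", tree, profiles)
```

In this toy setup the off-topic document in each concept is the one furthest from the centroid, so it is dropped before training, and the query is routed through the parent concept before a leaf is chosen rather than being compared against every leaf at once.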

Year: 2009
OAI identifier: oai:CiteSeerX.psu:10.1.1.135.7298
Provided by: CiteSeerX
Full text may be found at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://www.ittc.ku.edu/researc... (external link)