Document categorization, the classification of text documents into one of several fixed classes or categories, has become important with the explosive growth of the World Wide Web. The goal of the work described here is to categorize Web documents automatically in order to enable effective retrieval of Web information. In this paper, we propose an efficient method for hierarchical document categorization based on the rule learning algorithm RIPPER (Repeated Incremental Pruning to Produce Error Reduction).

1 Introduction

With the recent rapid development of the World Wide Web (WWW or Web), a large collection of full-text documents has become available in electronic form, and the opportunity to obtain useful information has increased. The WWW also commonly offers large, manually ordered collections of hypertext links (e.g. Yahoo), and referring to these links is effective. Text categorization is the classification of texts with respect to a set of categorie..