Rule-Based Text Categorization Using Hierarchical Categories

By Minoru Sasaki and Kenji Kita


Document categorization, defined as the classification of text documents into one of several fixed classes or categories, has become important with the explosive growth of the World Wide Web. The goal of the work described here is to automatically categorize Web documents in order to enable effective retrieval of Web information. In this paper, we propose an efficient method for hierarchical document categorization based on the rule learning algorithm RIPPER (Repeated Incremental Pruning to Produce Error Reduction).

1 Introduction

With the rapid development of the World Wide Web (WWW or Web), a large collection of full-text documents has become available in electronic form, and the opportunities for finding useful information have increased. It is also quite common on the Web to find large, manually ordered collections of hypertext links (e.g. Yahoo), and referring to these links is effective. Text categorization is the classification of texts with respect to a set of categorie..

Year: 1998
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX