Skip to main content
Article thumbnail
Location of Repository

Web Page Classification: A Soft Computing Approach

By Angela Ribeiro, Víctor Fresno, María C. Garcia-alegre, Domingo Guinea and Juan Carlos

Abstract

Abstract. The Internet makes it possible to share and manipulate a vast quantity of information efficiently and effectively, but the rapid and chaotic growth experienced by the Net has generated a poorly organized environment that hinders the sharing and mining of useful data. The need for meaningful web-page classification techniques is therefore becoming an urgent issue. This paper describes a novel approach to web-page classification based on a fuzzy representation of web pages. A doublet representation that associates a weight with each of the most representative words of the web document so as to characterize its relevance in the document. This weight is derived by taking advantage of the characteristics of HTML language. Then a fuzzy-rule-based classifier is generated from a supervised learning process that uses a genetic algorithm to search for the minimum fuzzy-rule set that best covers the training examples. The proposed system has been demonstrated with two significantly different classes of web pages.

Year: 2009
OAI identifier: oai:CiteSeerX.psu:10.1.1.135.6542
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://gavab.escet.urjc.es/art... (external link)
  • Suggested articles


    To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.