Vision-Based Deep Web Data Extraction For Web Document Clustering

Dr. M. Lavanya; Dr. M. Usha Rani


Authors: Dr. M. Lavanya
Dr. M. Usha Rani
Publication date: 15 March 2012
Publisher: Global Journals Inc. (US)

Abstract

The design of web information extraction systems is becoming more complex and time-consuming. Detecting the data region is a significant problem in extracting information from a web page. In this paper, an approach to vision-based deep web data extraction is proposed for web document clustering. The proposed approach comprises two phases: 1) vision-based web data extraction, and 2) web document clustering. In phase 1, the web page information is segmented into various chunks, from which surplus noise and duplicate chunks are removed using three parameters: hyperlink percentage, noise score, and cosine similarity. Finally, the extracted keywords are subjected to web document clustering using fuzzy c-means clustering (FCM).
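
The chunk-filtering step in phase 1 can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define its parameters, so the sketch assumes that hyperlink percentage is the share of a chunk's words that appear inside links, and that two chunks count as duplicates when the cosine similarity of their term-frequency vectors reaches a threshold. The noise score is left out because the abstract gives no definition for it. The function names, the `text` and `link_text` fields, and both threshold values are hypothetical.

```python
import math
from collections import Counter


def hyperlink_percentage(chunk):
    """Share of a chunk's words that sit inside hyperlinks (assumed definition)."""
    total = len(chunk["text"].split())
    linked = len(chunk["link_text"].split())
    # An empty chunk carries no content, so treat it as all-link noise.
    return linked / total if total else 1.0


def cosine_similarity(a, b):
    """Cosine similarity of two texts using raw term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_chunks(chunks, max_link_pct=0.5, dup_threshold=0.9):
    """Drop link-heavy chunks and near-duplicates of chunks already kept.

    Both thresholds are illustrative values, not taken from the paper.
    """
    kept = []
    for chunk in chunks:
        if hyperlink_percentage(chunk) > max_link_pct:
            continue  # mostly navigation links: treated as noise
        if any(cosine_similarity(chunk["text"], k["text"]) >= dup_threshold
               for k in kept):
            continue  # near-duplicate of an earlier chunk
        kept.append(chunk)
    return kept


# Hypothetical chunks, as a page segmenter might produce them.
chunks = [
    {"text": "Home About Contact Login",
     "link_text": "Home About Contact Login"},
    {"text": "Fuzzy c-means assigns each document a degree of membership in every cluster",
     "link_text": ""},
    {"text": "Fuzzy c-means assigns each document a degree of membership in every cluster",
     "link_text": ""},
    {"text": "Vision-based segmentation splits a web page into visual chunks",
     "link_text": "web page"},
]
kept = filter_chunks(chunks)
```

In this example the navigation chunk is dropped because it is entirely link text, and the repeated chunk is dropped as a duplicate, so two chunks remain for keyword extraction and clustering.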


Available Versions

Global Journal of Computer Science and Technology (GJCST)

oai:ojs2.computerresearch.org:...

Last time updated on 19/10/2022