Study on Distance Measures for Clustering of Web Documents based on DOM-Tree based Representation of Web Document Structure

Authors: Manoj Kumar Sarma, Anjana Kakoti Mahanta
Publication date: 30 June 2017
Publisher: 'Auricle Technologies, Pvt., Ltd.'

Abstract

Among the three broad areas of Web mining, Web Structure Mining is the process of discovering structural information from either the Web hyperlink structure or the Web page structure. To apply data mining techniques to web pages, an efficient representation is required that captures their actual hierarchical structure. The work presented here aims to identify an appropriate distance measure (also called a similarity measure) for strings that can be used for clustering web documents and in other data mining applications.

Available Versions

International Journal on Recent and Innovation Trends in Computing and Communication

oai:ojs2.ijritcc.com:article/9...

Last time updated on 20/10/2022