Clustering XML Document Based On Path Similarities Using Structure Only

Mon, Ei Ei; Tun, Khin Nwe Ni

Publication date: 30 December 2009

Abstract

We propose a methodology for clustering XML documents on the basis of their structural similarities. The approach combines common XPaths with K-means clustering to improve efficiency on collections of XML documents that have many different structures. Common XPaths are used to find similarities among the paths of a large number of XML documents, and the K-means clustering algorithm is used to form accurate clusters. The documents' paths are clustered step by step. The first step performs frequent structure mining with the FP-growth method to find similarities among the structures of a large number of XML documents. The second step builds a feature vector matrix from the extracted paths; based on the resulting set of common path vectors, we compute the structural similarity between the XML documents. The last step applies the K-means clustering algorithm to create accurate clusters through path-based clustering, which groups the documents according to their common XPaths, i.e., their frequent structures. Clustering quality can be measured by the dissimilarity of document structures. Experimental evaluation on both synthetic and real data shows the effectiveness of our approach.
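
The first step (extracting paths and finding frequent ones) can be illustrated with a minimal Python sketch. It assumes that a document's XPaths are its root-to-element tag paths (such as /book/title). As a deliberate simplification it replaces FP-growth with a plain document-frequency count of single paths: it does not build an FP-tree or mine multi-path patterns. The function names and the min_support threshold are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def extract_paths(xml_text):
    """Return the set of distinct root-to-element tag paths, e.g. '/book/title'."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        paths.add(path)
        for child in elem:  # iterate over direct sub-elements
            walk(child, path)

    walk(root, "")
    return paths


def frequent_paths(docs, min_support):
    """Paths that occur in at least min_support documents.

    Simplification (not FP-growth): only single-path support is counted.
    """
    counts = Counter()
    for doc in docs:
        counts.update(extract_paths(doc))  # each path counted once per document
    return sorted(p for p, c in counts.items() if c >= min_support)


docs = [
    "<book><title/><author/></book>",
    "<book><title/><year/></book>",
    "<article><title/><author/></article>",
]
print(frequent_paths(docs, min_support=2))  # ['/book', '/book/title']
```

A real FP-growth implementation would additionally find sets of paths that co-occur frequently; the counting pass above is only the first scan that such an algorithm also performs.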
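
For the second step, the sketch below builds a binary document-by-path feature matrix and computes a structural similarity between two rows. The abstract does not name the similarity measure, so cosine similarity is used here purely as an assumption. The vocabulary would normally be the frequent paths from the mining step; using every path is a shortcut for the example, and all names are hypothetical.

```python
import math


def feature_matrix(doc_paths, vocabulary):
    """One binary row per document: 1 if the document contains that path, else 0."""
    return [[1 if p in paths else 0 for p in vocabulary] for paths in doc_paths]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Path sets as produced by a path-extraction step (hard-coded for the example).
doc_paths = [
    {"/book", "/book/title", "/book/author"},
    {"/book", "/book/title", "/book/year"},
    {"/article", "/article/title"},
]
vocabulary = sorted(set().union(*doc_paths))  # assumption: all paths, not only frequent ones
X = feature_matrix(doc_paths, vocabulary)
print(round(cosine(X[0], X[1]), 3))  # 0.667: two shared paths out of three each
```

On binary vectors, cosine similarity reduces to the number of shared paths divided by the geometric mean of the two documents' path counts.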
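
The final step can be sketched with plain Lloyd's K-means over binary path vectors. Euclidean distance, seeding the centroids with the first k rows, and the iteration cap are assumptions chosen to make the example deterministic; the abstract does not specify them.

```python
import math


def kmeans(X, k, iters=100):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    # Assumption: seed centroids with the first k rows (deterministic, not k-means++).
    centroids = [[float(v) for v in row] for row in X[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each document goes to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda j: math.dist(x, centroids[j])) for x in X
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned vectors.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


X = [
    [1, 1, 1, 0, 0, 0],  # book-like structure
    [0, 0, 0, 1, 1, 1],  # article-like structure
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
]
print(kmeans(X, k=2))  # [0, 1, 0, 1]
```

In practice, results depend on the seeding, so a library implementation with k-means++ initialization and several restarts would usually be preferred over this fixed seeding.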
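
The abstract states that clustering quality can be measured by the dissimilarity of document structures, but it gives no formula. One hypothetical way to do this is to compare the mean Jaccard distance between the path sets of documents in the same cluster with that of documents in different clusters. A good clustering should give a low intra-cluster value and a high inter-cluster value. The measure and the function names are assumptions, not the authors' metric.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two path sets (0.0 if both are empty)."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def mean_pair_distance(doc_paths, labels, same_cluster):
    """Mean Jaccard distance over document pairs that are (or are not) co-clustered."""
    dists = [
        jaccard_distance(a, b)
        for (a, la), (b, lb) in combinations(zip(doc_paths, labels), 2)
        if (la == lb) == same_cluster
    ]
    return sum(dists) / len(dists) if dists else 0.0


doc_paths = [
    {"/book", "/book/title"},
    {"/book", "/book/title", "/book/year"},
    {"/article", "/article/title"},
    {"/article", "/article/title", "/article/author"},
]
labels = [0, 0, 1, 1]
print(mean_pair_distance(doc_paths, labels, same_cluster=True))   # intra-cluster
print(mean_pair_distance(doc_paths, labels, same_cluster=False))  # inter-cluster
```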

Available version: MERAL Portal (oai:meral.edu.mm:recid/4226)

Last updated on 04/11/2020