Building XML data warehouse based on frequent patterns in user queries

Author: Bruckner Robert
Ling Tok Wang
Tjoa A. Min
Zhang Ji
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2003

With the proliferation of XML-based data sources available across the Internet, it is increasingly important to provide users with a data warehouse of XML data sources to facilitate decision-making processes. Because of the extremely large amount of XML data available on the web, unguided warehousing of XML data is highly costly and usually cannot accommodate users' needs in XML data acquisition well. In this paper, we propose an approach to materialize XML data warehouses based on frequent query patterns discovered from historical queries issued by users. The schemas of integrated XML documents in the warehouse are built using these frequent query patterns, represented as Frequent Query Pattern Trees (FreqQPTs). Using a hierarchical clustering technique, the integration approach in the data warehouse is flexible with respect to obtaining and maintaining XML documents. Experiments show that processing the same queries issued against the global schema becomes much more efficient when using the XML data warehouse than when directly searching the multiple data sources.

University of Southern Queensland ePrints

Ant colony optimization based clustering for data partitioning.

Author: Woo Kwan Ho
Publication date: 01/01/2005

Woo Kwan Ho. Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. Includes bibliographical references (leaves 148-155). Abstracts in English and Chinese.

Contents:
Abstract
Acknowledgements
List of Figures
List of Tables
Chapter 1: Introduction
Chapter 2: Literature Reviews
  2.1 Block Clustering
  2.2 Clustering XML by structure
    2.2.1 Definition of XML schematic information
    2.2.2 Identification of XML schematic information
Chapter 3: Bi-Tour Ant Colony Optimization for diagonal clustering
  3.1 Motivation
  3.2 Framework of Bi-Tour Ant Colony Algorithm
  3.3 Re-order of the data matrix in BTACO clustering method
    3.3.1 Review of Ant Colony Optimization
    3.3.2 Bi-Tour Ant Colony Optimization
  3.4 Determination of partitioning scheme
    3.4.1 Weighed Sum of Error (WSE)
    3.4.2 Materialization of partitioning scheme via hypothetic matrix
    3.4.3 Search of best-fit hypothetic matrix
    3.4.4 Dynamic programming approach
    3.4.5 Heuristic partitioning approach
  3.5 Experimental Study
    3.5.1 Data set
    3.5.2 Study on DP Approach and HP Approach
    3.5.3 Study on parameter settings
    3.5.4 Comparison with GA-based & hierarchical clustering methods
  3.6 Chapter conclusion
Chapter 4: Application of BTACO-based clustering in XML database system
  4.1 Introduction
  4.2 Overview of normalization and vertical partitioning in relational DB design
    4.2.1 Normalization of relational models in database design
    4.2.2 Vertical partitioning in database design
  4.3 Clustering XML documents
  4.4 Proposed approach using BTACO-based clustering
    4.4.1 Clustering XML documents by structure
    4.4.2 Clustering XML documents by user transaction patterns
    4.4.3 Implementation of Query Manager for our experimental study
  4.5 Experimental Study
    4.5.1 Experimental Study on the clustering by structure
    4.5.2 Experimental Study on the clustering by user access patterns
  4.6 Chapter conclusion
Chapter 5: Conclusions
  5.1 Contributions
  5.2 Future works
Bibliography
Appendix I
Appendix II
  Index tables for Profile A
  Index tables for Profile B
Appendix III

CUHK Digital Repository

XML Schema Clustering with Semantic and Hierarchical Similarity Measures

Author: Iryadi Wina
Nayak Richi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are required to manage these collections and to discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic and contextual similarity of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.

Crossref

Queensland University of Technology ePrints Archive

Datamining for Web-Enabled Electronic Business Applications

Author: Nayak Richi
Publication venue: Idea Group
Publication date: 01/01/2003

Web-enabled electronic business is generating massive amounts of data on customer purchases, browsing patterns, usage times and preferences at an increasing rate. Data mining techniques can be applied to the data being collected to obtain useful information. This chapter presents issues associated with data mining for web-enabled electronic business.

Queensland University of Technology ePrints Archive

XML documents clustering using a tensor space model

Author: Kutty Sangeetha
Li Yuefeng
Nayak Richi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

The traditional Vector Space Model (VSM) is not able to represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then utilizing it for clustering. Empirical analysis shows that the proposed method is scalable for large-sized datasets; in addition, the factorized matrices produced by the proposed method help to improve the quality of clusters through the enriched document representation of both structure and content information.

CiteSeerX

Queensland University of Technology ePrints Archive

A Progressive Clustering Algorithm to Group the XML Data by Structural and Semantic Similarity

Author: Nayak Richi
Tran Tien
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2007

Since XML became popular for data representation and exchange over the Web, the distribution of XML documents has rapidly increased. It has become a challenge for researchers to turn these documents into a more useful information utility. In this paper, we introduce a novel clustering algorithm, PCXSS, that organises heterogeneous XML documents into groups according to their structural and semantic similarity. We develop a global criterion function, CPSim, that progressively measures the similarity between an XML document and existing clusters, avoiding the need to compute the similarity between two individual documents. The experimental analysis shows the method to be fast and accurate.

CiteSeerX

Queensland University of Technology ePrints Archive

BlogForever D2.6: Data Extraction Methodology

Author: Banos V.
Davis R.
Gkotsis G.
Pincent E.
Stepanyan K.
Publication date: 25/10/2013

This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY