Building XML data warehouse based on frequent patterns in user queries

Author: Bruckner Robert
Ling Tok Wang
Tjoa A. Min
Zhang Ji
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2003

With the proliferation of XML-based data sources available across the Internet, it is increasingly important to provide users with a data warehouse of XML data sources to facilitate decision-making processes. Because of the extremely large amount of XML data available on the web, unguided warehousing of XML data is highly costly and usually cannot accommodate users’ needs in XML data acquisition. In this paper, we propose an approach to materialize XML data warehouses based on frequent query patterns discovered from historical queries issued by users. The schemas of integrated XML documents in the warehouse are built using these frequent query patterns, represented as Frequent Query Pattern Trees (FreqQPTs). By using a hierarchical clustering technique, the integration approach in the data warehouse is flexible with respect to obtaining and maintaining XML documents. Experiments show that processing the same queries issued against the global schema becomes much more efficient with the resulting XML data warehouse than with a direct search of the multiple data sources.

University of Southern Queensland ePrints

Integration of Data Mining and Data Warehousing: a practical methodology

Author: Pears R
Usman M
Publication venue: Advanced Institute of Convergence IT (AICIT)
Publication date: 12/08/2011

The ever-growing repository of data in all fields poses new challenges to modern analytical systems. Real-world datasets, with mixed numeric and nominal variables, are difficult to analyze and require effective visual exploration that conveys the semantic relationships in the data. Traditional data mining techniques such as clustering handle only numeric data. Little research has been carried out on the problem of clustering high-cardinality nominal variables to gain better insight into the underlying dataset. Several works in the literature have shown the potential of integrating data mining with warehousing to discover knowledge from data. For seamless integration, the mined data has to be modeled in the form of a data warehouse schema. Schema generation is a complex manual task that requires domain and warehousing familiarity. Automated techniques are required to generate the warehouse schema and overcome these dependencies. To fulfill growing analytical needs and overcome the existing limitations, we propose a novel methodology that permits efficient analysis of mixed numeric and nominal data, effective visual data exploration, automatic warehouse schema generation, and integration of data mining and warehousing. The proposed methodology is evaluated through a case study on a real-world dataset. Results show that multidimensional analysis can be performed in an easier and more flexible way to discover meaningful knowledge from large datasets.

AUT Scholarly Commons

Attribute oriented induction with star schema

Author: H Spits Warnars H. L.
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 29/05/2010

This paper proposes a novel star schema attribute induction as a new attribute induction paradigm and as an improvement on current attribute oriented induction. The novel star schema attribute induction is compared with current attribute oriented induction, based on characteristic rules and using a non-rule-based concept hierarchy, by implementing both approaches. The novel star schema attribute induction implements several improvements: elimination of the threshold number used as the maximum-tuples control for the generalization result, no ANY as the most general concept, replacement of the role of the concept hierarchy with a concept tree, simplification of the generalization strategy steps, and elimination of the attribute oriented induction algorithm. Novel star schema attribute induction is more powerful than current attribute oriented induction because it can produce a small number of final generalization tuples and there is no ANY in the results. Comment: 23 pages, IJDM

arXiv.org e-Print Archive

CiteSeerX

Crossref

Extending UML for Multidimensional Modeling in Data Warehouse

Author: Dhawan Bakul
Gosain Anjana
Publication venue: Institute for Project Management Pvt. Ltd
Publication date: 28/08/2020

Multidimensional (MD) modeling is the foundation of data warehouses, MD databases, and On-Line Analytical Processing (OLAP) applications. Dimensional modeling and object orientation are areas of growing interest. In the past few years, there have been many proposals for representing MD properties at the conceptual level. However, none of them has been accepted as a standard for conceptual MD modeling. In this paper, we present an extension of the Unified Modeling Language (UML) using a UML profile for multidimensional databases. This profile is composed of a set of stereotypes, constraints, and tagged values. We have extended UML to represent the main multidimensional properties at the conceptual level, such as many-to-many relationships between facts and dimensions, degenerate dimensions, multiple and alternative path classification hierarchies, nonstrict and complete hierarchies, and aggregate fact tables.

Interscience Research Network

SOLAM: A Novel Approach of Spatial Aggregation in SOLAP Systems

Author: Djamila Hamdadou
Zemri Farah Amina
Publication venue: 'Universidad Internacional de La Rioja'
Publication date: 17/03/2022

In the context of a data-driven approach aimed at detecting the real and responsible factors in the transmission of diseases and explaining their emergence or re-emergence, we propose the SOLAM (Spatial On Line Analytical Mining) system, an extension of Spatial On Line Analytical Processing (SOLAP) with Spatial Data Mining (SDM) techniques. Our approach consists of integrating the EPISOLAP system, tailored for epidemiological surveillance, with a spatial generalization method that allows the predictive evaluation of health risk in the presence of hazards and awareness of the vulnerability of the exposed population. The proposed architecture is a single integrated decision-making platform for knowledge discovery from spatial databases. Spatial generalization methods allow the data to be explored at different semantic and spatial scales while reducing unnecessary dimensions. The principle of the method is to select and delete attributes of low importance in data characterization, thereby producing zones of homogeneous characteristics that are then merged.

Re-UNIR