7,945 research outputs found
Building XML data warehouse based on frequent patterns in user queries
[Abstract]: With the proliferation of XML-based data sources available across the Internet, it is increasingly important to provide users with a data warehouse of XML data sources to facilitate decision-making processes. Due to the extremely large amount of XML data available on web, unguided warehousing of XML data turns out to be highly costly and usually cannot well accommodate the users’ needs in XML data acquirement. In this paper, we propose an approach to materialize XML data warehouses based on frequent query patterns discovered from historical queries issued by users. The schemas of integrated XML documents in the warehouse are built using these frequent query patterns represented as Frequent Query Pattern Trees (FreqQPTs). Using hierarchical clustering technique, the integration approach in the data warehouse is flexible with respect to obtaining and maintaining XML documents. Experiments show that the overall processing of the same queries issued against the global schema become much efficient by using the XML data warehouse built than by directly searching the multiple data sources
Integration of Data Mining and Data Warehousing: a practical methodology
The ever growing repository of data in all fields poses new challenges to the modern analytical
systems. Real-world datasets, with mixed numeric and nominal variables, are difficult to analyze and require effective visual exploration that conveys semantic relationships of data. Traditional data mining techniques such as clustering clusters only the numeric data. Little research has been carried out in tackling the problem of clustering high cardinality nominal variables to get better insight of underlying dataset. Several works in the literature proved the likelihood of integrating data mining with warehousing to discover knowledge from data. For the seamless integration, the mined data has
to be modeled in form of a data warehouse schema. Schema generation process is complex manual
task and requires domain and warehousing familiarity. Automated techniques are required to generate warehouse schema to overcome the existing dependencies. To fulfill the growing analytical needs and to overcome the existing limitations, we propose a novel methodology in this paper that permits efficient analysis of mixed numeric and nominal data, effective visual data exploration, automatic warehouse schema generation and integration of data mining and warehousing. The proposed methodology is evaluated by performing case study on real-world data set. Results show that multidimensional analysis can be performed in an easier and flexible way to discover meaningful
knowledge from large datasets
Attribute oriented induction with star schema
This paper will propose a novel star schema attribute induction as a new
attribute induction paradigm and as improving from current attribute oriented
induction. A novel star schema attribute induction will be examined with
current attribute oriented induction based on characteristic rule and using non
rule based concept hierarchy by implementing both of approaches. In novel star
schema attribute induction some improvements have been implemented like
elimination threshold number as maximum tuples control for generalization
result, there is no ANY as the most general concept, replacement the role
concept hierarchy with concept tree, simplification for the generalization
strategy steps and elimination attribute oriented induction algorithm. Novel
star schema attribute induction is more powerful than the current attribute
oriented induction since can produce small number final generalization tuples
and there is no ANY in the results.Comment: 23 Pages, IJDM
Extending Uml for Multidimensional Modeling in Data Warehouse
Multidimensional modeling is the foundation of data warehouses, MD databases, and On-Line Analytical Processing (OLAP) applications. Nowadays Dimensional modeling and object-orientation are becoming growing interest areas. In the past few years; there have been many proposals, for representing the MD properties at the conceptual level. However, none of them has been accepted as a standard for conceptual MD modeling. In this paper, we present an extension of the Unified Modeling Language (UML) using a UML profile for multidimensional databases. This profile is composed of a set of stereotypes, constraints and tagged values. We have extended the uml for representing the main multidimensional properties at the conceptual level such as the many-to-many relationships between facts and dimensions, degenerate dimensions, multiple and alternative path classification hierarchies, and nonstrict and complete hierarchies and aggregate fact table
SOLAM: A Novel Approach of Spatial Aggregation in SOLAP Systems
In the context of a data driven approach aimed to detect the real and responsible factors of the transmission of diseases and explaining its emergence or re-emergence, we suggest SOLAM (Spatial on Line Analytical Mining) system, an extension of Spatial On Line Analytical Processing (SOLAP) with Spatial Data Mining (SDM) techniques. Our approach consists of integrating EPISOLAP system, tailored for epidemiological surveillance, with spatial generalization method allowing the predictive evaluation of health risk in the presence of hazards and awareness of the vulnerability of the exposed population. The proposed architecture is a single integrated decision-making platform of knowledge discovery from spatial databases. Spatial generalization methods allow exploring the data at different semantic and spatial scales while reducing the unnecessary dimensions. The principle of the method is selecting and deleting attributes of low importance in data characterization, thus produces zones of homogeneous characteristics that will be merged
- …