738 research outputs found
XWeB: the XML Warehouse Benchmark
With the emergence of XML as a standard for representing business data, new
decision support applications are being developed. These XML data warehouses
aim at supporting On-Line Analytical Processing (OLAP) operations that
manipulate irregular XML data. To ensure feasibility of these new tools,
important performance issues must be addressed. Performance is customarily
assessed with the help of benchmarks. However, decision support benchmarks do
not currently support XML features. In this paper, we introduce the XML
Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from
the relational decision support benchmark TPC-H. It is mainly composed of a
test data warehouse that is based on a unified reference model for XML
warehouses and that features XML-specific structures, and its associate XQuery
decision support workload. XWeB's usage is illustrated by experiments on
several XML database management systems
Pattern tree-based XOLAP rollup operator for XML complex hierarchies
With the rise of XML as a standard for representing business data, XML data
warehousing appears as a suitable solution for decision-support applications.
In this context, it is necessary to allow OLAP analyses on XML data cubes.
Thus, XQuery extensions are needed. To define a formal framework and allow
much-needed performance optimizations on analytical queries expressed in
XQuery, defining an algebra is desirable. However, XML-OLAP (XOLAP) algebras
from the literature still largely rely on the relational model. Hence, we
propose in this paper a rollup operator based on a pattern tree in order to
handle multidimensional XML data expressed within complex hierarchies
Benchmarking Summarizability Processing in XML Warehouses with Complex Hierarchies
Business Intelligence plays an important role in decision making. Based on
data warehouses and Online Analytical Processing, a business intelligence tool
can be used to analyze complex data. Still, summarizability issues in data
warehouses cause ineffective analyses that may become critical problems to
businesses. To settle this issue, many researchers have studied and proposed
various solutions, both in relational and XML data warehouses. However, they
find difficulty in evaluating the performance of their proposals since the
available benchmarks lack complex hierarchies. In order to contribute to
summarizability analysis, this paper proposes an extension to the XML warehouse
benchmark (XWeB) with complex hierarchies. The benchmark enables us to generate
XML data warehouses with scalable complex hierarchies as well as
summarizability processing. We experimentally demonstrated that complex
hierarchies can definitely be included into a benchmark dataset, and that our
benchmark is able to compare two alternative approaches dealing with
summarizability issues.Comment: 15th International Workshop on Data Warehousing and OLAP (DOLAP
2012), Maui : United States (2012
Heterogeneous data source integration for smart grid ecosystems based on metadata mining
The arrival of new technologies related to smart grids and the resulting ecosystem of applications andmanagement systems pose many new problems. The databases of the traditional grid and the variousinitiatives related to new technologies have given rise to many different management systems with several formats and different architectures. A heterogeneous data source integration system is necessary toupdate these systems for the new smart grid reality. Additionally, it is necessary to take advantage of theinformation smart grids provide. In this paper, the authors propose a heterogeneous data source integration based on IEC standards and metadata mining. Additionally, an automatic data mining framework isapplied to model the integrated information.Ministerio de EconomĂa y Competitividad TEC2013-40767-
Integrating data warehouses with web data : a survey
This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML
technologies that are currently being used to integrate, store, query, and retrieve Web data and their application to DWs. The paper
reviews different DW distributed architectures and the use of XML languages as an integration tool in these systems. It also introduces
the problem of dealing with semistructured data in a DW. It studies Web data repositories, the design of multidimensional databases for
XML data sources, and the XML extensions of OnLine Analytical Processing techniques. The paper addresses the application of
information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to discover
the main limitations and opportunities that offer the combination of the DW and the Web fields, as well as to identify open research
line
Integration of Data Mining and Data Warehousing: a practical methodology
The ever growing repository of data in all fields poses new challenges to the modern analytical
systems. Real-world datasets, with mixed numeric and nominal variables, are difficult to analyze and require effective visual exploration that conveys semantic relationships of data. Traditional data mining techniques such as clustering clusters only the numeric data. Little research has been carried out in tackling the problem of clustering high cardinality nominal variables to get better insight of underlying dataset. Several works in the literature proved the likelihood of integrating data mining with warehousing to discover knowledge from data. For the seamless integration, the mined data has
to be modeled in form of a data warehouse schema. Schema generation process is complex manual
task and requires domain and warehousing familiarity. Automated techniques are required to generate warehouse schema to overcome the existing dependencies. To fulfill the growing analytical needs and to overcome the existing limitations, we propose a novel methodology in this paper that permits efficient analysis of mixed numeric and nominal data, effective visual data exploration, automatic warehouse schema generation and integration of data mining and warehousing. The proposed methodology is evaluated by performing case study on real-world data set. Results show that multidimensional analysis can be performed in an easier and flexible way to discover meaningful
knowledge from large datasets
Data Mining-based Fragmentation of XML Data Warehouses
With the multiplication of XML data sources, many XML data warehouse models
have been proposed to handle data heterogeneity and complexity in a way
relational data warehouses fail to achieve. However, XML-native database
systems currently suffer from limited performances, both in terms of manageable
data volume and response time. Fragmentation helps address both these issues.
Derived horizontal fragmentation is typically used in relational data
warehouses and can definitely be adapted to the XML context. However, the
number of fragments produced by classical algorithms is difficult to control.
In this paper, we propose the use of a k-means-based fragmentation approach
that allows to master the number of fragments through its parameter. We
experimentally compare its efficiency to classical derived horizontal
fragmentation algorithms adapted to XML data warehouses and show its
superiority
- …