5,929 research outputs found
Benchmarking Summarizability Processing in XML Warehouses with Complex Hierarchies
Business Intelligence plays an important role in decision making. Based on
data warehouses and Online Analytical Processing, a business intelligence tool
can be used to analyze complex data. Still, summarizability issues in data
warehouses cause ineffective analyses that may become critical problems to
businesses. To settle this issue, many researchers have studied and proposed
various solutions, both in relational and XML data warehouses. However, they
find difficulty in evaluating the performance of their proposals since the
available benchmarks lack complex hierarchies. In order to contribute to
summarizability analysis, this paper proposes an extension to the XML warehouse
benchmark (XWeB) with complex hierarchies. The benchmark enables us to generate
XML data warehouses with scalable complex hierarchies as well as
summarizability processing. We experimentally demonstrated that complex
hierarchies can definitely be included into a benchmark dataset, and that our
benchmark is able to compare two alternative approaches dealing with
summarizability issues.Comment: 15th International Workshop on Data Warehousing and OLAP (DOLAP
2012), Maui : United States (2012
XWeB: the XML Warehouse Benchmark
With the emergence of XML as a standard for representing business data, new
decision support applications are being developed. These XML data warehouses
aim at supporting On-Line Analytical Processing (OLAP) operations that
manipulate irregular XML data. To ensure feasibility of these new tools,
important performance issues must be addressed. Performance is customarily
assessed with the help of benchmarks. However, decision support benchmarks do
not currently support XML features. In this paper, we introduce the XML
Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from
the relational decision support benchmark TPC-H. It is mainly composed of a
test data warehouse that is based on a unified reference model for XML
warehouses and that features XML-specific structures, and its associate XQuery
decision support workload. XWeB's usage is illustrated by experiments on
several XML database management systems
Data Cube Approximation and Mining using Probabilistic Modeling
On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes according to different analysis axes (dimensions) and under different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data.
Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions and discover possible outliers in data cells. A real life example will be
used to (i) discuss the potential benefits of the modeling output on cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches
Dimensional enrichment of statistical linked open data
On-Line Analytical Processing (OLAP) is a data analysis technique typically used for local and well-prepared data. However, initiatives like Open Data and Open Government bring new and publicly available data on the web that are to be analyzed in the same way. The use of semantic web technologies for this context is especially encouraged by the Linked Data initiative. There is already a considerable amount of statistical linked open data sets published using the RDF Data Cube Vocabulary (QB) which is designed for these purposes. However, QB lacks some essential schema constructs (e.g., dimension levels) to support OLAP. Thus, the QB4OLAP vocabulary has been proposed to extend QB with the necessary constructs and be fully compliant with OLAP. In this paper, we focus on the enrichment of an existing QB data set with QB4OLAP semantics. We first thoroughly compare the two vocabularies and outline the benefits of QB4OLAP. Then, we propose a series of steps to automate the enrichment of QB data sets with specific QB4OLAP semantics; being the most important, the definition of aggregate functions and the detection of new concepts in the dimension hierarchy construction. The proposed steps are defined to form a semi-automatic enrichment method, which is implemented in a tool that enables the enrichment in an interactive and iterative fashion. The user can enrich the QB data set with QB4OLAP concepts (e.g., full-fledged dimension hierarchies) by choosing among the candidate concepts automatically discovered with the steps proposed. Finally, we conduct experiments with 25 users and use three real-world QB data sets to evaluate our approach. The evaluation demonstrates the feasibility of our approach and shows that, in practice, our tool facilitates, speeds up, and guarantees the correct results of the enrichment process.Peer ReviewedPostprint (author's final draft
On-line analytical processing
On-line analytical processing (OLAP) describes an approach to decision support, which aims to extract knowledge from a data warehouse, or more specifically, from data marts. Its main idea is providing navigation through data to non-expert users, so that they are able to interactively generate ad hoc queries without the intervention of IT professionals. This name was introduced in contrast to on-line transactional processing (OLTP), so that it reflected the different requirements and characteristics between these classes of uses. The concept falls in the area of business intelligence.Peer ReviewedPostprint (author's final draft
- …