
Caching in Multidimensional Databases

Author: Szépkúti István
Publication venue: 'Periodica Polytechnica Budapest University of Technology and Economics'
Publication date: 01/01/2006

One application of multidimensional databases is On-line Analytical Processing (OLAP). Applications in this area are designed to make the analysis of shared multidimensional information fast [9]. On the one hand, speed can be achieved by specially devised data structures and algorithms. On the other hand, the analytical process is cyclic: the user of the OLAP application runs queries one after the other, and the output of the latest query may already be present (at least partly) in one of the previous results. Caching therefore also plays an important role in the operation of these systems. However, caching itself may not be enough to ensure acceptable performance. Size does matter: the more memory is available, the more we gain by loading and keeping information in it. Oftentimes, the cache size is fixed. This also limits the performance of the multidimensional database, unless we compress the data in order to move a greater proportion of it into memory. Caching combined with proper compression methods promises further performance improvements. In this paper, we investigate how caching influences the speed of OLAP systems. Different physical representations (multidimensional and table) are evaluated. Models are proposed for a thorough comparison. We draw conclusions based on these models, and the conclusions are verified with empirical data.

Comment: 14 pages, 5 figures, 8 tables. Paper presented at the Fifth Conference of PhD Students in Computer Science, Szeged, Hungary, 27 - 30 June 2006. For further details, please refer to http://www.inf.u-szeged.hu/~szepkuti/papers.html#cachin
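The interplay between a fixed cache size and compression described in this abstract can be illustrated with a minimal sketch. It is not the paper's model or implementation: the class name `CompressedQueryCache`, the least-recently-used eviction policy, and the use of `pickle` plus `zlib` are all assumptions chosen for illustration. The point is only that, under a fixed byte budget, storing compressed results lets more query results stay in memory.

```python
import pickle
import zlib
from collections import OrderedDict


class CompressedQueryCache:
    """Hypothetical query-result cache with a fixed byte budget.

    Results are stored compressed, so more of them fit into the budget.
    LRU eviction is an assumption, not something the abstract specifies.
    """

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # query -> compressed bytes, oldest first

    def put(self, query, result):
        blob = zlib.compress(pickle.dumps(result))
        if len(blob) > self.capacity_bytes:
            return False  # result can never fit, so do not cache it
        if query in self._entries:
            # Replace an existing entry and release its space first
            self.used_bytes -= len(self._entries.pop(query))
        # Evict least recently used entries until the new blob fits
        while self.used_bytes + len(blob) > self.capacity_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)
        self._entries[query] = blob
        self.used_bytes += len(blob)
        return True

    def get(self, query):
        blob = self._entries.get(query)
        if blob is None:
            return None  # cache miss: the caller must run the query
        self._entries.move_to_end(query)  # mark as most recently used
        return pickle.loads(zlib.decompress(blob))
```

A caller would try `get(query)` first and fall back to the database on a miss, then `put` the fresh result. The trade-off is extra CPU time for compression and decompression on every access.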

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Szeged

Periodica Polytechnica (Budapest University of Technology and Economics)

CubiST: A New Algorithm for Improving the Performance of Ad-hoc OLAP Queries

Author: Fu Lixin
Publication venue: 'NC DOCKS at The University of North Carolina at Greensboro'
Publication date: 01/01/2000

Efficiently answering arbitrary OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes has been a continued, major concern in data warehousing. In this paper, we introduce a new data structure, called Statistics Tree (ST), together with an efficient algorithm called CubiST, for evaluating ad-hoc OLAP queries on top of a relational data warehouse. We focus on a class of queries called cube queries, which generalize the data cube operator. CubiST represents a drastic departure from existing relational (ROLAP) and multi-dimensional (MOLAP) approaches in that it does not use the familiar view lattice to compute and materialize new views from existing views in some heuristic fashion. CubiST is the first OLAP algorithm that needs only one scan over the detailed data set and can efficiently answer any cube query without additional I/O when the ST fits into memory. We have implemented CubiST, and our experiments have demonstrated significant improvements in performance and scalability over existing ROLAP/MOLAP approaches.
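The abstract does not describe the internals of the Statistics Tree, so the following is only a simplified sketch of the general idea, under stated assumptions: one tree level per dimension, one child per attribute value plus a "star" child standing for all values, and SUM aggregates at the leaves. The function names `build_statistics_tree` and `cube_query` are hypothetical, and the sketch handles only queries that fix each dimension to a single value or to "all"; it should not be read as the CubiST algorithm itself.

```python
from itertools import product

STAR = "*"  # assumed marker meaning "all values of this dimension"


def build_statistics_tree(records):
    """Build a nested-dict tree in a single pass over the records.

    Each record is (dim_1, ..., dim_d, measure). For every record, all
    2^d paths that use either the concrete value or STAR at each level
    are updated, so each leaf holds a pre-aggregated SUM.
    """
    root = {}
    for *dims, measure in records:
        for path in product(*[(value, STAR) for value in dims]):
            node = root
            for key in path[:-1]:
                node = node.setdefault(key, {})
            node[path[-1]] = node.get(path[-1], 0) + measure
    return root


def cube_query(tree, path):
    """Answer a point-or-all query by following one root-to-leaf path."""
    node = tree
    for key in path:
        node = node.get(key)
        if node is None:
            return 0  # no record matches this combination
    return node


records = [("east", "tv", 10), ("east", "radio", 5), ("west", "tv", 7)]
tree = build_statistics_tree(records)
print(cube_query(tree, ("east", STAR)))  # total over all products in "east"
```

Once the tree is in memory, each such query touches only one path and needs no further access to the detailed data, which mirrors the single-scan property the abstract claims. The cost is that the tree grows with the product of the dimension cardinalities.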

The University of North Carolina at Greensboro

Attribute Value Reordering For Efficient Hybrid OLAP

Author: Kaser Owen, Lemire Daniel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005

The normalization of a data cube is the ordering of the attribute values. For large multidimensional arrays where dense and sparse chunks are stored differently, proper normalization can lead to improved storage efficiency. We show that it is NP-hard to compute an optimal normalization even for 1x3 chunks, although we find an exact algorithm for 1x2 chunks. When dimensions are nearly statistically independent, we show that dimension-wise attribute frequency sorting is an optimal normalization and takes time O(d n log(n)) for data cubes of size n^d. When dimensions are not independent, we propose and evaluate several heuristics. The hybrid OLAP (HOLAP) storage mechanism is already 19%-30% more efficient than ROLAP, but normalization can improve it further by 9%-13% for a total gain of 29%-44% over ROLAP.
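Dimension-wise attribute frequency sorting, as named in this abstract, can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes that the frequency of an attribute value is the number of non-empty cells containing it, and the function names `frequency_sort_normalization` and `apply_normalization` are hypothetical. Each dimension is handled independently with one sort, which is consistent with the stated O(d n log(n)) bound.

```python
from collections import Counter


def frequency_sort_normalization(cells, num_dims):
    """Return one mapping per dimension: attribute value -> new index.

    `cells` is a list of tuples, one per non-empty cell of the cube.
    Within each dimension, the most frequent value gets index 0.
    """
    orderings = []
    for d in range(num_dims):
        counts = Counter(cell[d] for cell in cells)
        # Descending frequency; ties broken by the value's text for determinism
        ranked = sorted(counts, key=lambda v: (-counts[v], str(v)))
        orderings.append({value: idx for idx, value in enumerate(ranked)})
    return orderings


def apply_normalization(cells, orderings):
    """Rewrite each cell's coordinates using the per-dimension orderings."""
    return [
        tuple(orderings[d][value] for d, value in enumerate(cell))
        for cell in cells
    ]


cells = [("b", "x"), ("a", "x"), ("b", "y"), ("c", "x"), ("b", "z")]
orderings = frequency_sort_normalization(cells, num_dims=2)
print(apply_normalization(cells, orderings))
```

After reordering, frequently used values cluster near the origin of the array, which tends to concentrate non-empty cells into fewer chunks when the dimensions are close to independent.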

arXiv.org e-Print Archive

CiteSeerX

R-libre

Archipel - Université du Québec à Montréal