Caching in Multidimensional Databases
One utilisation of multidimensional databases is the field of On-line
Analytical Processing (OLAP). The applications in this area are designed to
make the analysis of shared multidimensional information fast [9]. On one hand,
speed can be achieved by specially devised data structures and algorithms. On
the other hand, the analytical process is cyclic: the user of an OLAP
application runs queries one after another, and the result of the latest query
may already be contained, at least partly, in one of the previous results.
Therefore caching also plays an important role in the operation of these
systems. However, caching by itself may not be enough to ensure acceptable
performance. Size matters: the more memory is available, the more we gain by
loading and keeping information in it. Often the cache size is fixed, which
also limits the performance of the multidimensional database unless we
compress the data in order to move a greater proportion of it into memory.
Caching combined with suitable compression methods promises further
performance improvements. In this paper, we investigate how caching influences
the speed of OLAP systems. Different physical representations (multidimensional
and table) are evaluated. For a thorough comparison, models are proposed. We
draw conclusions based on these models, and the conclusions are verified with
empirical data. Comment: 14 pages, 5 figures, 8 tables. Paper presented at the Fifth
Conference of PhD Students in Computer Science, Szeged, Hungary, 27 - 30 June
2006. For further details, please refer to
http://www.inf.u-szeged.hu/~szepkuti/papers.html#cachin
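The abstract does not say which cache policy or compression scheme is used. As a rough illustration of why compression helps a fixed-size cache, the following Python sketch assumes a least-recently-used (LRU) eviction policy, a byte budget, and zlib compression of pickled query results; the class `CompressedLRUCache` and its methods are hypothetical, not the paper's design.

```python
import pickle
import zlib
from collections import OrderedDict
from typing import Any, Callable, Hashable


class CompressedLRUCache:
    """Hypothetical fixed-size (in bytes) LRU cache for OLAP query results.

    Results are pickled and optionally zlib-compressed, so that more of
    them fit into the same memory budget.
    """

    def __init__(self, capacity_bytes: int, compress: bool = True) -> None:
        self.capacity = capacity_bytes
        self.compress = compress
        self.used = 0  # bytes currently held by cached entries
        self.entries: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _encode(self, value: Any) -> bytes:
        raw = pickle.dumps(value)
        return zlib.compress(raw) if self.compress else raw

    def _decode(self, blob: bytes) -> Any:
        return pickle.loads(zlib.decompress(blob) if self.compress else blob)

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self._decode(self.entries[key])
        self.misses += 1
        value = compute()  # e.g. evaluate the query against the database
        blob = self._encode(value)
        if len(blob) <= self.capacity:
            # Evict least recently used entries until the new one fits.
            while self.used + len(blob) > self.capacity:
                _, old = self.entries.popitem(last=False)
                self.used -= len(old)
            self.entries[key] = blob
            self.used += len(blob)
        return value


# Usage: a cyclic analysis session where query "a" is repeated.
cache = CompressedLRUCache(capacity_bytes=4096)
for q in ["a", "b", "a", "c", "a"]:
    cache.get_or_compute(q, lambda: [q] * 100)
print(cache.hits, cache.misses)  # 2 3
```

With the same byte budget, setting `compress=False` makes each entry larger, so fewer results stay cached; the trade-off is the CPU time spent on compressing and decompressing.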
CubiST: A New Algorithm for Improving the Performance of Ad-hoc OLAP Queries
Being able to efficiently answer arbitrary OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes has been a continued, major concern in data warehousing. In this paper, we introduce a new data structure, called Statistics Tree (ST), together with an efficient algorithm called CubiST, for evaluating ad-hoc OLAP queries on top of a relational data warehouse. We focus on a class of queries called cube queries, which generalize the data cube operator. CubiST represents a drastic departure from existing relational (ROLAP) and multidimensional (MOLAP) approaches in that it does not use the familiar view lattice to compute and materialize new views from existing views in some heuristic fashion. CubiST is the first OLAP algorithm that needs only one scan over the detailed data set and can efficiently answer any cube query without additional I/O when the ST fits into memory. We have implemented CubiST, and our experiments have demonstrated significant improvements in performance and scalability over existing ROLAP/MOLAP approaches.
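The single-scan idea can be illustrated with a simplified Python sketch, assuming categorical dimensions and a summed numeric measure. Instead of an actual tree, the leaves of a statistics-tree-like structure are kept in a dictionary keyed by tuples in which `*` stands for ALL. During the one scan, each record updates 2^d keys; afterwards a cube query (each dimension restricted to a set of values, or left as ALL) is answered by summing keys, without reading the detail data again. The names `build_statistics` and `cube_query` and the sample records are hypothetical; this is not the authors' implementation.

```python
from itertools import product
from typing import Dict, Iterable, Optional, Sequence, Set, Tuple

STAR = "*"  # stands for "ALL values" of a dimension (assumed not a real value)
Stats = Dict[Tuple[str, ...], float]


def build_statistics(records: Iterable[Tuple[Sequence[str], float]]) -> Stats:
    """One pass over the detail data; each record updates 2^d aggregate keys."""
    stats: Stats = {}
    for coords, measure in records:
        # Each coordinate contributes either its own value or STAR.
        for key in product(*[(v, STAR) for v in coords]):
            stats[key] = stats.get(key, 0.0) + measure
    return stats


def cube_query(stats: Stats, selection: Sequence[Optional[Set[str]]]) -> float:
    """Sum the measure; None means ALL, a set means 'any of these values'."""
    choices = [(STAR,) if sel is None else tuple(sel) for sel in selection]
    return sum(stats.get(key, 0.0) for key in product(*choices))


# Usage with hypothetical (year, region, product) records.
records = [
    (("2005", "north", "tv"), 10.0),
    (("2005", "south", "tv"), 5.0),
    (("2006", "north", "radio"), 7.0),
    (("2006", "north", "tv"), 3.0),
]
stats = build_statistics(records)
print(cube_query(stats, [None, {"north"}, None]))            # 20.0
print(cube_query(stats, [{"2005", "2006"}, None, {"tv"}]))   # 18.0
```

Like the ST, this table can grow to the product of (cardinality + 1) over all dimensions, which is why answering queries without additional I/O depends on the structure fitting into memory.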
Attribute Value Reordering For Efficient Hybrid OLAP
The normalization of a data cube is the ordering of the attribute values. For
large multidimensional arrays where dense and sparse chunks are stored
differently, proper normalization can lead to improved storage efficiency. We
show that it is NP-hard to compute an optimal normalization even for 1x3
chunks, although we find an exact algorithm for 1x2 chunks. When dimensions are
nearly statistically independent, we show that dimension-wise attribute
frequency sorting is an optimal normalization and takes time O(d n log(n)) for
data cubes of size n^d. When dimensions are not independent, we propose and
evaluate several heuristics. The hybrid OLAP (HOLAP) storage mechanism is
already 19%-30% more efficient than ROLAP, but normalization can improve it
further by 9%-13%, for a total gain of 29%-44% over ROLAP.
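Dimension-wise attribute frequency sorting can be sketched in a few lines of Python. The sketch assumes the non-empty cells are given as integer coordinates in an n^d cube; sorting the n values of each dimension gives the O(d n log n) cost quoted above (plus a pass over the cells to count frequencies). The chunk storage cost model used to compare orderings (a dense chunk costs one unit per cell, a sparse chunk d + 1 units per non-empty cell, and each chunk takes the cheaper form) is a hypothetical stand-in, not the paper's exact model, and all function names are illustrative.

```python
from collections import Counter
from math import prod
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def frequency_sort_normalization(cells: Sequence[Cell], n: int, d: int) -> List[Dict[int, int]]:
    """Per dimension, map old attribute index -> new index, most frequent first."""
    mappings: List[Dict[int, int]] = []
    for dim in range(d):
        freq = Counter(cell[dim] for cell in cells)  # missing values count as 0
        # Decreasing frequency; ties broken by original index for determinism.
        ordered = sorted(range(n), key=lambda v: (-freq[v], v))
        mappings.append({old: new for new, old in enumerate(ordered)})
    return mappings


def apply_normalization(cells: Sequence[Cell], mappings: List[Dict[int, int]]) -> List[Cell]:
    return [tuple(m[v] for m, v in zip(mappings, cell)) for cell in cells]


def hybrid_storage_cost(cells: Sequence[Cell], chunk_shape: Tuple[int, ...]) -> int:
    """Hypothetical HOLAP cost: each chunk stored dense or sparse, whichever is cheaper."""
    d = len(chunk_shape)
    chunk_volume = prod(chunk_shape)
    nnz = Counter(tuple(v // s for v, s in zip(cell, chunk_shape)) for cell in cells)
    return sum(min(chunk_volume, (d + 1) * k) for k in nnz.values())


# Usage: a 4x4 cube in 2x2 chunks where value 3 is frequent in both dimensions.
cells = [(3, 3), (3, 1), (1, 3), (3, 0), (0, 3)]
maps = frequency_sort_normalization(cells, n=4, d=2)
normalized = apply_normalization(cells, maps)
print(hybrid_storage_cost(cells, (2, 2)), hybrid_storage_cost(normalized, (2, 2)))  # 11 10
```

Moving the most frequent values to the front clusters non-empty cells into the first chunks, so more chunks become worth storing densely; as the abstract notes, this ordering is only guaranteed optimal when dimensions are nearly statistically independent.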