Attribute Value Reordering For Efficient Hybrid OLAP
The normalization of a data cube is the ordering of the attribute values. For
large multidimensional arrays where dense and sparse chunks are stored
differently, proper normalization can lead to improved storage efficiency. We
show that it is NP-hard to compute an optimal normalization even for 1x3
chunks, although we find an exact algorithm for 1x2 chunks. When dimensions are
nearly statistically independent, we show that dimension-wise attribute
frequency sorting is an optimal normalization and takes time O(d n log(n)) for
data cubes of size n^d. When dimensions are not independent, we propose and
evaluate several heuristics. The hybrid OLAP (HOLAP) storage mechanism is
already 19%-30% more efficient than ROLAP, but normalization can improve it
further by 9%-13% for a total gain of 29%-44% over ROLAP.
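As an illustration of dimension-wise attribute frequency sorting, the following is a minimal sketch and not the authors' implementation. It assumes the cube is given as a list of d-tuples, one per allocated (non-empty) cell; the function names `frequency_sort_normalization` and `apply_normalization` are hypothetical. For each dimension, attribute values are ranked by how many allocated cells use them, most frequent first.

```python
from collections import Counter


def frequency_sort_normalization(cells, d):
    """Return one {attribute value: new index} mapping per dimension.

    cells: iterable of d-tuples, one per allocated cell (assumed input format).
    Values are ranked by descending frequency; ties are broken by the string
    form of the value so the result is deterministic.
    """
    cells = list(cells)  # the cells are scanned once per dimension
    orderings = []
    for dim in range(d):
        counts = Counter(cell[dim] for cell in cells)
        ranked = sorted(counts, key=lambda v: (-counts[v], str(v)))
        orderings.append({value: index for index, value in enumerate(ranked)})
    return orderings


def apply_normalization(cells, orderings):
    """Rewrite every cell's coordinates using the per-dimension mappings."""
    return [
        tuple(orderings[dim][value] for dim, value in enumerate(cell))
        for cell in cells
    ]
```

In this sketch, the sort in each dimension is the O(n log(n)) step, repeated once per dimension; counting the frequencies additionally requires one pass over the allocated cells.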
Attribute Value Reordering for Efficient Hybrid OLAP
The normalization of a data cube is the process of choosing an ordering for the attribute values, and the chosen ordering will affect the physical storage of the cube's data. For large multidimensional arrays, proper normalization can lead to more efficient storage in hybrid OLAP contexts that store dense and sparse chunks differently. We show that it is NP-hard to compute an optimal normalization even for 1x3 chunks, although we find an exact algorithm for 1x2 chunks. When attributes are nearly statistically independent, we show that an optimal normalization is given by dimension-wise attribute frequency sorting, which can be done in time O(d n log(n)) for data cubes of size n^d. When attributes are not independent, we propose and evaluate a number of heuristics.
Our optimized hybrid OLAP storage mechanism was observed to be 44% more storage efficient than ROLAP and the gains due to normalization alone accounted for 45% of this increase in efficiency.
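To make concrete why the ordering of attribute values affects hybrid storage, here is a small sketch under an assumed cost model, which is not taken from the abstract: a chunk holding k allocated cells costs the smaller of its full volume (dense storage) and `sparse_cost_per_cell * k` (sparse storage), and empty chunks cost nothing. The function name `hybrid_storage_cost` and the default of 2 units per sparse cell are illustrative assumptions.

```python
from collections import Counter


def hybrid_storage_cost(cells, chunk_shape, sparse_cost_per_cell=2):
    """Total cost of storing `cells` in chunks under the assumed cost model.

    cells: iterable of integer coordinate tuples of allocated cells.
    chunk_shape: chunk extent per dimension, e.g. (1, 2) for 1x2 chunks.
    """
    volume = 1
    for extent in chunk_shape:
        volume *= extent
    # Count the allocated cells that fall into each chunk.
    per_chunk = Counter(
        tuple(coord // extent for coord, extent in zip(cell, chunk_shape))
        for cell in cells
    )
    # Each non-empty chunk is stored in whichever form is cheaper.
    return sum(min(volume, sparse_cost_per_cell * k) for k in per_chunk.values())


# Two rows with allocated cells in columns 0 and 2, stored in 1x2 chunks.
before = [(0, 0), (0, 2), (1, 0), (1, 2)]
# Reordering the second dimension so that columns 0 and 2 become adjacent.
column_order = {0: 0, 2: 1, 1: 2, 3: 3}
after = [(row, column_order[col]) for row, col in before]

print(hybrid_storage_cost(before, (1, 2)))  # 8: four chunks, one cell each
print(hybrid_storage_cost(after, (1, 2)))   # 4: two completely full chunks
```

Under this model the same data costs half as much once the two frequently used columns share a chunk, which is the kind of gain a good normalization aims for.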
DROLAP - A Dense-Region Based Approach to On-Line Analytical Processing
ROLAP (Relational OLAP) and MOLAP (Multidimensional OLAP) are two opposing techniques for building On-line Analytical Processing (OLAP) systems. MOLAP has good query performance while ROLAP is based on mature RDBMS technologies. Many data warehouses contain sparse but clustered multidimensional data which neither ROLAP nor MOLAP handles efficiently and scalably. We propose a dense-region-based OLAP (DROLAP) approach which surpasses both ROLAP and MOLAP in space efficiency and query performance. DROLAP takes the best of ROLAP and MOLAP and combines them to support fast queries and high storage utilization. The core of building a DROLAP system lies in the mining of dense regions in a data cube, for which we have developed an efficient index-based algorithm, EDEM. Extensive performance studies consistently show that the DROLAP approach is superior to both MOLAP and ROLAP in handling sparse but clustered multidimensional data. Moreover, our EDEM algorithm is efficient and effective in identifying dense regions.
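The general idea of combining array storage for dense regions with table storage for the remaining points can be sketched as follows. This is only an illustration of the storage split, not the DROLAP system or the EDEM algorithm: the dense regions are assumed to be given as axis-aligned boxes, a Python dict stands in for the relational table, and the class names `DenseRegion` and `HybridStore` are hypothetical.

```python
class DenseRegion:
    """An axis-aligned box [lo, hi) stored as a flat row-major array."""

    def __init__(self, lo, hi):
        self.lo, self.hi = tuple(lo), tuple(hi)
        self.shape = tuple(h - l for l, h in zip(self.lo, self.hi))
        size = 1
        for extent in self.shape:
            size *= extent
        self.values = [None] * size

    def contains(self, cell):
        return all(l <= c < h for c, l, h in zip(cell, self.lo, self.hi))

    def offset(self, cell):
        # Row-major position of the cell inside the box.
        position = 0
        for c, l, extent in zip(cell, self.lo, self.shape):
            position = position * extent + (c - l)
        return position


class HybridStore:
    """Cells inside a dense region go to its array; all others go to a table."""

    def __init__(self, regions):
        self.regions = list(regions)
        self.sparse = {}  # stand-in for a relational table keyed by coordinates

    def put(self, cell, value):
        for region in self.regions:
            if region.contains(cell):
                region.values[region.offset(cell)] = value
                return
        self.sparse[tuple(cell)] = value

    def get(self, cell):
        for region in self.regions:
            if region.contains(cell):
                return region.values[region.offset(cell)]
        return self.sparse.get(tuple(cell))
```

A linear scan over the regions keeps the sketch short; a real system would presumably index the regions, and finding good regions in the first place is the part this sketch leaves out.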