12 research outputs found

Attribute Value Reordering For Efficient Hybrid OLAP

Author: Kaser Owen
Lemire Daniel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005

The normalization of a data cube is the ordering of the attribute values. For large multidimensional arrays where dense and sparse chunks are stored differently, proper normalization can lead to improved storage efficiency. We show that it is NP-hard to compute an optimal normalization even for 1x3 chunks, although we find an exact algorithm for 1x2 chunks. When dimensions are nearly statistically independent, we show that dimension-wise attribute frequency sorting is an optimal normalization and takes time O(d n log(n)) for data cubes of size n^d. When dimensions are not independent, we propose and evaluate several heuristics. The hybrid OLAP (HOLAP) storage mechanism is already 19%-30% more efficient than ROLAP, but normalization can improve it further by 9%-13% for a total gain of 29%-44% over ROLAP
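The dimension-wise attribute frequency sorting mentioned in this abstract can be illustrated with a minimal sketch. It assumes the cube is given as a list of non-empty cells, each a tuple with one attribute value per dimension; the function names and the tie-breaking rule are hypothetical and not taken from the paper.

```python
from collections import Counter


def frequency_sort_normalization(cells, num_dims):
    """Return one value -> rank mapping per dimension, most frequent value first.

    Assumption: `cells` lists only the non-empty cells of the cube.
    """
    mappings = []
    for dim in range(num_dims):
        counts = Counter(cell[dim] for cell in cells)
        # Sort by decreasing frequency; ties are broken by value (an arbitrary
        # choice made here only to keep the result deterministic).
        ordered = sorted(counts, key=lambda v: (-counts[v], v))
        mappings.append({value: rank for rank, value in enumerate(ordered)})
    return mappings


def apply_normalization(cells, mappings):
    """Rewrite every cell using the per-dimension rank mappings."""
    return [tuple(mappings[d][v] for d, v in enumerate(cell)) for cell in cells]


cells = [("b", "x"), ("b", "y"), ("a", "y"), ("b", "y")]
mappings = frequency_sort_normalization(cells, num_dims=2)
normalized = apply_normalization(cells, mappings)
```

Each dimension is counted and sorted independently, which is where the per-dimension sorting cost in the stated O(d n log(n)) bound comes from.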

arXiv.org e-Print Archive

CiteSeerX

R-libre

Archipel - Université du Québec à Montréal

Attribute Value Reordering for Efficient Hybrid OLAP

Author: Kaser Owen
Lemire Daniel
Publication venue: ACM
Publication date: 01/01/2003

The normalization of a data cube is the process of choosing an ordering for the attribute values, and the chosen ordering will affect the physical storage of the cube's data. For large multidimensional arrays, proper normalization can lead to more efficient storage in hybrid OLAP contexts that store dense and sparse chunks differently. We show that it is NP-hard to compute an optimal normalization even for 1x3 chunks, although we find an exact algorithm for 1x2 chunks. When attributes are nearly statistically independent, we show that an optimal normalization is given by dimension-wise attribute frequency sorting, which can be done in time O(d n log(n)) for data cubes of size n^d. When attributes are not independent, we propose and evaluate a number of heuristics. Our optimized hybrid OLAP storage mechanism was observed to be 44% more storage efficient than ROLAP and the gains due to normalization alone accounted for 45% of this increase in efficiency.

CiteSeerX

R-libre

Crossref

NRC Publications Archive

Archipel - Université du Québec à Montréal

Better bitmap performance with Roaring bitmaps

Author: Beyer
Colantonio
Culpepper
Fusco
Inoue
Kaser
Kaser
Lemire
Lemire
Lemire
Warren
Publication venue: 'Wiley'
Publication date: 15/03/2016

Bitmap indexes are commonly used in databases and search engines. By exploiting bit-level parallelism, they can significantly accelerate queries. However, they can use much memory, and thus we might prefer compressed bitmap indexes. Following Oracle's lead, bitmaps are often compressed using run-length encoding (RLE). Building on prior work, we introduce the Roaring compressed bitmap format: it uses packed arrays for compression instead of RLE. We compare it to two high-performance RLE-based bitmap encoding techniques: WAH (Word Aligned Hybrid compression scheme) and Concise (Compressed 'n' Composable Integer Set). On synthetic and real data, we find that Roaring bitmaps (1) often compress significantly better (e.g., 2 times) and (2) are faster than the compressed alternatives (up to 900 times faster for intersections). Our results challenge the view that RLE-based bitmap compression is best.
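The abstract only says that Roaring uses packed arrays instead of RLE. The sketch below illustrates one way such a hybrid layout can work, under assumptions that are not stated in the abstract: 32-bit integers are split into chunks by their high 16 bits, and a chunk is kept as a sorted array of 16-bit values while it holds at most 4096 entries, otherwise as a 65536-bit bitmap. The names `ARRAY_LIMIT`, `build_containers`, and `contains` are hypothetical, and plain Python lists stand in for packed arrays.

```python
from bisect import bisect_left

# Assumed threshold: 4096 two-byte entries take 8192 bytes, the same as a
# 65536-bit bitmap, so larger chunks are cheaper to store as bitmaps.
ARRAY_LIMIT = 4096


def build_containers(values):
    """Group integers by their high 16 bits and pick a container per chunk."""
    chunks = {}
    for v in sorted(set(values)):
        chunks.setdefault(v >> 16, []).append(v & 0xFFFF)
    containers = {}
    for key, lows in chunks.items():
        if len(lows) <= ARRAY_LIMIT:
            containers[key] = ("array", lows)  # sorted low 16-bit values
        else:
            bitmap = bytearray(8192)  # 65536 bits
            for low in lows:
                bitmap[low >> 3] |= 1 << (low & 7)
            containers[key] = ("bitmap", bitmap)
    return containers


def contains(containers, v):
    """Membership test: binary search in arrays, bit probe in bitmaps."""
    entry = containers.get(v >> 16)
    if entry is None:
        return False
    kind, data = entry
    low = v & 0xFFFF
    if kind == "array":
        i = bisect_left(data, low)
        return i < len(data) and data[i] == low
    return bool(data[low >> 3] & (1 << (low & 7)))


values = [1, 2, 65536 + 5] + list(range(131072, 131072 + 5000))
containers = build_containers(values)
```

In this example the two sparse chunks stay as arrays, while the chunk holding 5000 consecutive values becomes a bitmap.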

arXiv.org e-Print Archive

CiteSeerX

R-libre

Crossref

SPARSITY HANDLING AND DATA EXPLOSION IN OLAP SYSTEMS

Author: Kaloyanova Kalinka
Naydenova Ina
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/09/2010

A common problem with OnLine Analytical Processing (OLAP) databases is data explosion: data size multiplies when it is loaded from the source data into multidimensional cubes. Data explosion is not an issue for small databases, but it can be a serious problem for large ones. In this paper we discuss the sparsity and data explosion phenomena in the multidimensional data model, which lies at the core of OLAP systems. Our research across five companies in different lines of business confirms the observation that, in practice, most cubes are extremely sparse. We also consider the different methods that relational and multidimensional servers apply to reduce the data explosion and sparsity problems, such as compression and indexing techniques, partitioning, and preliminary aggregations.

AIS Electronic Library (AISeL)

Reordering Columns for Smaller Indexes

Author: Abadi
Alber
Anantha
Anh
Antoshenkov
Aouiche
Barnard
Bassiouni
Bhattacharjee
Cai
Chen
Daniel Lemire
Dehne
Eavis
Engene
Faloutsos
Fang
Flahive
Flahive
Garey
Golomb
Haddadi
Hamilton
Haverkort
Holloway
Holloway
Kamel
Kaser
Lemire
Lemke
Moffat
Moffat
Ng
Niedermeier
Owen Kaser
Peano
Pinar
Richards
Savage
Scholer
Vo
Witten
Wu
Zobel
Publication venue: 'Elsevier BV'
Publication date: 22/02/2011

Column-oriented indexes, such as projection or bitmap indexes, are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns in the right order before sorting can reduce the number of runs by a factor of two or more. Unfortunately, determining the best column order is NP-hard. For many cases, we prove that the number of runs in table columns is minimized if we sort columns by increasing cardinality. Experimentally, sorting based on Hilbert space-filling curves is poor at minimizing the number of runs. Comment: to appear in Information Sciences.
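The effect of column order on the number of runs can be checked on a toy table. The sketch below assumes the table is a list of equal-length tuples and that sorting is lexicographic; the helper names are hypothetical, and the example only shows one case where increasing cardinality helps, not the general result.

```python
def count_runs(rows):
    """Total number of runs of equal consecutive values, summed over columns."""
    if not rows:
        return 0
    total = 0
    for col in range(len(rows[0])):
        runs = 1
        for prev, cur in zip(rows, rows[1:]):
            if prev[col] != cur[col]:
                runs += 1
        total += runs
    return total


def sort_with_column_order(rows, order):
    """Permute the columns according to `order`, then sort lexicographically."""
    return sorted(tuple(row[c] for c in order) for row in rows)


def increasing_cardinality_order(rows):
    """Column indexes ordered by their number of distinct values, fewest first."""
    return sorted(range(len(rows[0])), key=lambda c: len({row[c] for row in rows}))


# Toy table: column 0 has six distinct values, column 1 has two.
rows = [(i, i % 2) for i in range(6)]
original_runs = count_runs(sort_with_column_order(rows, [0, 1]))
order = increasing_cardinality_order(rows)
reordered_runs = count_runs(sort_with_column_order(rows, order))
```

Putting the two-valued column first lets the sort group its values into two long runs, whereas sorting on the six-valued column first leaves the second column alternating.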

arXiv.org e-Print Archive

R-libre

Crossref

Attribute value reordering for efficient hybrid OLAP

Author
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2003

Crossref