Search CORE

176 research outputs found

Attribute Value Reordering For Efficient Hybrid OLAP

Author: Kaser Owen
Lemire Daniel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005
Field of study

The normalization of a data cube is the ordering of the attribute values. For large multidimensional arrays where dense and sparse chunks are stored differently, proper normalization can lead to improved storage efficiency. We show that it is NP-hard to compute an optimal normalization even for 1x3 chunks, although we find an exact algorithm for 1x2 chunks. When dimensions are nearly statistically independent, we show that dimension-wise attribute frequency sorting is an optimal normalization and takes time O(d n log(n)) for data cubes of size n^d. When dimensions are not independent, we propose and evaluate several heuristics. The hybrid OLAP (HOLAP) storage mechanism is already 19%-30% more efficient than ROLAP, but normalization can improve it further by 9%-13% for a total gain of 29%-44% over ROLAP

arXiv.org e-Print Archive

CiteSeerX

R-libre

Archipel - Université du Québec à Montréal

Attribute Value Reordering for Efficient Hybrid OLAP

Author: Kaser Owen
Lemire Daniel
Publication venue: ACM
Publication date: 01/01/2003
Field of study

The normalization of a data cube is the process of choosing an ordering for the attribute values, and the chosen ordering will affect the physical storage of the cube's data. For large multidimensional arrays, proper normalization can lead to more efficient storage in hybrid OLAP contexts that store dense and sparse chunks differently. We show that it is NP-hard to compute an optimal normalization even for 1x3 chunks, although we find an exact algorithm for 1x2 chunks. When attributes are nearly statistically independent, we show that an optimal normalization is given by dimension-wise attribute frequency sorting, which can be done in time O(d n log(n)) for data cubes of size n^d. When attributes are not independent, we propose and evaluate a number of heuristics.\ud \ud Our optimized hybrid OLAP storage mechanism was observed to be 44% more storage efficient than ROLAP and the gains due to normalization alone accounted for 45% of this increase in efficiency

CiteSeerX

R-libre

Crossref

NRC Publications Archive

Archipel - Université du Québec à Montréal

Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes

Author: Aouiche Kamel
Kaser Owen
Lemire Daniel
Publication venue
Publication date: 01/10/2008
Field of study

Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index size by 9 and make indexes several times faster. We investigate reordering heuristics based on computed attribute-value histograms. Simply permuting the columns of the table based on these histograms can increase the sorting efficiency by 40%.Comment: To appear in proceedings of DOLAP 200

arXiv.org e-Print Archive

R-libre

Better bitmap performance with Roaring bitmaps

Author: Beyer
Colantonio
Culpepper
Fusco
Inoue
Kaser
Kaser
Lemire
Lemire
Lemire
Warren
Publication venue: 'Wiley'
Publication date: 15/03/2016
Field of study

Bitmap indexes are commonly used in databases and search engines. By exploiting bit-level parallelism, they can significantly accelerate queries. However, they can use much memory, and thus we might prefer compressed bitmap indexes. Following Oracle's lead, bitmaps are often compressed using run-length encoding (RLE). Building on prior work, we introduce the Roaring compressed bitmap format: it uses packed arrays for compression instead of RLE. We compare it to two high-performance RLE-based bitmap encoding techniques: WAH (Word Aligned Hybrid compression scheme) and Concise (Compressed `n' Composable Integer Set). On synthetic and real data, we find that Roaring bitmaps (1) often compress significantly better (e.g., 2 times) and (2) are faster than the compressed alternatives (up to 900 times faster for intersections). Our results challenge the view that RLE-based bitmap compression is best

arXiv.org e-Print Archive

CiteSeerX

R-libre

Crossref

Sorting improves word-aligned bitmap indexes

Author: Bellatreche
Cai
Daniel Lemire
Davis
Ernvall
Goddyn
Graefe
Hammer
Jurgens
Kamel Aouiche
Knuth
Owen Kaser
Porter
Richards
Savage
Sharma
Sinha
Wu
Yiannis
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009
Field of study

Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index size by 9 and make indexes several times faster. We investigate row-reordering heuristics. Simply permuting the columns of the table can increase the sorting efficiency by 40%. Secondary contributions include efficient algorithms to construct and aggregate bitmaps. The effect of word length is also reviewed by constructing 16-bit, 32-bit and 64-bit indexes. Using 64-bit CPUs, we find that 64-bit indexes are slightly faster than 32-bit indexes despite being nearly twice as large

arXiv.org e-Print Archive

CiteSeerX

R-libre

Crossref

SPARSITY HANDLING AND DATA EXPLOSION IN OLAP SYSTEMS

Author: Kaloyanova Kalinka
Naydenova Ina
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/09/2010
Field of study

A common problem with OnLine Analytical Processing (OLAP) databases is data explosion - data size multiplies, when it is loaded from the source data into multidimensional cubes. Data explosion is not an issue for small databases, but can be serious problems with large databases. In this paper we discuss the sparsity and data explosion phenomenon in multidimensional data model, which lie at the core of OLAP systems. Our researches over five companies with different branch of business confirm the observations that in reality most of the cubes are extremely sparse. We also consider a different method that relational and multidimensional severs applies to reduce the data explosion and sparsity problems as compression and indexes techniques, partitioning, preliminary aggregations

AIS Electronic Library (AISeL)

Concepts and Techniques for Flexible and Effective Music Data Management

Author: Deliege Francois
Publication venue: Department of Computer Science, Aalborg University
Publication date: 25/09/2009
Field of study

VBN

Exploiting Data Skew for Improved Query Performance

Author: Ross Kenneth A.
Zhang Wangda
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/10/2019
Field of study

Analytic queries enable sophisticated large-scale data analysis within many commercial, scientific and medical domains today. Data skew is a ubiquitous feature of these real-world domains. In a retail database, some products are typically much more popular than others. In a text database, word frequencies follow a Zipf distribution with a small number of very common words, and a long tail of infrequent words. In a geographic database, some regions have much higher populations (and data measurements) than others. Current systems do not make the most of caches for exploiting skew. In particular, a whole cache line may remain cache resident even though only a small part of the cache line corresponds to a popular data item. In this paper, we propose a novel index structure for repositioning data items to concentrate popular items into the same cache lines. The net result is better spatial locality, and better utilization of limited cache resources. We develop a theoretical model for analyzing the cache behavior, and implement database operators that are efficient in the presence of skew. Our experiments on real and synthetic data show that exploiting skew can significantly improve in-memory query performance. In some cases, our techniques can speed up queries by over an order of magnitude

arXiv.org e-Print Archive

Crossref

Growth of relational model: Interdependence and complementary to big data

Author: Prabhu Srikanth
Rao B. Dinesh
Shetty Sucharitha
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/04/2021
Field of study

A database management system is a constant application of science that provides a platform for the creation, movement, and use of voluminous data. The area has witnessed a series of developments and technological advancements from its conventional structured database to the recent buzzword, bigdata. This paper aims to provide a complete model of a relational database that is still being widely used because of its well known ACID properties namely, atomicity, consistency, integrity and durability. Specifically, the objective of this paper is to highlight the adoption of relational model approaches by bigdata techniques. Towards addressing the reason for this in corporation, this paper qualitatively studied the advancements done over a while on the relational data model. First, the variations in the data storage layout are illustrated based on the needs of the application. Second, quick data retrieval techniques like indexing, query processing and concurrency control methods are revealed. The paper provides vital insights to appraise the efficiency of the structured database in the unstructured environment, particularly when both consistency and scalability become an issue in the working of the hybrid transactional and analytical database management system

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Reordering Columns for Smaller Indexes

Author: Abadi
Alber
Anantha
Anh
Antoshenkov
Aouiche
Barnard
Bassiouni
Bhattacharjee
Cai
Chen
Daniel Lemire
Dehne
Eavis
Engene
Faloutsos
Fang
Flahive
Flahive
Garey
Golomb
Haddadi
Hamilton
Haverkort
Holloway
Holloway
Kamel
Kaser
Lemire
Lemke
Moffat
Moffat
Ng
Niedermeier
Owen Kaser
Peano
Pinar
Richards
Savage
Scholer
Vo
Witten
Wu
Zobel
Publication venue: 'Elsevier BV'
Publication date: 22/02/2011
Field of study

Column-oriented indexes-such as projection or bitmap indexes-are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns in the right order before sorting can reduce the number of runs by a factor of two or more. Unfortunately, determining the best column order is NP-hard. For many cases, we prove that the number of runs in table columns is minimized if we sort columns by increasing cardinality. Experimentally, sorting based on Hilbert space-filling curves is poor at minimizing the number of runs.Comment: to appear in Information Science

arXiv.org e-Print Archive

R-libre

Crossref