Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes

Author: Aouiche Kamel, Kaser Owen, Lemire Daniel
Publication date: 01/10/2008

Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index size by 9 and make indexes several times faster. We investigate reordering heuristics based on computed attribute-value histograms. Simply permuting the columns of the table based on these histograms can increase the sorting efficiency by 40%.
Comment: To appear in proceedings of DOLAP 200

arXiv.org e-Print Archive

R-libre
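
The row-order sensitivity noted in the Histogram-Aware Sorting abstract above can be illustrated with a small sketch. It is not the authors' code and does not implement WAH itself: it only counts runs of identical bits, a rough proxy for run-length-encoded size, before and after a lexicographic sort. The table contents and the names build_bitmaps, count_runs, and total_runs are hypothetical.

```python
from itertools import groupby

def build_bitmaps(rows):
    """Build one bitmap (list of 0/1) per (column, value) pair."""
    bitmaps = {}
    for i, row in enumerate(rows):
        for col, val in enumerate(row):
            bitmaps.setdefault((col, val), [0] * len(rows))[i] = 1
    return bitmaps

def count_runs(bits):
    """Number of maximal runs of identical bits in one bitmap."""
    return sum(1 for _ in groupby(bits))

def total_runs(rows):
    """Total run count over every bitmap of the table (proxy for RLE size)."""
    return sum(count_runs(b) for b in build_bitmaps(rows).values())

# Hypothetical two-column table, deliberately interleaved.
rows = [("b", "x"), ("a", "y"), ("b", "y"), ("a", "x"), ("b", "x"), ("a", "y")]

unsorted_runs = total_runs(rows)
sorted_runs = total_runs(sorted(rows))  # lexicographic row sort
print(unsorted_runs, sorted_runs)
```

On this toy table the sort lowers the total from 20 runs to 12, and all of the gain comes from the first column, which is the primary sort key. That is one way to see why the choice of column order can matter.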

Bitmap indexing a suitable approach for data warehouse design

Author: Kale Sarika Prakash, P.M. Joe Prathap
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 26/02/2015

A data warehouse is a collection of very large databases that is subject-oriented, integrated, time-variant, and non-volatile. Because the data is so large, fast data access is the major performance parameter of any data warehouse. The information retrieved from a data warehouse is generally summarized or aggregated, since it feeds an organization's decision-making process. Queries that retrieve such information typically apply an aggregation function followed by a HAVING clause. Extracting information efficiently from a data warehouse remains a challenge for researchers: because the database is huge, accessing information takes longer than in ordinary databases. Creating an index on this data is therefore essential to data warehouse performance. Selecting an appropriate index decreases query execution time and so improves performance. At present, B-tree indexing is used in various database products to index any table. B-tree indexing suits databases that require frequent updates, such as On-Line Transaction Processing (OLTP) systems, but it is time-consuming for data warehouses and On-Line Analytical Processing (OLAP) systems. A data warehouse is not frequently updated, so bitmap indexing is an appropriate choice for it. The bitmap index is created once, at the start, on the required vector. Once created on the fixed database, it can be used at any time for any query; each query selects the bitmaps it needs and executes against them. Bitmap indexing suits data warehouses precisely because their data is non-volatile and very large. DOI: 10.17762/ijritcc2321-8169.15025

International Journal on Recent and Innovation Trends in Computing and Communication
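
As a rough illustration of the idea in the abstract above (build the bitmaps once, then pick the ones a query needs), here is a minimal sketch that uses Python integers as bitsets. The column names, data, and helper names (build_bitmap_index, rows_of) are hypothetical and are not taken from the paper.

```python
def build_bitmap_index(values):
    """Map each distinct value to an integer whose bit i is set if row i holds it."""
    index = {}
    for row_id, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << row_id)
    return index

def rows_of(bitmap):
    """Decode a bitmap back into a sorted list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical fact-table columns; the index is built once up front.
region = ["east", "west", "east", "north", "west"]
status = ["open", "open", "closed", "open", "closed"]
region_idx = build_bitmap_index(region)
status_idx = build_bitmap_index(status)

# WHERE region = 'east' AND status = 'open'
match = region_idx["east"] & status_idx["open"]

# SELECT COUNT(*) WHERE region IN ('west', 'north') AND status = 'open'
west_or_north = region_idx["west"] | region_idx["north"]
open_count = bin(west_or_north & status_idx["open"]).count("1")

print(rows_of(match), open_count)
```

Each predicate becomes a bitwise AND or OR, and a COUNT aggregate reduces to counting set bits, so the base rows are never scanned. Updating a row would mean touching several bitmaps, which fits the abstract's point that the approach suits rarely updated data.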

Partitioned Blockmap Indexes for Multidimensional Data Access

Author: Ross Kenneth A., Sitaridi Evangelia
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2012

Given recent increases in the size of main memory in modern machines, it is now common to store large data sets in RAM for faster processing. Multidimensional access methods aim to provide efficient access to large data sets when queries apply predicates to some of the data dimensions. We examine multidimensional access methods in the context of an in-memory column store tuned for on-line analytical processing or scientific data analysis. We propose a multidimensional data structure that contains a novel combination of a grid array and several bitmaps. The base data is clustered in an order matching that of the index structure. The bitmaps contain one bit per block of data, motivating the term "blockmap." The proposed data structures are compact, typically taking less than one bit of space per row of data. Partition boundaries can be chosen in a way that reflects both the query workload and the data distribution, and boundaries are not required to evenly divide the data if there is a bias in the query distribution. We examine the theoretical performance of the data structure and experimentally measure its performance on three modern CPUs and one GPU processor. We demonstrate that efficient multidimensional access can be achieved with minimal space overhead.

Columbia University Academic Commons

The FZ Strategy to Compress the Bitmap Index for Data Warehouses

Author: Chang Ye-In, Chen Hue-Ling, Lin Chien-Show
Publication venue: AIS Electronic Library (AISeL)
Publication date: 05/12/2005

Data warehouses contain data consolidated from several operational databases and provide historical, summarized data that is more appropriate for analysis than detailed, individual records. Fast response time is essential for on-line decision support. A bitmap index could reach this goal in read-mostly environments. For data with high cardinality in data warehouses, a bitmap index consists of many bitmap vectors, and the size of the bitmap index could be much larger than the capacity of the disk. The WAH strategy has been presented to solve the storage overhead. However, when the bit density and clustering factor of 1's increase, the bit strings of the WAH strategy become less compressible. Therefore, in this paper, we propose the FZ strategy, which compresses each bitmap vector to reduce the size of the storage space and provides efficient bitwise operations without decompressing these bitmap vectors. From our performance simulation, the FZ strategy could reduce the storage space more than the WAH strategy.

AIS Electronic Library (AISeL)

Towards Optimal Multi-Dimensional Query Processing with Bitmap Indices

Publication venue: 'Office of Scientific and Technical Information (OSTI)'

Crossref

Column Imprints: A Secondary Index Structure

Author: Kersten M.L. (Martin), Sidirourgos E. (Eleftherios)
Publication date: 01/06/2013

Large-scale data warehouses rely heavily on secondary indexes, such as bitmaps and B-trees, to limit access to slow IO devices. However, with the advent of large main memory systems, cache-conscious secondary indexes are needed to also improve the transfer bandwidth between memory and CPU. In this paper, we introduce column imprint, a simple but efficient cache-conscious secondary index. A column imprint is a collection of many small bit vectors, each indexing the data points of a single cacheline. An imprint is used during query evaluation to limit data access and thus minimize memory traffic. The compression for imprints is CPU-friendly and exploits the empirical observation that data often exhibits local clustering or partial ordering as a side-effect of the construction process. Most importantly, column imprint compression remains effective and robust even in the case of unclustered data, while other state-of-the-art solutions fail. We conducted an extensive experimental evaluation to assess the applicability and the performance impact of the column imprints. The storage overhead, when experimenting with real-world datasets, is just a few percent over the size of the columns being indexed. The evaluation time for over 40000 range queries of varying selectivity revealed the efficiency of the proposed index compar

CWI's Institutional Repository
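
The Column Imprints abstract above describes one small bit vector per cacheline, used to skip data during range queries. The sketch below shows one way that could work, assuming each bit stands for a value bin and that a "cacheline" is just a fixed-size slice of the column. The bin boundaries, the line size, and all function names are hypothetical, and the imprint compression step mentioned in the abstract is left out.

```python
from bisect import bisect_right

def bin_of(value, bounds):
    """Index of the bin holding value; bounds are sorted split points."""
    return bisect_right(bounds, value)

def build_imprints(column, bounds, line_size):
    """One bit vector per line: bit b is set if any value in the line falls in bin b."""
    imprints = []
    for start in range(0, len(column), line_size):
        mask = 0
        for v in column[start:start + line_size]:
            mask |= 1 << bin_of(v, bounds)
        imprints.append(mask)
    return imprints

def range_query(column, imprints, bounds, line_size, lo, hi):
    """Row ids with lo <= value <= hi, plus the number of lines actually scanned."""
    query_mask = 0
    for b in range(bin_of(lo, bounds), bin_of(hi, bounds) + 1):
        query_mask |= 1 << b
    hits, scanned = [], 0
    for line_no, imprint in enumerate(imprints):
        if (imprint & query_mask) == 0:
            continue  # no value in this line can match; skip it
        scanned += 1
        start = line_no * line_size
        for i in range(start, min(start + line_size, len(column))):
            if lo <= column[i] <= hi:
                hits.append(i)
    return hits, scanned

# Hypothetical column, four values per "cacheline", five bins.
column = [1, 2, 3, 4, 50, 51, 52, 53, 5, 90, 6, 7]
bounds = [10, 20, 40, 80]
imprints = build_imprints(column, bounds, line_size=4)
hits, scanned = range_query(column, imprints, bounds, 4, lo=45, hi=60)
print(imprints, hits, scanned)
```

Here only one of the three lines is scanned. The third line is skipped even though its values are not clustered, because none of them fall in the queried bin.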

Performance comparison of property map and bitmap indexing

Author: Ashima Gupta, Jennifer Grommon-litton, Karen C. Davis
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2002

A data warehouse is a collection of data from different sources that supports analytical querying. A Bitmap Index (BI) allows fast access to individual attribute values that are needed to answer a query by representing the values of an attribute for all tuples separately, as bit strings. A Property Map (PMap) is a multidimensional indexing technique that pre-computes attribute expressions, called properties, for each tuple and stores the results as bit strings [DD97, LD02]. This paper compares the performance of the PMap and the Range-Encoded Bit-Sliced Index (REBSI) [CI98] using cost models to simulate their storage and query processing costs for different kinds of queries over a benchmark schema. We identify parameters that affect performance of these indexes and determine situations in which either technique gives significant improvement over the other. We also explore ways to improve PMap design to enhance performance

CiteSeerX

Crossref
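
For readers unfamiliar with the range encoding named in the last abstract, the following sketch shows a simplified, single-component version: one bitmap per distinct value v, with bit i set when row i holds a value less than or equal to v. It is not the REBSI or PMap design evaluated in the paper and says nothing about their cost models. The data and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def build_range_encoded(values):
    """Return the sorted domain and, per domain value v, a bitmap of rows with value <= v."""
    domain = sorted(set(values))
    bitmaps = {}
    for v in domain:
        bits = 0
        for row_id, x in enumerate(values):
            if x <= v:
                bits |= 1 << row_id
        bitmaps[v] = bits
    return domain, bitmaps

def less_equal(domain, bitmaps, v):
    """Bitmap of rows with value <= v (a single bitmap lookup)."""
    k = bisect_right(domain, v)
    return bitmaps[domain[k - 1]] if k else 0

def less_than(domain, bitmaps, v):
    """Bitmap of rows with value < v."""
    k = bisect_left(domain, v)
    return bitmaps[domain[k - 1]] if k else 0

def between(domain, bitmaps, lo, hi):
    """Bitmap of rows with lo <= value <= hi, using two lookups."""
    return less_equal(domain, bitmaps, hi) & ~less_than(domain, bitmaps, lo)

def rows_of(bitmap):
    """Decode a bitmap back into a sorted list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

values = [3, 1, 4, 1, 5]  # hypothetical attribute column
domain, bitmaps = build_range_encoded(values)
in_range = between(domain, bitmaps, 3, 4)
print(rows_of(in_range))
```

A one-sided range predicate reads one bitmap and a two-sided one reads two, regardless of how many distinct values the range covers. An equality-encoded index would instead OR together one bitmap per value in the range.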