Search CORE

20 research outputs found

Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes

Author: Aouiche Kamel
Kaser Owen
Lemire Daniel
Publication venue
Publication date: 01/10/2008
Field of study

Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index size by 9 and make indexes several times faster. We investigate reordering heuristics based on computed attribute-value histograms. Simply permuting the columns of the table based on these histograms can increase the sorting efficiency by 40%.Comment: To appear in proceedings of DOLAP 200

arXiv.org e-Print Archive

R-libre

An Analysis of netCDF-FastBit Integration and Primitive Spatial-Temporal Operations

Author: Marks David
Publication venue: ScholarWorks@UNO
Publication date: 06/08/2009
Field of study

A process allowing for the intuitive use of SQL queries on dense multidimensional data stored in Network Common Data Format (netCDF) files is developed using advanced bitmap indexing provided by the FastBit bitmap indexing tool. A method for netCDF data extraction and FastBit index creation is presented and a geospatial Range and pseudo-KNN search based on the haversine function is implemented via SQL. A two step filtering algorithm is shown to greatly enhance the speed of these geospatial queries, allowing for extremely efficient processing of the netCDF data in bitmap indexed form

University of New Orleans

Column Imprints: A Secondary Index Structure

Author: Kersten M.L. (Martin)
Sidirourgos E. (Eleftherios)
Publication venue
Publication date: 01/06/2013
Field of study

Large scale data warehouses rely heavily on secondary indexes, such as bitmaps and b-trees, to limit access to slow IO devices. However, with the advent of large main memory systems, cache conscious secondary indexes are needed to improve also the transfer bandwidth between memory and cpu. In this paper, we introduce column imprint, a simple but efficient cache conscious secondary index. A column imprint is a collection of many small bit vectors, each indexing the data points of a single cacheline. An imprint is used during query evaluation to limit data access and thus minimize memory traffic. The compression for imprints is cpu friendly and exploits the empirical observation that data often exhibits local clustering or partial ordering as a side-effect of the construction process. Most importantly, column imprint compression remains effective and robust even in the case of unclustered data, while other state-of-the-art solutions fail. We conducted an extensive experimental evaluation to assess the applicability and the performance impact of the column imprints. The storage overhead, when experimenting with real world datasets, is just a few percent over the size of the columns being indexed. The evaluation time for over 40000 range queries of varying selectivity revealed the efficiency of the proposed index compar

CWI's Institutional Repository

Sorting improves word-aligned bitmap indexes

Author: Bellatreche
Cai
Daniel Lemire
Davis
Ernvall
Goddyn
Graefe
Hammer
Jurgens
Kamel Aouiche
Knuth
Owen Kaser
Porter
Richards
Savage
Sharma
Sinha
Wu
Yiannis
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009
Field of study

Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index size by 9 and make indexes several times faster. We investigate row-reordering heuristics. Simply permuting the columns of the table can increase the sorting efficiency by 40%. Secondary contributions include efficient algorithms to construct and aggregate bitmaps. The effect of word length is also reviewed by constructing 16-bit, 32-bit and 64-bit indexes. Using 64-bit CPUs, we find that 64-bit indexes are slightly faster than 32-bit indexes despite being nearly twice as large

arXiv.org e-Print Archive

CiteSeerX

R-libre

Crossref

An efficient compression scheme for bitmap indices

Author
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date
Field of study

Crossref

Recommended from our members

Breaking the Curse of Cardinality on Bitmap Indexes

Author: Shoshani Arie
Stockinger Kurt
Wu Kesheng
Wu Kesheng
Publication venue: Lawrence Berkeley National Laboratory
Publication date: 04/04/2008
Field of study

Bitmap indexes are known to be efficient for ad-hoc range queries that are common in data warehousing and scientific applications. However, they suffer from the curse of cardinality, that is, their efficiency deteriorates as attribute cardinalities increase. A number of strategies have been proposed, but none of them addresses the problem adequately. In this paper, we propose a novel binned bitmap index that greatly reduces the cost to answer queries, and therefore breaks the curse of cardinality. The key idea is to augment the binned index with an Order-preserving Bin-based Clustering (OrBiC) structure. This data structure significantly reduces the I/O operations needed to resolve records that cannot be resolved with the bitmaps. To further improve the proposed index structure, we also present a strategy to create single-valued bins for frequent values. This strategy reduces index sizes and improves query processing speed. Overall, the binned indexes with OrBiC great improves the query processing speed, and are 3 - 25 times faster than the best available indexes for high-cardinality data

UNT Digital Library

Using Bitmap Indexing Technology for Combined Numerical and TextQueries

Author
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date
Field of study

Crossref