Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes
Bitmap indexes must be compressed to reduce input/output costs and minimize
CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use
techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid
(WAH) compression. These techniques are sensitive to the order of the rows: a
simple lexicographical sort can divide the index size by 9 and make indexes
several times faster. We investigate reordering heuristics based on computed
attribute-value histograms. Simply permuting the columns of the table based on
these histograms can increase the sorting efficiency by 40%.
Comment: To appear in proceedings of DOLAP 200
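The sensitivity to row order described in this abstract can be illustrated with a minimal sketch. It does not implement WAH; it only counts runs of identical bits, on the assumption that fewer runs mean better RLE-style compression. The helper names (`build_bitmaps`, `count_runs`, `total_runs`) and the toy table are hypothetical.

```python
from itertools import groupby

def build_bitmaps(rows, col):
    """Return {value: list of 0/1 per row} for one column (uncompressed bitmaps)."""
    values = sorted({r[col] for r in rows})
    return {v: [1 if r[col] == v else 0 for r in rows] for v in values}

def count_runs(bits):
    """Number of maximal runs of identical bits; a rough proxy for RLE size."""
    return sum(1 for _ in groupby(bits))

def total_runs(rows):
    """Sum of run counts over every bitmap of every column."""
    n_cols = len(rows[0])
    return sum(
        count_runs(bitmap)
        for col in range(n_cols)
        for bitmap in build_bitmaps(rows, col).values()
    )

# Toy two-column table (made-up data).
rows = [("b", 1), ("a", 2), ("b", 2), ("a", 1), ("b", 1), ("a", 2)]

unsorted_runs = total_runs(rows)          # rows in arrival order
sorted_runs = total_runs(sorted(rows))    # rows in lexicographic order
print(unsorted_runs, sorted_runs)
```

On this toy table the lexicographic sort lowers the run count, and nearly all of the gain comes from the first column, which is why the choice of column order matters.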
An Analysis of netCDF-FastBit Integration and Primitive Spatial-Temporal Operations
A process allowing for the intuitive use of SQL queries on dense multidimensional data stored in Network Common Data Format (netCDF) files is developed using advanced bitmap indexing provided by the FastBit bitmap indexing tool. A method for netCDF data extraction and FastBit index creation is presented, and a geospatial range search and a pseudo-KNN search based on the haversine function are implemented via SQL. A two-step filtering algorithm is shown to greatly enhance the speed of these geospatial queries, allowing for extremely efficient processing of the netCDF data in bitmap-indexed form.
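The abstract does not spell out its two-step filter, so the following is only a sketch of one common pattern, under stated assumptions: a cheap bounding-box test on latitude and longitude (the kind of range condition a bitmap index can answer) followed by the exact haversine distance on the survivors. It is plain Python rather than SQL or FastBit, ignores antimeridian wrap-around, and the names `haversine_km` and `range_search` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def range_search(points, lat, lon, radius_km):
    """Return the points within radius_km of (lat, lon), in input order."""
    ang = radius_km / EARTH_RADIUS_KM          # angular radius in radians
    dlat = math.degrees(ang)
    ratio = math.sin(ang) / math.cos(math.radians(lat))
    # Near the poles the circle spans all longitudes.
    dlon = math.degrees(math.asin(ratio)) if ratio < 1.0 else 180.0

    # Step 1: coarse bounding-box filter using only simple range comparisons.
    candidates = [
        (la, lo) for la, lo in points
        if abs(la - lat) <= dlat and abs(lo - lon) <= dlon
    ]
    # Step 2: exact haversine check on the remaining candidates only.
    return [(la, lo) for la, lo in candidates
            if haversine_km(lat, lon, la, lo) <= radius_km]

paris, london, new_york = (48.8566, 2.3522), (51.5074, -0.1278), (40.7128, -74.0060)
nearby = range_search([paris, london, new_york], paris[0], paris[1], 400.0)
print(nearby)
```

The first step discards distant points without any trigonometry per point, so the more expensive haversine evaluation runs only on a small candidate set.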
Column Imprints: A Secondary Index Structure
Large scale data warehouses rely heavily on secondary indexes,
such as bitmaps and B-trees, to limit access to slow I/O devices.
However, with the advent of large main memory systems, cache-conscious
secondary indexes are needed to also improve the transfer
bandwidth between memory and CPU. In this paper, we introduce
the column imprint, a simple but efficient cache-conscious secondary
index. A column imprint is a collection of many small bit
vectors, each indexing the data points of a single cacheline. An
imprint is used during query evaluation to limit data access and
thus minimize memory traffic. The compression for imprints is
CPU-friendly and exploits the empirical observation that data often
exhibits local clustering or partial ordering as a side-effect of the
construction process. Most importantly, column imprint compression
remains effective and robust even in the case of unclustered
data, while other state-of-the-art solutions fail. We conducted an
extensive experimental evaluation to assess the applicability and
the performance impact of the column imprints. The storage overhead,
when experimenting with real world datasets, is just a few
percent over the size of the columns being indexed. The evaluation
time for over 40000 range queries of varying selectivity revealed
the efficiency of the proposed index compar
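The idea of one small bit vector per cacheline can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes fixed-size blocks of values standing in for cachelines, uses hypothetical bin boundaries `bounds`, stores each imprint as a Python integer, and omits the imprint compression entirely.

```python
import bisect

def build_imprints(column, bounds, line_size):
    """Build one bit vector (as an int) per block of line_size values.

    Bit k is set when the block holds at least one value in bin k,
    where the bins are delimited by the sorted list bounds.
    """
    imprints = []
    for start in range(0, len(column), line_size):
        mask = 0
        for v in column[start:start + line_size]:
            mask |= 1 << bisect.bisect_right(bounds, v)
        imprints.append(mask)
    return imprints

def range_scan(column, imprints, bounds, line_size, lo, hi):
    """Return positions with lo <= value <= hi, skipping blocks that cannot match."""
    first = bisect.bisect_right(bounds, lo)
    last = bisect.bisect_right(bounds, hi)
    query_mask = 0
    for k in range(first, last + 1):
        query_mask |= 1 << k

    positions = []
    for i, mask in enumerate(imprints):
        if (mask & query_mask) == 0:
            continue  # no bin in common: this block is never touched
        start = i * line_size
        for pos in range(start, min(start + line_size, len(column))):
            if lo <= column[pos] <= hi:
                positions.append(pos)
    return positions

column = [1, 2, 3, 4, 11, 12, 35, 14, 21, 22, 23, 24, 5, 31, 6, 32]
bounds = [10, 20, 30]   # four bins: <10, 10-19, 20-29, >=30
imprints = build_imprints(column, bounds, line_size=4)
matches = range_scan(column, imprints, bounds, 4, lo=11, hi=22)
print(imprints, matches)
```

In this toy run the first and last blocks are skipped outright, while blocks that share a bin with the query still need a value-by-value check because an imprint records only which bins occur, not the values themselves.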
Sorting improves word-aligned bitmap indexes
Bitmap indexes must be compressed to reduce input/output costs and minimize
CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use
techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid
(WAH) compression. These techniques are sensitive to the order of the rows: a
simple lexicographical sort can divide the index size by 9 and make indexes
several times faster. We investigate row-reordering heuristics. Simply
permuting the columns of the table can increase the sorting efficiency by 40%.
Secondary contributions include efficient algorithms to construct and aggregate
bitmaps. The effect of word length is also reviewed by constructing 16-bit,
32-bit and 64-bit indexes. Using 64-bit CPUs, we find that 64-bit indexes are
slightly faster than 32-bit indexes despite being nearly twice as large.
Breaking the Curse of Cardinality on Bitmap Indexes
Bitmap indexes are known to be efficient for ad-hoc range queries that are common in data warehousing and scientific applications. However, they suffer from the curse of cardinality, that is, their efficiency deteriorates as attribute cardinalities increase. A number of strategies have been proposed, but none of them addresses the problem adequately. In this paper, we propose a novel binned bitmap index that greatly reduces the cost to answer queries, and therefore breaks the curse of cardinality. The key idea is to augment the binned index with an Order-preserving Bin-based Clustering (OrBiC) structure. This data structure significantly reduces the I/O operations needed to resolve records that cannot be resolved with the bitmaps. To further improve the proposed index structure, we also present a strategy to create single-valued bins for frequent values. This strategy reduces index sizes and improves query processing speed. Overall, the binned indexes with OrBiC greatly improve query processing speed and are 3 to 25 times faster than the best available indexes for high-cardinality data.
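The candidate-check problem that this abstract refers to can be shown with a small sketch. It is a simplified illustration under stated assumptions, not the OrBiC structure itself: Python sets stand in for compressed bitmaps, a per-bin list of (row, value) pairs stands in for the clustered copy of the data, and the names `build_binned_index` and `range_query` are hypothetical.

```python
import bisect

def build_binned_index(values, edges):
    """Bin i covers [edges[i], edges[i + 1]); all values must fall in [edges[0], edges[-1])."""
    n_bins = len(edges) - 1
    bitmaps = [set() for _ in range(n_bins)]    # row ids per bin (stand-in for a bitmap)
    clustered = [[] for _ in range(n_bins)]     # (row, value) pairs grouped by bin
    for row, v in enumerate(values):
        b = bisect.bisect_right(edges, v) - 1
        bitmaps[b].add(row)
        clustered[b].append((row, v))
    return bitmaps, clustered

def range_query(index, edges, lo, hi):
    """Return the sorted row ids with lo <= value < hi."""
    bitmaps, clustered = index
    hits = set()
    for b in range(len(bitmaps)):
        b_lo, b_hi = edges[b], edges[b + 1]
        if b_hi <= lo or b_lo >= hi:
            continue                             # bin lies entirely outside the range
        if lo <= b_lo and b_hi <= hi:
            hits |= bitmaps[b]                   # bin fully covered: the bitmap alone answers
        else:
            # Edge bin: the bitmap cannot decide, so check the values,
            # reading only this bin's group rather than the whole column.
            hits |= {row for row, v in clustered[b] if lo <= v < hi}
    return sorted(hits)

values = [3, 17, 25, 8, 12, 29, 21]
edges = [0, 10, 20, 30]
index = build_binned_index(values, edges)
result = range_query(index, edges, lo=8, hi=25)
print(result)
```

Only the two edge bins require looking at actual values here; grouping the values by bin is what keeps that check confined to a small, contiguous part of the data.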