123 research outputs found

    Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes

    Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index size by 9 and make indexes several times faster. We investigate reordering heuristics based on computed attribute-value histograms. Simply permuting the columns of the table based on these histograms can increase the sorting efficiency by 40%. Comment: To appear in proceedings of DOLAP 200
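
    The sensitivity of run-length-based compression to row order can be illustrated with a minimal sketch. It does not implement WAH or the paper's histogram heuristics; it only counts runs of identical bits as a rough proxy for RLE size, on a hypothetical two-column table of random values, before and after a lexicographical sort. All function names are illustrative assumptions.

```python
import random


def build_bitmaps(rows):
    """Return {(column, value): [0/1 per row]} for a list of tuples."""
    bitmaps = {}
    for i, row in enumerate(rows):
        for col, val in enumerate(row):
            bitmaps.setdefault((col, val), [0] * len(rows))[i] = 1
    return bitmaps


def count_runs(bits):
    """Number of maximal runs of identical bits (a proxy for RLE size)."""
    return 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def total_runs(rows):
    """Sum of run counts over every bitmap of the table."""
    return sum(count_runs(b) for b in build_bitmaps(rows).values())


# Hypothetical table: 1000 rows, column 0 has 4 values, column 1 has 8.
rng = random.Random(0)
rows = [(rng.randrange(4), rng.randrange(8)) for _ in range(1000)]

unsorted_runs = total_runs(rows)
sorted_runs = total_runs(sorted(rows))  # lexicographical row sort
print(unsorted_runs, sorted_runs)
```

    Sorting groups equal values into long contiguous runs, so the sorted table needs far fewer runs than the unsorted one. The actual gain for a WAH-compressed index depends on word alignment and on the data, which this sketch ignores.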

    Bitmap indexing a suitable approach for data warehouse design

    A data warehouse is a collection of very large databases that is subject-oriented, integrated, time-variant, and non-volatile. Because the data set is so large, fast data access is the major performance parameter of any data warehouse. The information retrieved from a data warehouse is generally summarized or aggregated, since it is needed for an organization's decision-making processes. The queries that retrieve such information typically apply an aggregation function followed by a HAVING clause. Extracting information efficiently from a data warehouse is a challenge for researchers: because the database is huge, accessing information takes longer than in ordinary databases. Creating an index on such a database is therefore essential for improving data warehouse performance. Selecting an appropriate index decreases query execution time and increases the performance of the data warehouse. At present, B-tree indexing is used in various database products to create an index on any table. B-tree indexing is useful for databases that require frequent updates, such as On-Line Transaction Processing (OLTP) systems, but it is a time-consuming approach for data warehouses and On-Line Analytical Processing (OLAP) systems. A data warehouse is not frequently updated, so bitmap indexing is an appropriate choice for it. The bitmap index has to be created on the required vector only once, at the start. Once it is created on the fixed database, it can be used at any time for any query: depending on the query's requirements, we select the relevant bitmaps and execute the query. Bitmap indexing is an appropriate choice for data warehouses specifically because they are non-volatile and hold huge data sets. DOI: 10.17762/ijritcc2321-8169.15025
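
    The build-once, query-many pattern described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: it uses Python integers as uncompressed bitsets, and the `sales` table and its column names are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int used as a bitset.

    Bit i of the bitmap for value v is 1 when rows[i][column] == v.
    """
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        index[row[column]] |= 1 << row_id
    return index


# Hypothetical fact table; in a warehouse it would be loaded once.
sales = [
    {"region": "east", "product": "pen", "amount": 10},
    {"region": "west", "product": "pen", "amount": 20},
    {"region": "east", "product": "ink", "amount": 5},
    {"region": "east", "product": "pen", "amount": 7},
]

# The indexes are built once, up front.
by_region = build_bitmap_index(sales, "region")
by_product = build_bitmap_index(sales, "product")

# Query: total amount where region = 'east' AND product = 'pen'.
match = by_region["east"] & by_product["pen"]  # one bitwise AND
row_ids = [i for i in range(len(sales)) if (match >> i) & 1]
total = sum(sales[i]["amount"] for i in row_ids)
print(row_ids, total)
```

    The predicate is evaluated with a single bitwise AND over the two bitmaps, and only the matching rows are read for the aggregation.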

    The FZ Strategy to Compress the Bitmap Index for Data Warehouses

    Data warehouses contain data consolidated from several operational databases and provide historical, summarized data that is more appropriate for analysis than detailed, individual records. Fast response time is essential for on-line decision support. A bitmap index can reach this goal in read-mostly environments. For high-cardinality data in data warehouses, a bitmap index consists of many bitmap vectors, and the size of the bitmap index could be much larger than the capacity of the disk. The WAH strategy has been presented to reduce this storage overhead. However, when the bit density and clustering factor of 1's increase, the bit strings of the WAH strategy become less compressible. Therefore, in this paper, we propose the FZ strategy, which compresses each bitmap vector to reduce the storage space and provides efficient bitwise operations without decompressing these bitmap vectors. In our performance simulation, the FZ strategy reduces the storage space more than the WAH strategy.

    Towards Optimal Multi-Dimensional Query Processing with Bitmap Indices


    Column Imprints: A Secondary Index Structure

    Large-scale data warehouses rely heavily on secondary indexes, such as bitmaps and B-trees, to limit access to slow I/O devices. However, with the advent of large main-memory systems, cache-conscious secondary indexes are needed to also improve the transfer bandwidth between memory and CPU. In this paper, we introduce the column imprint, a simple but efficient cache-conscious secondary index. A column imprint is a collection of many small bit vectors, each indexing the data points of a single cacheline. An imprint is used during query evaluation to limit data access and thus minimize memory traffic. The compression for imprints is CPU-friendly and exploits the empirical observation that data often exhibits local clustering or partial ordering as a side effect of the construction process. Most importantly, column imprint compression remains effective and robust even in the case of unclustered data, while other state-of-the-art solutions fail. We conducted an extensive experimental evaluation to assess the applicability and the performance impact of column imprints. The storage overhead, when experimenting with real-world datasets, is just a few percent over the size of the columns being indexed. The evaluation time for over 40000 range queries of varying selectivity revealed the efficiency of the proposed index compar
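
    The idea of one small bit vector per cacheline can be sketched roughly as below. The abstract does not say how the bits are assigned, so this sketch assumes each bit stands for one bin of a value histogram with hypothetical cut points `bounds`, and it models a cacheline as a fixed-size block of `line_size` values. Imprint compression is omitted, and all names are illustrative.

```python
from bisect import bisect_right


def bin_of(value, bounds):
    """Index of the histogram bin holding `value` (`bounds` sorted ascending)."""
    return bisect_right(bounds, value)


def build_imprints(column, bounds, line_size):
    """One bit vector per block: bit b is set if the block holds a value in bin b."""
    imprints = []
    for start in range(0, len(column), line_size):
        bits = 0
        for v in column[start:start + line_size]:
            bits |= 1 << bin_of(v, bounds)
        imprints.append(bits)
    return imprints


def range_query(column, imprints, bounds, line_size, low, high):
    """Return (values with low <= v <= high, number of blocks actually scanned)."""
    mask = 0
    for b in range(bin_of(low, bounds), bin_of(high, bounds) + 1):
        mask |= 1 << b
    hits, scanned = [], 0
    for i, bits in enumerate(imprints):
        if bits & mask:  # the block may contain a match, so scan it
            scanned += 1
            block = column[i * line_size:(i + 1) * line_size]
            hits.extend(v for v in block if low <= v <= high)
    return hits, scanned


# Hypothetical column with some local clustering, 4 values per block.
column = [1, 3, 2, 4, 25, 27, 22, 26, 12, 14, 11, 33]
bounds = [10, 20, 30]
imprints = build_imprints(column, bounds, line_size=4)
hits, scanned = range_query(column, imprints, bounds, 4, low=11, high=14)
print(imprints, hits, scanned)
```

    Blocks whose imprint shares no bit with the query mask are skipped without touching their values, which is how the index limits data access. A block that passes the mask test still has to be checked value by value, because a bin can be wider than the queried range.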

    Performance comparison of property map and bitmap indexing

    A data warehouse is a collection of data from different sources that supports analytical querying. A Bitmap Index (BI) allows fast access to individual attribute values that are needed to answer a query by representing the values of an attribute for all tuples separately, as bit strings. A Property Map (PMap) is a multidimensional indexing technique that pre-computes attribute expressions, called properties, for each tuple and stores the results as bit strings [DD97, LD02]. This paper compares the performance of the PMap and the Range-Encoded Bit-Sliced Index (REBSI) [CI98] using cost models to simulate their storage and query processing costs for different kinds of queries over a benchmark schema. We identify parameters that affect performance of these indexes and determine situations in which either technique gives significant improvement over the other. We also explore ways to improve PMap design to enhance performance