11 research outputs found

Reducing Database Bitmap Index Size Using Variable Length Compression

Author: Corrales Fabian J.
Publication venue: Sound Ideas
Publication date: 01/01/2010

AVRF: A Framework to Enable Distributed Computing Using Volunteered Mobile Resources

Author: Arnold Evan
Publication venue: Sound Ideas
Publication date: 01/01/2011

The goal of this project was to create a framework to enable Android mobile devices to act as volunteer workers for distributed computing tasks. The use of volunteered resources has enabled projects with high computational requirements to gain access to significant computational power using the idle time of personal computers volunteered by their owners, such as in SETI@home or similar. The Android Volunteered Resource Framework (AVRF) attempts to implement a similar system, albeit with Android-powered devices as the volunteered workers. The framework downloads a set of components which define how to get, process, and return data, and runs them in the background of a more user-interface-intensive application with the goal of reducing or masking battery drain. Although the battery drain remains considerable, it is possible that a user may find the drain acceptable anyway, or that running AVRF might be more useful when the phone is plugged in and charging.

Sound Ideas

Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes

Author: Aouiche Kamel
Kaser Owen
Lemire Daniel
Publication date: 01/10/2008

Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index size by 9 and make indexes several times faster. We investigate reordering heuristics based on computed attribute-value histograms. Simply permuting the columns of the table based on these histograms can increase the sorting efficiency by 40%. Comment: To appear in proceedings of DOLAP 200

arXiv.org e-Print Archive

R-libre
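
The abstract above says that run-length-based schemes are sensitive to row order. As a minimal sketch of that effect (not the paper's histogram-aware method), the following Python counts the runs of identical values in each column before and after a plain lexicographic sort. The function names and the toy table are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]


def count_runs(column: Sequence[str]) -> int:
    """Count maximal runs of identical consecutive values in one column."""
    runs = 0
    previous = object()  # sentinel that equals no real value
    for value in column:
        if value != previous:
            runs += 1
            previous = value
    return runs


def total_runs(rows: List[Row]) -> int:
    """Sum the run counts over all columns of the table."""
    if not rows:
        return 0
    return sum(count_runs(column) for column in zip(*rows))


# Hypothetical toy table with two attributes.
table = [("b", "x"), ("a", "y"), ("b", "y"), ("a", "x"), ("a", "y"), ("b", "x")]
unsorted_runs = total_runs(table)
sorted_runs = total_runs(sorted(table))  # lexicographic row order
```

Fewer runs generally means fewer words after run-length encoding, which is why the sorted table is the better input for an RLE-style compressor.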

Optimizing Frequency Queries for Data Mining Applications

Author: Kender John R.
Malik Hassan H.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2007

Data mining algorithms use various Trie and bitmap-based representations to optimize the support (i.e., frequency) counting performance. In this paper, we compare the memory requirements and support counting performance of FP Tree and Compressed Patricia Trie against several novel variants of vertical bit vectors. First, borrowing ideas from the VLDB domain, we compress vertical bit vectors using WAH encoding. Second, we evaluate the Gray code rank-based transaction reordering scheme, and show that in practice, simple lexicographic ordering, obtained by applying LSB Radix sort, outperforms this scheme. Led by these results, we propose HDO, a novel Hamming-distance-based greedy transaction reordering scheme, and aHDO, a linear-time approximation to HDO. We present results of experiments performed on 15 common datasets with varying degrees of sparseness, and show that HDO-reordered, WAH-encoded bit vectors can take as little as 5% of the uncompressed space, while aHDO achieves similar compression on sparse datasets. Finally, with results from over a billion database and data mining style frequency query executions, we show that bitmap-based approaches result in up to hundreds of times faster support counting, and HDO-WAH encoded bitmaps offer the best space-time tradeoff.

Columbia University Academic Commons

Reordering Rows for Better Compression: Beyond the Lexicographic Order

Author: Gutarra Eduardo
Kaser Owen
Lemire Daniel
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/07/2012

Sorting database tables before compressing them improves the compression rate. Can we do better than the lexicographical order? For minimizing the number of runs in a run-length encoding compression scheme, the best approaches to row-ordering are derived from traveling salesman heuristics, although there is a significant trade-off between running time and compression. A new heuristic, Multiple Lists, which is a variant on Nearest Neighbor that trades off compression for a major running-time speedup, is a good option for very large tables. However, for some compression schemes, it is more important to generate long runs rather than few runs. For this case, another novel heuristic, Vortex, is promising. We find that we can improve run-length encoding up to a factor of 3 whereas we can improve prefix coding by up to 80%: these gains are on top of the gains due to lexicographically sorting the table. We prove that the new row reordering is optimal (within 10%) at minimizing the runs of identical values within columns, in a few cases. Comment: to appear in ACM TOD

arXiv.org e-Print Archive

R-libre

Crossref
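
The abstract above mentions row orderings derived from traveling salesman heuristics such as Nearest Neighbor. The sketch below is a plain quadratic-time greedy Nearest Neighbor under Hamming distance, not the paper's Multiple Lists or Vortex heuristics. It relies on the fact that the total number of column runs equals the number of columns plus the sum of Hamming distances between consecutive rows. All names and the toy table are assumptions.

```python
from typing import List, Tuple

Row = Tuple[str, ...]


def hamming(a: Row, b: Row) -> int:
    """Number of columns in which two rows differ."""
    return sum(x != y for x, y in zip(a, b))


def total_runs(rows: List[Row]) -> int:
    """Column runs = column count + Hamming distances of adjacent rows."""
    if not rows:
        return 0
    return len(rows[0]) + sum(hamming(a, b) for a, b in zip(rows, rows[1:]))


def nearest_neighbor_order(rows: List[Row]) -> List[Row]:
    """Greedily append the remaining row closest to the last chosen row."""
    remaining = list(rows)
    if not remaining:
        return []
    ordered = [remaining.pop(0)]
    while remaining:
        last = ordered[-1]
        best = min(range(len(remaining)), key=lambda i: hamming(last, remaining[i]))
        ordered.append(remaining.pop(best))
    return ordered


# Hypothetical toy table, first sorted lexicographically, then reordered.
lexicographic = sorted([("b", "y"), ("a", "x"), ("b", "x"), ("a", "y")])
reordered = nearest_neighbor_order(lexicographic)
```

On this toy input the greedy order saves one run over the lexicographic order. The quadratic cost of the scan is the kind of running-time trade-off the abstract refers to.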

FITing-Tree: A Data-aware Index Structure

Author: Binnig Carsten
Fonseca Rodrigo
Galakatos Alex
Kraska Tim
Markovitch Michael
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

Index structures are one of the most important tools that DBAs leverage to improve the performance of analytics and transactional workloads. However, building several indexes over large datasets can often become prohibitive and consume valuable system resources. In fact, a recent study showed that indexes created as part of the TPC-C benchmark can account for 55% of the total memory available in a modern DBMS. This overhead consumes valuable and expensive main memory, and limits the amount of space available to store new data or process existing data. In this paper, we present FITing-Tree, a novel form of a learned index which uses piece-wise linear functions with a bounded error specified at construction time. This error knob provides a tunable parameter that allows a DBA to FIT an index to a dataset and workload by being able to balance lookup performance and space consumption. To navigate this tradeoff, we provide a cost model that helps determine an appropriate error parameter given either (1) a lookup latency requirement (e.g., 500ns) or (2) a storage budget (e.g., 100MB). Using a variety of real-world datasets, we show that our index is able to provide performance that is comparable to full index structures while reducing the storage footprint by orders of magnitude. Comment: 18 page

arXiv.org e-Print Archive

TUbiblio

Crossref

DSpace@MIT
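
The FITing-Tree abstract above describes piece-wise linear functions with a bounded error fixed at construction time. The following is a hedged sketch of one way to get that property: a greedy cone-narrowing segmentation over strictly increasing integer keys, followed by a lookup that searches only within the error window. The `Segment` type, function names, and the use of a sorted list instead of a tree over segments are assumptions, not the paper's design.

```python
import bisect
import math
from typing import List, NamedTuple, Optional, Sequence


class Segment(NamedTuple):
    start_key: int
    start_pos: int
    slope: float


def _close(keys: Sequence[int], start: int, low: float, high: float) -> Segment:
    # A single-point segment never narrowed its cone; slope 0 is exact there.
    slope = 0.0 if math.isinf(high) else (low + high) / 2
    return Segment(keys[start], start, slope)


def build_segments(keys: Sequence[int], error: int) -> List[Segment]:
    """Greedy segmentation; keys must be strictly increasing."""
    segments: List[Segment] = []
    if not keys:
        return segments
    start, low, high = 0, 0.0, math.inf
    for pos in range(1, len(keys)):
        dx, dy = keys[pos] - keys[start], pos - start
        if low <= dy / dx <= high:
            # Narrow the cone so every slope in it stays within the error.
            high = min(high, (dy + error) / dx)
            low = max(low, (dy - error) / dx)
        else:
            segments.append(_close(keys, start, low, high))
            start, low, high = pos, 0.0, math.inf
    segments.append(_close(keys, start, low, high))
    return segments


def lookup(keys: Sequence[int], segments: List[Segment], error: int, key: int) -> Optional[int]:
    """Return the position of key, or None if it is absent."""
    starts = [s.start_key for s in segments]  # would be precomputed in practice
    i = bisect.bisect_right(starts, key) - 1
    if i < 0:
        return None
    seg = segments[i]
    predicted = seg.start_pos + seg.slope * (key - seg.start_key)
    lo = max(0, math.floor(predicted) - error)
    hi = min(len(keys), math.ceil(predicted) + error + 1)
    j = bisect.bisect_left(keys, key, lo, hi)
    return j if j < hi and keys[j] == key else None


# Hypothetical data: squares are non-linear, so several segments are needed.
keys = [i * i for i in range(1, 200)]
segments = build_segments(keys, error=4)
```

A larger `error` yields fewer segments but a wider final search window, which is the space versus lookup-latency knob the abstract describes.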

Dynamic Data Organization for Bitmap Indices

Publication venue: 'European Alliance for Innovation n.o.'
Publication date: 01/01/2008

Crossref

Sorting improves word-aligned bitmap indexes

Author: Aouiche Kamel
Kaser Owen
Lemire Daniel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009

Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. These techniques are sensitive to the order of the rows: a simple lexicographical sort can divide the index size by 9 and make indexes several times faster. We investigate row-reordering heuristics. Simply permuting the columns of the table can increase the sorting efficiency by 40%. Secondary contributions include efficient algorithms to construct and aggregate bitmaps. The effect of word length is also reviewed by constructing 16-bit, 32-bit and 64-bit indexes. Using 64-bit CPUs, we find that 64-bit indexes are slightly faster than 32-bit indexes despite being nearly twice as large.

arXiv.org e-Print Archive

CiteSeerX

R-libre

Crossref
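
To make the word-aligned idea in the abstract above concrete, here is a simplified WAH-style encoder in Python. It cuts the bitmap into groups of `word_size - 1` bits, merges consecutive all-0 or all-1 groups into one fill word, and keeps mixed groups as literal words. It is a sketch only: words are Python tuples rather than packed machine words, fill counts are unbounded, and the function names are assumptions.

```python
from typing import List, Sequence, Tuple

Word = Tuple  # ("fill", bit, group_count) or ("literal", bits_tuple)


def wah_encode(bits: Sequence[int], word_size: int = 32) -> List[Word]:
    """Encode a bitmap as fill and literal words (simplified WAH)."""
    group = word_size - 1
    words: List[Word] = []
    for start in range(0, len(bits), group):
        chunk = list(bits[start:start + group])
        chunk += [0] * (group - len(chunk))  # pad the last group with zeros
        if all(b == chunk[0] for b in chunk):
            last = words[-1] if words else None
            if last and last[0] == "fill" and last[1] == chunk[0]:
                words[-1] = ("fill", chunk[0], last[2] + 1)  # extend the fill
            else:
                words.append(("fill", chunk[0], 1))
        else:
            words.append(("literal", tuple(chunk)))
    return words


def wah_decode(words: List[Word], length: int, word_size: int = 32) -> List[int]:
    """Expand the words and drop the padding added by the encoder."""
    group = word_size - 1
    bits: List[int] = []
    for word in words:
        if word[0] == "fill":
            bits.extend([word[1]] * (group * word[2]))
        else:
            bits.extend(word[1])
    return bits[:length]


# Hypothetical bitmaps with the same length but different clustering.
clustered = [1] * 310 + [0] * 620
scattered = [1 if i % 3 == 0 else 0 for i in range(930)]
clustered_words = wah_encode(clustered)
scattered_words = wah_encode(scattered)
```

The clustered bitmap collapses into two fill words while the scattered one needs a literal word per group, which illustrates why sorting rows to lengthen runs shrinks the index.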

Tree-Encoded Bitmaps

Author: Athanassoulis Manos
Chan Chee Yong
Clark David R.
Francc
González Rodrigo
Jacobson Guy
Johnson Theodore
Li Yinan
MacNicol Roger
Moerkotte Guido
O'Neil Patrick E.
Polychroniou Orestis
Rinfret Denis
Vigna Sebastiano
Wang Bo
Wu Kesheng
Wu Kun-Lung
Wu Ming-Chuan
Yu Jia
Zhou Dong
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/06/2020

We propose a novel method to represent compressed bitmaps. Similarly to existing bitmap compression schemes, we exploit the compression potential of bitmaps populated with consecutive identical bits, i.e., 0-runs and 1-runs. But in contrast to prior work, our approach employs a binary tree structure to represent runs of various lengths. Leaf nodes in the upper tree levels thereby represent longer runs, and vice versa. The tree-based representation results in high compression ratios and enables efficient random access, which in turn allows for the fast intersection of bitmaps. Our experimental analysis with randomly generated bitmaps shows that our approach significantly improves over state-of-the-art compression techniques when bitmaps are dense and/or only barely clustered. Further, we evaluate our approach with real-world data sets, showing that our tree-encoded bitmaps can save up to one third of the space over existing techniques.

Crossref

CWI's Institutional Repository
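
The Tree-Encoded Bitmaps abstract above describes a binary tree whose upper-level leaves stand for longer runs. The sketch below illustrates only that idea: a perfect binary tree over a power-of-two-length bitmap is pruned wherever a subtree is uniform, and a bit is read by descending from the root. It uses nested Python tuples and does not reproduce the paper's succinct encoding; all names are assumptions.

```python
from typing import Sequence, Tuple, Union

Node = Union[int, Tuple["Node", "Node"]]  # leaf bit or (left, right)


def build(bits: Sequence[int], lo: int = 0, hi: int = -1) -> Node:
    """Build a pruned tree; len(bits) must be a power of two."""
    if hi < 0:
        hi = len(bits)
    if hi - lo == 1:
        return bits[lo]
    mid = (lo + hi) // 2
    left, right = build(bits, lo, mid), build(bits, mid, hi)
    # Two equal leaves mean the whole range is one run: collapse it.
    if not isinstance(left, tuple) and left == right:
        return left
    return (left, right)


def get_bit(tree: Node, size: int, index: int) -> int:
    """Random access: descend until a leaf covers the index."""
    lo, hi, node = 0, size, tree
    while isinstance(node, tuple):
        mid = (lo + hi) // 2
        if index < mid:
            node, hi = node[0], mid
        else:
            node, lo = node[1], mid
    return node


def count_nodes(tree: Node) -> int:
    """Total number of inner nodes and leaves."""
    if isinstance(tree, tuple):
        return 1 + count_nodes(tree[0]) + count_nodes(tree[1])
    return 1


# Hypothetical 16-bit bitmap: a long 0-run, a 1-run, then a mixed tail.
bitmap = [0] * 8 + [1] * 4 + [0, 1, 1, 1]
tree = build(bitmap)
```

Here the 8-bit run of zeros is a single leaf directly under the root, while the mixed tail needs deeper nodes.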

Compressing Bitmap Indices by Data Reorganization

Author: Ali Pınar
Hakan Ferhatosmanoglu
Tao Tao
Publication date: 01/07/2004

Many scientific applications generate massive volumes of data through observations or computer simulations, bringing up the need for effective indexing methods for efficient storage and retrieval of scientific data. Unlike conventional databases, scientific data is mostly read-only and its volume can reach the order of petabytes, making a compact index structure vital. Bitmap indexing has been successfully applied to scientific databases by exploiting the fact that scientific data are enumerated or numerical. Bitmap indices can be compressed with variants of run-length encoding for a compact index structure. However, even this may not be enough for the enormous data generated in some applications such as high energy physics. In this paper, we study how to reorganize bitmap tables for improved compression rates. Our algorithms are used just as a preprocessing step, thus there is no need to revise the current indexing techniques and the query processing algorithms. We introduce the tuple reordering problem, which aims to reorganize database tuples for optimal compression rates. We propose a Gray code ordering algorithm for this NP-Complete problem, which is an in-place algorithm and runs in linear time in the order of the size of the database. We also discuss how the tuple reordering problem can be reduced to the traveling salesperson problem. Our experimental results on real data sets show that the compression ratio can b

CiteSeerX

eScholarship - University of California

UNT Digital Library
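
The abstract of "Compressing Bitmap Indices by Data Reorganization" above proposes Gray code ordering of tuples. As an illustration only, the sketch below ranks each equal-width bit row by its position in the reflected binary Gray code (via a running XOR of its bits) and sorts by that rank. It uses Python's comparison sort, so it is neither in-place nor linear-time like the algorithm the abstract describes; the function names are assumptions.

```python
from typing import List, Tuple

BitRow = Tuple[int, ...]


def gray_rank(row: BitRow) -> int:
    """Position of the row in the reflected Gray code sequence."""
    rank = 0
    parity = 0
    for bit in row:  # most significant bit first
        parity ^= bit  # prefix XOR converts Gray code to binary
        rank = (rank << 1) | parity
    return rank


def gray_code_order(rows: List[BitRow]) -> List[BitRow]:
    """Sort equal-width bit rows by their Gray code rank."""
    return sorted(rows, key=gray_rank)


def hamming(a: BitRow, b: BitRow) -> int:
    """Number of bit positions in which two rows differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 2-bit rows in arbitrary order.
ordered = gray_code_order([(1, 0), (0, 0), (1, 1), (0, 1)])
```

When every bit pattern is present, consecutive rows in this order differ in exactly one bit, which keeps the runs within each bitmap column long.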