Search CORE

9 research outputs found

Sorting the fact table and compressing bitmap indexes with word alignment (Tri de la table de faits et compression des index bitmaps avec alignement sur les mots)

Author: Aouiche Kamel
Kaser Owen
Lemire Daniel
Publication date: 01/06/2008

Bitmap indexes are frequently used to index multidimensional data. They rely mostly on sequential input/output. Bitmaps can be compressed to reduce input/output costs and minimize CPU usage. The most efficient compression techniques are based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH) compression. This type of compression accelerates logical operations (AND, OR) over the bitmaps. However, run-length encoding is sensitive to the order of the facts. Thus, we propose to sort the fact tables. We review lexicographic, Gray-code, and block-wise sorting. We found that a lexicographic sort improves compression (sometimes producing indexes half the size) and makes indexes several times faster. While sorting takes time, this is partially offset by the fact that it is faster to index a sorted table. Column order is significant: it is generally preferable to put the columns having more distinct values at the beginning. A block-wise sort is much less efficient than a full sort. Moreover, we found that Gray-code sorting is not better than lexicographic sorting when using word-aligned compression. Comment: to appear at BDA'0

arXiv.org e-Print Archive

R-libre
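The Aouiche, Kaser, and Lemire abstract rests on one point: run-length encoding is sensitive to row order. A small experiment can illustrate this. The sketch below assumes a hypothetical, randomly generated fact table with three dimension columns. It builds one uncompressed bitmap per distinct column value and counts runs of identical bits. The run count is only a rough proxy for RLE-based index size, because WAH actually counts 32-bit words, not runs. Python's built-in tuple ordering supplies the lexicographic sort, and the helper names are illustrative only.

```python
import random
from itertools import groupby
from typing import Dict, List, Tuple

Row = Tuple[int, ...]

def bitmap_index(rows: List[Row], col: int) -> Dict[int, List[int]]:
    """One uncompressed bitmap (list of 0/1) per distinct value of column `col`."""
    values = sorted({r[col] for r in rows})
    return {v: [1 if r[col] == v else 0 for r in rows] for v in values}

def count_runs(bits: List[int]) -> int:
    """Number of maximal runs of identical bits (a rough proxy for RLE size)."""
    return sum(1 for _ in groupby(bits))

def total_runs(rows: List[Row]) -> int:
    """Total number of runs over every bitmap of every column."""
    return sum(count_runs(bitmap)
               for col in range(len(rows[0]))
               for bitmap in bitmap_index(rows, col).values())

random.seed(0)
# Hypothetical fact table: 2000 rows, dimensions with 50, 10 and 4 distinct values.
table: List[Row] = [(random.randrange(50), random.randrange(10), random.randrange(4))
                    for _ in range(2000)]

unsorted_runs = total_runs(table)
lex_runs = total_runs(sorted(table))   # tuples compare lexicographically
print("unsorted:", unsorted_runs, "lexicographically sorted:", lex_runs)
```

After the sort, each bitmap of the first column holds at most three runs. Later columns typically benefit progressively less. Exact numbers depend on the seed and on the column cardinalities.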

An efficient compression scheme for bitmap indices

Publication venue: 'Office of Scientific and Technical Information (OSTI)'

Crossref

Performances of Multi-Level and Multi-Component Compressed Bitmap Indices

Author: Shoshani Arie
Stockinger Kurt
Wu Kesheng
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 30/04/2007

This paper presents a systematic study of two large subsets of bitmap indexing methods that use multi-component and multi-level encodings. Earlier studies on bitmap indexes are either empirical or cover uncompressed versions only. Since most bitmap indexes in use are compressed, we set out to study the performance characteristics of these compressed indexes. To make the analyses manageable, we choose to use a particularly simple, but efficient, compression method called the Word-Aligned Hybrid (WAH) code. Using this compression method, a number of bitmap indexes are shown to be optimal because their worst-case time complexities for answering a query are linear functions of the number of hits. Since compressed bitmap indexes behave drastically differently from uncompressed ones, our analyses also lead to a number of new methods that are much more efficient than commonly used ones. As a validation for the analyses, we implement a number of the best methods and measure their performance against well-known indexes. The fastest new methods are predicted and observed to be 5 to 10 times faster than well-known indexing methods.

eScholarship - University of California

UNT Digital Library
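The Shoshani, Stockinger, and Wu study relies on the WAH code. For readers unfamiliar with it, here is a minimal Python sketch of a simplified 32-bit variant. It makes these assumptions:

- The bitmap is given as a list of 0/1 values.
- The bitmap is split into 31-bit groups.
- A mixed group becomes a literal word.
- A run of all-0 or all-1 groups becomes a fill word: MSB set, bit 30 holding the fill value, low 30 bits counting groups.
- The final partial group is zero-padded. Real implementations typically handle that group separately.

The function names are illustrative, not taken from any library.

```python
from typing import List

GROUP = 31                     # payload bits carried by each 32-bit word
ALL_ONES = (1 << GROUP) - 1    # a group whose 31 bits are all 1
MAX_FILL = (1 << 30) - 1       # largest group count a fill word can store

def wah_encode(bits: List[int]) -> List[int]:
    """Simplified 32-bit WAH encoding of a list of 0/1 values.

    Literal word: MSB 0, low 31 bits hold one group verbatim.
    Fill word:    MSB 1, bit 30 = fill value, low 30 bits = number of groups.
    The last partial group is zero-padded (a simplification).
    """
    words: List[int] = []
    for start in range(0, len(bits), GROUP):
        chunk = bits[start:start + GROUP]
        chunk = chunk + [0] * (GROUP - len(chunk))
        group = 0
        for b in chunk:                      # first bit ends up most significant
            group = (group << 1) | b
        if group == 0 or group == ALL_ONES:
            fill_bit = 1 if group == ALL_ONES else 0
            if (words and (words[-1] >> 31) == 1
                    and ((words[-1] >> 30) & 1) == fill_bit
                    and (words[-1] & MAX_FILL) < MAX_FILL):
                words[-1] += 1               # extend the previous fill word
            else:
                words.append((1 << 31) | (fill_bit << 30) | 1)
        else:
            words.append(group)              # literal word (MSB is 0)
    return words

def wah_decode(words: List[int], nbits: int) -> List[int]:
    """Expand WAH words back into the first `nbits` bits."""
    out: List[int] = []
    for w in words:
        if w >> 31:                          # fill word
            fill_bit = (w >> 30) & 1
            out.extend([fill_bit] * (GROUP * (w & MAX_FILL)))
        else:                                # literal word
            out.extend((w >> (GROUP - 1 - i)) & 1 for i in range(GROUP))
    return out[:nbits]

bits = [0] * 93 + [1, 0, 1] + [1] * 62
words = wah_encode(bits)
assert wah_decode(words, len(bits)) == bits
print(len(bits), "bits ->", len(words), "words:", [hex(w) for w in words])
```

A literal word costs one word per 31 bits. A single fill word, by contrast, can stand for up to 2^30 - 1 groups, so long uniform stretches compress to a handful of words.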

Efficient runtime systems for data warehouses (Effiziente Laufzeitsysteme für Datenlager)

Author: Westmann Till
Publication venue: Universität Mannheim
Publication date: 01/01/2000

Current DBMSs are optimized for OLTP applications. The requirements that OLAP and OLTP applications place on the DBMS differ considerably. We have identified some of these differences and developed a runtime system that exploits them to improve performance for OLAP applications. The techniques developed include (1) the use of a virtual machine to evaluate expressions, (2) the efficient integration of compression, and (3) specific algebraic operators. Our evaluation showed that using these techniques yields significant performance gains (a factor of 2 or more).

MAnnheim DOCument Server

Parameterised Compression for Sparse Bitmaps

Author: Alistair Moffat
Justin Zobel
Publication venue: ACM Press
Publication date: 01/01/1992

Full-text retrieval systems typically use either a bitmap or an inverted file to identify which documents contain which words, so that the documents containing any combination of words can be quickly located. Bitmaps of word occurrences are large, but are usually sparse, and thus are amenable to a variety of compression techniques. Here we consider techniques in which the encoding of each bitvector within the bitmap is parameterised, so that a different code can be used for each bitvector. Our experimental results show that the new methods yield better compression than previous techniques.
Categories and Subject Descriptors: E.4 [Coding and Information Theory]: Data compaction and compression; H.3.2 [Information Storage]: File organisation.
Keywords: Full-text retrieval, data compression, document database, Huffman coding, geometric distribution, inverted file.
1 Introduction
Full-text retrieval systems are used for storing and accessing document collections such as newspaper a..

CiteSeerX

Crossref
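Moffat and Zobel describe choosing a separate code parameter for each bitvector. The sketch below makes that idea concrete by Golomb-coding the gaps between set bits, with a parameter b chosen per bitvector from its density. Golomb codes are a standard choice for geometrically distributed gaps, which the keywords mention. Using them here is an assumption for illustration, and the paper's own codes may differ.

The parameter rule b = ceil(log(2 - p) / -log(1 - p)), with p = ones / documents, is the known optimum for a geometric gap distribution. Positions are 1-based document numbers, and all function names are hypothetical.

```python
import math
from typing import List

def golomb_parameter(n_docs: int, n_ones: int) -> int:
    """Per-bitvector Golomb parameter for gaps assumed geometric with p = n_ones / n_docs."""
    if n_ones == 0 or n_ones >= n_docs:
        return 1
    p = n_ones / n_docs
    return max(1, math.ceil(math.log(2 - p) / -math.log(1 - p)))

def _bits(value: int, width: int) -> List[int]:
    """`value` as `width` binary digits, most significant first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]

def golomb_encode_gaps(positions: List[int], b: int) -> List[int]:
    """Golomb-code the gaps between sorted, 1-based positions of set bits."""
    k = (b - 1).bit_length()      # ceil(log2 b)
    cutoff = (1 << k) - b         # remainders below cutoff use k - 1 bits
    out: List[int] = []
    prev = 0
    for pos in positions:
        q, r = divmod(pos - prev - 1, b)   # gap = pos - prev >= 1
        prev = pos
        out.extend([1] * q + [0])          # quotient in unary, 0-terminated
        if r < cutoff:
            out.extend(_bits(r, k - 1))    # truncated binary, short form
        else:
            out.extend(_bits(r + cutoff, k))
    return out

def golomb_decode_gaps(code: List[int], count: int, b: int) -> List[int]:
    """Inverse of golomb_encode_gaps for `count` positions."""
    k = (b - 1).bit_length()
    cutoff = (1 << k) - b
    positions: List[int] = []
    i = prev = 0
    for _ in range(count):
        q = 0
        while code[i] == 1:           # read unary quotient
            q += 1
            i += 1
        i += 1                        # skip the terminating 0
        r = 0
        if k > 0:
            for _ in range(k - 1):
                r = (r << 1) | code[i]
                i += 1
            if r >= cutoff:           # long form: one more bit
                r = ((r << 1) | code[i]) - cutoff
                i += 1
        prev += q * b + r + 1
        positions.append(prev)
    return positions

# Usage with a hypothetical sparse bitvector over 1000 documents.
n_docs = 1000
positions = [3, 10, 11, 250, 251, 600, 999]
b = golomb_parameter(n_docs, len(positions))
code = golomb_encode_gaps(positions, b)
assert golomb_decode_gaps(code, len(positions), b) == positions
print("b =", b, "|", len(code), "bits instead of", n_docs)
```

A sparse bitvector gets a large b and therefore short codes for its long gaps. A dense one gets a small b. Tuning b per bitvector in this way is what distinguishes a parameterised scheme from a single global code.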

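Advanced data architecture for a web directory, with optimization of queries restricted to a zone of the category graph (Arquitectura de datos avanzada de un directorio web, con optimización de consultas restringidas a una zona del grafo de categorías)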

Author: Cacheda Fidel
Publication date: 01/01/2002

[Abstract] Since its origin, the World Wide Web has grown exponentially, generating a large volume of heterogeneous information accessible to any user. This has led to the use of efficient tools to manage, retrieve, and filter that information. Specifically, Web directories are taxonomies that classify web documents, which are subsequently queried. This type of information retrieval system involves a specific kind of search, in which the document collection is restricted to a zone of the category graph. This dissertation presents a data architecture specific to Web directories that improves performance for restricted searches. The architecture is based on a hybrid data structure consisting of an inverted file with multiple embedded signature files. Two variants are defined on the basis of the proposed model: the hybrid architecture with total information and the hybrid architecture with partial information. The validity of this architecture was analyzed by implementing both variants and comparing them with a basic model. The comparison showed a clear improvement in the performance of restricted queries. The hybrid model with partial information stood out in particular, performing well under any load on the search system. Overall, the proposed architecture is characterized by its ease of implementation, which follows from the data structures it uses, by its flexibility as the system grows, and above all by the good performance it delivers for restricted searches.

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas