31,956 research outputs found

    Better bitmap performance with Roaring bitmaps

    Get PDF
    Bitmap indexes are commonly used in databases and search engines. By exploiting bit-level parallelism, they can significantly accelerate queries. However, they can use much memory, and thus we might prefer compressed bitmap indexes. Following Oracle's lead, bitmaps are often compressed using run-length encoding (RLE). Building on prior work, we introduce the Roaring compressed bitmap format: it uses packed arrays for compression instead of RLE. We compare it to two high-performance RLE-based bitmap encoding techniques: WAH (Word Aligned Hybrid compression scheme) and Concise (Compressed `n' Composable Integer Set). On synthetic and real data, we find that Roaring bitmaps (1) often compress significantly better (e.g., 2 times) and (2) are faster than the compressed alternatives (up to 900 times faster for intersections). Our results challenge the view that RLE-based bitmap compression is best

    CONCISE: Compressed 'n' Composable Integer Set

    Full text link
    Bit arrays, or bitmaps, are used to significantly speed up set operations in several areas, such as data warehousing, information retrieval, and data mining, to cite a few. However, bitmaps usually use a large storage space, thus requiring compression. Nevertheless, there is a space-time tradeoff among compression schemes. The Word Aligned Hybrid (WAH) bitmap compression trades some space to allow for bitwise operations without first decompressing bitmaps. WAH has been recognized as the most efficient scheme in terms of computation time. In this paper we present CONCISE (Compressed 'n' Composable Integer Set), a new scheme that enjoys significatively better performances than those of WAH. In particular, when compared to WAH, our algorithm is able to reduce the required memory up to 50%, by having similar or better performance in terms of computation time. Further, we show that CONCISE can be efficiently used to manipulate bitmaps representing sets of integral numbers in lieu of well-known data structures such as arrays, lists, hashtables, and self-balancing binary search trees. Extensive experiments over synthetic data show the effectiveness of our approach.Comment: Preprint submitted to Information Processing Letters, 7 page

    Sparse signal and image recovery from Compressive Samples

    Get PDF
    In this paper we present an introduction to Compressive Sampling (CS), an emerging model-based framework for data acquisition and signal recovery based on the premise that a signal having a sparse representation in one basis can be reconstructed from a small number of measurements collected in a second basis that is incoherent with the first. Interestingly, a random noise-like basis will suffice for the measurement process. We will overview the basic CS theory, discuss efficient methods for signal reconstruction, and highlight applications in medical imaging

    Information Bottlenecks, Causal States, and Statistical Relevance Bases: How to Represent Relevant Information in Memoryless Transduction

    Full text link
    Discovering relevant, but possibly hidden, variables is a key step in constructing useful and predictive theories about the natural world. This brief note explains the connections between three approaches to this problem: the recently introduced information-bottleneck method, the computational mechanics approach to inferring optimal models, and Salmon's statistical relevance basis.Comment: 3 pages, no figures, submitted to PRE as a "brief report". Revision: added an acknowledgements section originally omitted by a LaTeX bu
    • …
    corecore